-
Improved limit on neutrinoless double beta decay of \mohundred~from AMoRE-I
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Y. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (83 additional authors not shown)
Abstract:
AMoRE searches for the signature of neutrinoless double beta decay of $^{100}$Mo with a 100 kg sample of enriched $^{100}$Mo. Scintillating molybdate crystals coupled with a metallic magnetic calorimeter operate at milli-Kelvin temperatures to measure the energy of electrons emitted in the decay. As a demonstration of the full-scale AMoRE, we conducted AMoRE-I, a pre-experiment with 18 molybdate c…
▽ More
AMoRE searches for the signature of neutrinoless double beta decay of $^{100}$Mo with a 100 kg sample of enriched $^{100}$Mo. Scintillating molybdate crystals coupled with a metallic magnetic calorimeter operate at milli-Kelvin temperatures to measure the energy of electrons emitted in the decay. As a demonstration of the full-scale AMoRE, we conducted AMoRE-I, a pre-experiment with 18 molybdate crystals, at the Yangyang Underground Laboratory for over two years. The exposure was 8.02 kg$\cdot$year (or 3.89 kg$_{\mathrm{^{100}Mo}}\cdot$year) and the total background rate near the Q-value was 0.025 $\pm$ 0.002 counts/keV/kg/year. We observed no indication of $0νββ$ decay and report a new lower limit of the half-life of $^{100}$Mo $0νββ$ decay as $ T^{0ν}_{1/2}>3.0\times10^{24}~\mathrm{years}$ at 90\% confidence level. The effective Majorana mass limit range is $m_{ββ}<$(210--610) meV using nuclear matrix elements estimated in the framework of different models, including the recent shell model calculations.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization
Authors:
Bingdong Li,
Zixiang Di,
Yanting Yang,
Hong Qian,
Peng Yang,
Hao Hao,
Ke Tang,
Aimin Zhou
Abstract:
In this paper, we introduce a novel approach for large language model merging via black-box multi-objective optimization algorithms. The goal of model merging is to combine multiple models, each excelling in different tasks, into a single model that outperforms any of the individual source models. However, model merging faces two significant challenges: First, existing methods rely heavily on huma…
▽ More
In this paper, we introduce a novel approach for large language model merging via black-box multi-objective optimization algorithms. The goal of model merging is to combine multiple models, each excelling in different tasks, into a single model that outperforms any of the individual source models. However, model merging faces two significant challenges: First, existing methods rely heavily on human intuition and customized strategies. Second, parameter conflicts often arise during merging, and while methods like DARE [1] can alleviate this issue, they tend to stochastically drop parameters, risking the loss of important delta parameters. To address these challenges, we propose the MM-MO method, which automates the search for optimal merging configurations using multi-objective optimization algorithms, eliminating the need for human intuition. During the configuration searching process, we use estimated performance across multiple diverse tasks as optimization objectives in order to alleviate the parameter conflicting between different source models without losing crucial delta parameters. We conducted comparative experiments with other mainstream model merging methods, demonstrating that our method consistently outperforms them. Moreover, our experiments reveal that even task types not explicitly targeted as optimization objectives show performance improvements, indicating that our method enhances the overall potential of the model rather than merely overfitting to specific task types. This approach provides a significant advancement in model merging techniques, offering a robust and plug-and-play solution for integrating diverse models into a unified, high-performing model.
△ Less
Submitted 29 June, 2024;
originally announced July 2024.
-
Large Language Models as Surrogate Models in Evolutionary Algorithms: A Preliminary Study
Authors:
Hao Hao,
Xiaoqun Zhang,
Aimin Zhou
Abstract:
Large Language Models (LLMs) have achieved significant progress across various fields and have exhibited strong potential in evolutionary computation, such as generating new solutions and automating algorithm design. Surrogate-assisted selection is a core step in evolutionary algorithms to solve expensive optimization problems by reducing the number of real evaluations. Traditionally, this has rel…
▽ More
Large Language Models (LLMs) have achieved significant progress across various fields and have exhibited strong potential in evolutionary computation, such as generating new solutions and automating algorithm design. Surrogate-assisted selection is a core step in evolutionary algorithms to solve expensive optimization problems by reducing the number of real evaluations. Traditionally, this has relied on conventional machine learning methods, leveraging historical evaluated evaluations to predict the performance of new solutions. In this work, we propose a novel surrogate model based purely on LLM inference capabilities, eliminating the need for training. Specifically, we formulate model-assisted selection as a classification and regression problem, utilizing LLMs to directly evaluate the quality of new solutions based on historical data. This involves predicting whether a solution is good or bad, or approximating its value. This approach is then integrated into evolutionary algorithms, termed LLM-assisted EA (LAEA). Detailed experiments compared the visualization results of 2D data from 9 mainstream LLMs, as well as their performance on optimization problems. The experimental results demonstrate that LLMs have significant potential as surrogate models in evolutionary computation, achieving performance comparable to traditional surrogate models only using inference. This work offers new insights into the application of LLMs in evolutionary computation. Code is available at: https://github.com/hhyqhh/LAEA.git
△ Less
Submitted 15 June, 2024;
originally announced June 2024.
-
Projected background and sensitivity of AMoRE-II
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Y. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (81 additional authors not shown)
Abstract:
AMoRE-II aims to search for neutrinoless double beta decay with an array of 423 Li$_2$$^{100}$MoO$_4$ crystals operating in the cryogenic system as the main phase of the Advanced Molybdenum-based Rare process Experiment (AMoRE). AMoRE has been planned to operate in three phases: AMoRE-pilot, AMoRE-I, and AMoRE-II. AMoRE-II is currently being installed at the Yemi Underground Laboratory, located ap…
▽ More
AMoRE-II aims to search for neutrinoless double beta decay with an array of 423 Li$_2$$^{100}$MoO$_4$ crystals operating in the cryogenic system as the main phase of the Advanced Molybdenum-based Rare process Experiment (AMoRE). AMoRE has been planned to operate in three phases: AMoRE-pilot, AMoRE-I, and AMoRE-II. AMoRE-II is currently being installed at the Yemi Underground Laboratory, located approximately 1000 meters deep in Jeongseon, Korea. The goal of AMoRE-II is to reach up to $T^{0νββ}_{1/2}$ $\sim$ 6 $\times$ 10$^{26}$ years, corresponding to an effective Majorana mass of 15 - 29 meV, covering all the inverted mass hierarchy regions. To achieve this, the background level of the experimental configurations and possible background sources of gamma and beta events should be well understood. We have intensively performed Monte Carlo simulations using the GEANT4 toolkit in all the experimental configurations with potential sources. We report the estimated background level that meets the 10$^{-4}$counts/(keV$\cdot$kg$\cdot$yr) requirement for AMoRE-II in the region of interest (ROI) and show the projected half-life sensitivity based on the simulation study.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Exclusion of the Cosmological Triangle in Reactor-Based Search for Axion-Like Particles
Authors:
Byung Ju Park,
Jae ** Choi,
Eunju Jeon,
**yu Kim,
Kyungwon Kim,
Sung Hyun Kim,
Sun Kee Kim,
Yeongduk Kim,
Young Ju Ko,
Byoung-Cheol Koh,
Chang Hyon Ha,
Seo Hyun Lee,
In Soo Lee,
Hyunseok Lee,
Hyun Su Lee,
Jaison Lee,
Yoomin Oh,
Doo** Kim
Abstract:
We report new constraints on axion-like particle (ALP) using data corresponding to a sodium iodine target exposure of 3063 kg$\cdot$days from the neutrino elastic scattering observation with NaI (NEON) experiment. A 16.7 kg of thallium-doped sodium iodide target was located 23.7 meters from a 2.8 GW thermal power nuclear reactor. We searched for ALPs produced by high-flux photons by comparing the…
▽ More
We report new constraints on axion-like particle (ALP) using data corresponding to a sodium iodine target exposure of 3063 kg$\cdot$days from the neutrino elastic scattering observation with NaI (NEON) experiment. A 16.7 kg of thallium-doped sodium iodide target was located 23.7 meters from a 2.8 GW thermal power nuclear reactor. We searched for ALPs produced by high-flux photons by comparing the energy spectra of data collected during reactor-on (1596 kg$\cdot$days exposure) and reactor-off (1467 kg$\cdot$days exposure) periods. No signal consistent with ALP interaction was identified, allowing us to set exclusion limits at the 95% confidence level. Our limits cover previously unexplored regions for both photon couplings (${g_{aγ}}$) and electron couplings (${g_{ae}}$) for axion masses around 1 MeV/c$^2$. Notably, the NEON data excludes the unconstrained region identified by laboratory-based searches for photon couplings within the "cosmological triangle" for the first time. The observed 95\% confidence level limits reach as low as ${g_{aγ}}$ of 4.33$\times$ 10$^{-8}$ GeV$^{-1}$ and ${g_{ae}}$ of 1.10$\times$ 10$^{-9}$ for axion masses of 1.7 MeV/c$^2$ and 1.0 MeV/c$^2$, respectively.
△ Less
Submitted 11 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Tunable huge asymmetric transmission in mid-infrared range through a monolayer graphene chiral metasurface
Authors:
Qi Wang,
Haotian Hao,
Wei Xin
Abstract:
Metasurface has witnessed an urgent need for reconfigurable chiral manipulation recently, while the graphene-based devices attracted significant attention from researchers due to their inherent tunable properties. Here, an L-shaped metasurface composed of three hollow monolayer graphene resonant rings is proposed. A huge dual-frequency asymmetric transmission (AT) in the mid-infrared band (7800 nm…
▽ More
Metasurface has witnessed an urgent need for reconfigurable chiral manipulation recently, while the graphene-based devices attracted significant attention from researchers due to their inherent tunable properties. Here, an L-shaped metasurface composed of three hollow monolayer graphene resonant rings is proposed. A huge dual-frequency asymmetric transmission (AT) in the mid-infrared band (7800 nm-9000 nm) with transmission efficiency peaks of 0.14 and 0.26 is revealed, which is optimal of its similar monolayer counterparts in the current band. By changing the Fermi level of graphene and the relaxation time of its carriers, the AT spectrum including the positions and widths of the peaks can be effectively modulated. More importantly, the structure designed here gives priority to the machinability of the metasurface configuration, ensuring its practicality and promising applications in chiral applications in the future.
△ Less
Submitted 8 June, 2024;
originally announced June 2024.
-
Non-destructive Degradation Pattern Decoupling for Ultra-early Battery Prototype Verification Using Physics-informed Machine Learning
Authors:
Shengyu Tao,
Mengtian Zhang,
Zixi Zhao,
Haoyang Li,
Ruifei Ma,
Yunhong Che,
Xin Sun,
Lin Su,
Xiangyu Chen,
Zihao Zhou,
Heng Chang,
Tingwei Cao,
Xiao Xiao,
Yaojun Liu,
Wenjun Yu,
Zhongling Xu,
Yang Li,
Han Hao,
Xuan Zhang,
Xiaosong Hu,
Guangmin ZHou
Abstract:
Manufacturing complexities and uncertainties have impeded the transition from material prototypes to commercial batteries, making prototype verification critical to quality assessment. A fundamental challenge involves deciphering intertwined chemical processes to characterize degradation patterns and their quantitative relationship with battery performance. Here we show that a physics-informed mac…
▽ More
Manufacturing complexities and uncertainties have impeded the transition from material prototypes to commercial batteries, making prototype verification critical to quality assessment. A fundamental challenge involves deciphering intertwined chemical processes to characterize degradation patterns and their quantitative relationship with battery performance. Here we show that a physics-informed machine learning approach can quantify and visualize temporally resolved losses concerning thermodynamics and kinetics only using electric signals. Our method enables non-destructive degradation pattern characterization, expediting temperature-adaptable predictions of entire lifetime trajectories, rather than end-of-life points. The verification speed is 25 times faster yet maintaining 95.1% accuracy across temperatures. Such advances facilitate more sustainable management of defective prototypes before massive production, establishing a 19.76 billion USD scrap material recycling market by 2060 in China. By incorporating stepwise charge acceptance as a measure of the initial manufacturing variability of normally identical batteries, we can immediately identify long-term degradation variations. We attribute the predictive power to interpreting machine learning insights using material-agnostic featurization taxonomy for degradation pattern decoupling. Our findings offer new possibilities for dynamic system analysis, such as battery prototype degradation, demonstrating that complex pattern evolutions can be accurately predicted in a non-destructive and data-driven fashion by integrating physics-informed machine learning.
△ Less
Submitted 31 May, 2024;
originally announced June 2024.
-
A First Look at Kolmogorov-Arnold Networks in Surrogate-assisted Evolutionary Algorithms
Authors:
Hao Hao,
Xiaoqun Zhang,
Bingdong Li,
Aimin Zhou
Abstract:
Surrogate-assisted Evolutionary Algorithm (SAEA) is an essential method for solving expensive expensive problems. Utilizing surrogate models to substitute the optimization function can significantly reduce reliance on the function evaluations during the search process, thereby lowering the optimization costs. The construction of surrogate models is a critical component in SAEAs, with numerous mach…
▽ More
Surrogate-assisted Evolutionary Algorithm (SAEA) is an essential method for solving expensive expensive problems. Utilizing surrogate models to substitute the optimization function can significantly reduce reliance on the function evaluations during the search process, thereby lowering the optimization costs. The construction of surrogate models is a critical component in SAEAs, with numerous machine learning algorithms playing a pivotal role in the model-building phase. This paper introduces Kolmogorov-Arnold Networks (KANs) as surrogate models within SAEAs, examining their application and effectiveness. We employ KANs for regression and classification tasks, focusing on the selection of promising solutions during the search process, which consequently reduces the number of expensive function evaluations. Experimental results indicate that KANs demonstrate commendable performance within SAEAs, effectively decreasing the number of function calls and enhancing the optimization efficiency. The relevant code is publicly accessible and can be found in the GitHub repository.
△ Less
Submitted 26 May, 2024;
originally announced May 2024.
-
Multiple-Choice Questions are Efficient and Robust LLM Evaluators
Authors:
Ziyin Zhang,
Zhaokun Jiang,
Lizhen Xu,
Hongkun Hao,
Rui Wang
Abstract:
We present GSM-MC, a multiple-choice (MC) dataset constructed by collecting answers and incorrect predictions on GSM8K from 60 open-source models. Through extensive experiments, we show that LLMs' performance on the MC version of this popular benchmark is strongly correlated with their performance on the original version and is quite robust to distractor choices and option orders, while the evalua…
▽ More
We present GSM-MC, a multiple-choice (MC) dataset constructed by collecting answers and incorrect predictions on GSM8K from 60 open-source models. Through extensive experiments, we show that LLMs' performance on the MC version of this popular benchmark is strongly correlated with their performance on the original version and is quite robust to distractor choices and option orders, while the evaluation time is reduced by a factor of up to 30. Following similar procedures, we introduce MATH-MC, constructed from MATH, and PythonIO, a new program reasoning MC dataset constructed from HumanEval and MBPP. Experimental results indicate that LLMs' performance on these MC benchmarks leaves much room for improvement. Our data and code are available at https://github.com/Geralt-Targaryen/MC-Evaluation.
△ Less
Submitted 26 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
On critical double phase problems in $\mathbb{R}^N$ involving variable exponents
Authors:
Hoang Hai Ha,
Ky Ho
Abstract:
We establish a Lions-type concentration-compactness principle and its variant at infinity for Musielak-Orlicz-Sobolev spaces associated with a double phase operator with variable exponents. Based on these principles, we demonstrate the existence and concentration of solutions for a class of critical double phase equations of Schrödinger type in $\mathbb{R}^N$ involving variable exponents with vari…
▽ More
We establish a Lions-type concentration-compactness principle and its variant at infinity for Musielak-Orlicz-Sobolev spaces associated with a double phase operator with variable exponents. Based on these principles, we demonstrate the existence and concentration of solutions for a class of critical double phase equations of Schrödinger type in $\mathbb{R}^N$ involving variable exponents with various types of potentials. Our growth condition is more appropriately suited compared to the existing works.
△ Less
Submitted 20 May, 2024;
originally announced May 2024.
-
BARO: Robust Root Cause Analysis for Microservices via Multivariate Bayesian Online Change Point Detection
Authors:
Luan Pham,
Huong Ha,
Hongyu Zhang
Abstract:
Detecting failures and identifying their root causes promptly and accurately is crucial for ensuring the availability of microservice systems. A typical failure troubleshooting pipeline for microservices consists of two phases: anomaly detection and root cause analysis. While various existing works on root cause analysis require accurate anomaly detection, there is no guarantee of accurate estimat…
▽ More
Detecting failures and identifying their root causes promptly and accurately is crucial for ensuring the availability of microservice systems. A typical failure troubleshooting pipeline for microservices consists of two phases: anomaly detection and root cause analysis. While various existing works on root cause analysis require accurate anomaly detection, there is no guarantee of accurate estimation with anomaly detection techniques. Inaccurate anomaly detection results can significantly affect the root cause localization results. To address this challenge, we propose BARO, an end-to-end approach that integrates anomaly detection and root cause analysis for effectively troubleshooting failures in microservice systems. BARO leverages the Multivariate Bayesian Online Change Point Detection technique to model the dependency within multivariate time-series metrics data, enabling it to detect anomalies more accurately. BARO also incorporates a novel nonparametric statistical hypothesis testing technique for robustly identifying root causes, which is less sensitive to the accuracy of anomaly detection compared to existing works. Our comprehensive experiments conducted on three popular benchmark microservice systems demonstrate that BARO consistently outperforms state-of-the-art approaches in both anomaly detection and root cause analysis.
△ Less
Submitted 15 May, 2024;
originally announced May 2024.
-
Measurement-induced phase transitions in systems with diffusive dynamics
Authors:
Hyunsoo Ha,
Akshat Pandey,
Sarang Gopalakrishnan,
David A. Huse
Abstract:
The competition between scrambling and projective measurements can lead to measurement-induced entanglement phase transitions (MIPT). In this work, we show that the universality class of the MIPT is drastically altered when the system is coupled to a diffusing conserved density. Specifically, we consider a 1+1d random Clifford circuit locally monitored by classically diffusing particles (``measure…
▽ More
The competition between scrambling and projective measurements can lead to measurement-induced entanglement phase transitions (MIPT). In this work, we show that the universality class of the MIPT is drastically altered when the system is coupled to a diffusing conserved density. Specifically, we consider a 1+1d random Clifford circuit locally monitored by classically diffusing particles (``measurers''). The resulting diffusive correlations in the measurement density are a relevant perturbation to the usual space-time random MIPT critical point, producing a new universality class for this phase transition. We find ``Griffiths-like'' effects due to rare space-time regions where, e.g., the diffusive measurers have a low or high density, but these are considerably weaker than the Griffiths effects that occur with quenched randomness that produce rare spatial regions with infinite lifetime.
△ Less
Submitted 14 May, 2024;
originally announced May 2024.
-
Diameter of Commuting Graphs of Lie Algebras
Authors:
Hieu V. Ha,
Hoa D. Quang,
Vu A. Le,
Tuyen T. M Nguyen
Abstract:
In this paper, we study the connectedness of the commuting graph of a general Lie algebra and provide a process to determine whether the commuting graph is connected or not, as well as to compute an upper bound for its diameter. In addition, we will examine the connectedness and diameter of the commuting graphs of some remarkable classes of Lie algebras, including: (1) a class of Lie algebras with…
▽ More
In this paper, we study the connectedness of the commuting graph of a general Lie algebra and provide a process to determine whether the commuting graph is connected or not, as well as to compute an upper bound for its diameter. In addition, we will examine the connectedness and diameter of the commuting graphs of some remarkable classes of Lie algebras, including: (1) a class of Lie algebras with one- or two-dimensional derived algebras; and (2) a class of solvable Lie algebras over the real field of dimension up to $4$.
△ Less
Submitted 10 May, 2024;
originally announced May 2024.
-
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
Authors:
JoonHo Lee,
Jae Oh Woo,
Juree Seok,
Parisa Hassanzadeh,
Wooseok Jang,
JuYoun Son,
Sima Didari,
Baruch Gutow,
Heng Hao,
Hankyu Moon,
Wenjun Hu,
Yeong-Dae Kwon,
Taehee Lee,
Seungjai Min
Abstract:
Assessing response quality to instructions in language models is vital but challenging due to the complexity of human language across different contexts. This complexity often results in ambiguous or inconsistent interpretations, making accurate assessment difficult. To address this issue, we propose a novel Uncertainty-aware Reward Model (URM) that introduces a robust uncertainty estimation for t…
▽ More
Assessing response quality to instructions in language models is vital but challenging due to the complexity of human language across different contexts. This complexity often results in ambiguous or inconsistent interpretations, making accurate assessment difficult. To address this issue, we propose a novel Uncertainty-aware Reward Model (URM) that introduces a robust uncertainty estimation for the quality of paired responses based on Bayesian approximation. Trained with preference datasets, our uncertainty-enabled proxy not only scores rewards for responses but also evaluates their inherent uncertainty. Empirical results demonstrate significant benefits of incorporating the proposed proxy into language model training. Our method boosts the instruction following capability of language models by refining data curation for training and improving policy optimization objectives, thereby surpassing existing methods by a large margin on benchmarks such as Vicuna and MT-bench. These findings highlight that our proposed approach substantially advances language model training and paves a new way of harnessing uncertainty within language models.
△ Less
Submitted 19 May, 2024; v1 submitted 10 May, 2024;
originally announced May 2024.
-
Investigating Interaction Modes and User Agency in Human-LLM Collaboration for Domain-Specific Data Analysis
Authors:
Jia**g Guo,
Vikram Mohanty,
Jorge Piazentin Ono,
Hongtao Hao,
Liang Gou,
Liu Ren
Abstract:
Despite demonstrating robust capabilities in performing tasks related to general-domain data-operation tasks, Large Language Models (LLMs) may exhibit shortcomings when applied to domain-specific tasks. We consider the design of domain-specific AI-powered data analysis tools from two dimensions: interaction and user agency. We implemented two design probes that fall on the two ends of the two dime…
▽ More
Despite demonstrating robust capabilities in performing tasks related to general-domain data-operation tasks, Large Language Models (LLMs) may exhibit shortcomings when applied to domain-specific tasks. We consider the design of domain-specific AI-powered data analysis tools from two dimensions: interaction and user agency. We implemented two design probes that fall on the two ends of the two dimensions: an open-ended high agency (OHA) prototype and a structured low agency (SLA) prototype. We conducted an interview study with nine data scientists to investigate (1) how users perceived the LLM outputs for data analysis assistance, and (2) how the two test design probes, OHA and SLA, affected user behavior, performance, and perceptions. Our study revealed insights regarding participants' interactions with LLMs, how they perceived the results, and their desire for explainability concerning LLM outputs, along with a noted need for collaboration with other users, and how they envisioned the utility of LLMs in their workflow.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
MatterSim: A Deep Learning Atomistic Model Across Elements, Temperatures and Pressures
Authors:
Han Yang,
Chenxi Hu,
Yichi Zhou,
Xixian Liu,
Yu Shi,
Jielan Li,
Guanzhi Li,
Zekun Chen,
Shuizhou Chen,
Claudio Zeni,
Matthew Horton,
Robert Pinsler,
Andrew Fowler,
Daniel Zügner,
Tian Xie,
Jake Smith,
Lixin Sun,
Qian Wang,
Lingyu Kong,
Chang Liu,
Hongxia Hao,
Ziheng Lu
Abstract:
Accurate and fast prediction of materials properties is central to the digital transformation of materials design. However, the vast design space and diverse operating conditions pose significant challenges for accurately modeling arbitrary material candidates and forecasting their properties. We present MatterSim, a deep learning model actively learned from large-scale first-principles computatio…
▽ More
Accurate and fast prediction of materials properties is central to the digital transformation of materials design. However, the vast design space and diverse operating conditions pose significant challenges for accurately modeling arbitrary material candidates and forecasting their properties. We present MatterSim, a deep learning model actively learned from large-scale first-principles computations, for efficient atomistic simulations at first-principles level and accurate prediction of broad material properties across the periodic table, spanning temperatures from 0 to 5000 K and pressures up to 1000 GPa. Out-of-the-box, the model serves as a machine learning force field, and shows remarkable capabilities not only in predicting ground-state material structures and energetics, but also in simulating their behavior under realistic temperatures and pressures, signifying an up to ten-fold enhancement in precision compared to the prior best-in-class. This enables MatterSim to compute materials' lattice dynamics, mechanical and thermodynamic properties, and beyond, to an accuracy comparable with first-principles methods. Specifically, MatterSim predicts Gibbs free energies for a wide range of inorganic solids with near-first-principles accuracy and achieves a 15 meV/atom resolution for temperatures up to 1000K compared with experiments. This opens an opportunity to predict experimental phase diagrams of materials at minimal computational cost. Moreover, MatterSim also serves as a platform for continuous learning and customization by integrating domain-specific data. The model can be fine-tuned for atomistic simulations at a desired level of theory or for direct structure-to-property predictions, achieving high data efficiency with a reduction in data requirements by up to 97%.
△ Less
Submitted 10 May, 2024; v1 submitted 8 May, 2024;
originally announced May 2024.
-
Hierarchical Space-Time Attention for Micro-Expression Recognition
Authors:
Haihong Hao,
Shuo Wang,
Huixia Ben,
Yanbin Hao,
Yansong Wang,
Weiwei Wang
Abstract:
Micro-expression recognition (MER) aims to recognize the short and subtle facial movements from the Micro-expression (ME) video clips, which reveal real emotions. Recent MER methods mostly only utilize special frames from ME video clips or extract optical flow from these special frames. However, they neglect the relationship between movements and space-time, while facial cues are hidden within the…
▽ More
Micro-expression recognition (MER) aims to recognize the short and subtle facial movements from the Micro-expression (ME) video clips, which reveal real emotions. Recent MER methods mostly only utilize special frames from ME video clips or extract optical flow from these special frames. However, they neglect the relationship between movements and space-time, while facial cues are hidden within these relationships. To solve this issue, we propose the Hierarchical Space-Time Attention (HSTA). Specifically, we first process ME video frames and special frames or data parallelly by our cascaded Unimodal Space-Time Attention (USTA) to establish connections between subtle facial movements and specific facial areas. Then, we design Crossmodal Space-Time Attention (CSTA) to achieve a higher-quality fusion for crossmodal data. Finally, we hierarchically integrate USTA and CSTA to grasp the deeper facial cues. Our model emphasizes temporal modeling without neglecting the processing of special data, and it fuses the contents in different modalities while maintaining their respective uniqueness. Extensive experiments on the four benchmarks show the effectiveness of our proposed HSTA. Specifically, compared with the latest method on the CASME3 dataset, it achieves about 3% score improvement in seven-category classification.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
G-Refine: A General Quality Refiner for Text-to-Image Generation
Authors:
Chunyi Li,
Haoning Wu,
Hongkun Hao,
Zicheng Zhang,
Tengchaun Kou,
Chaofeng Chen,
Lei Bai,
Xiaohong Liu,
Weisi Lin,
Guangtao Zhai
Abstract:
With the evolution of Text-to-Image (T2I) models, the quality defects of AI-Generated Images (AIGIs) pose a significant barrier to their widespread adoption. In terms of both perception and alignment, existing models cannot always guarantee high-quality results. To mitigate this limitation, we introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compro…
▽ More
With the evolution of Text-to-Image (T2I) models, the quality defects of AI-Generated Images (AIGIs) pose a significant barrier to their widespread adoption. In terms of both perception and alignment, existing models cannot always guarantee high-quality results. To mitigate this limitation, we introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compromising the integrity of high-quality ones. The model is composed of three interconnected modules: a perception quality indicator, an alignment quality indicator, and a general quality enhancement module. Based on the mechanisms of the Human Visual System (HVS) and syntax trees, the first two indicators can respectively identify the perception and alignment deficiencies, and the last module can apply targeted quality enhancement accordingly. Extensive experimentation reveals that when compared to alternative optimization methods, AIGIs after G-Refine outperform in 10+ quality metrics across 4 databases. This improvement significantly contributes to the practical application of contemporary T2I models, paving the way for their broader adoption. The code will be released on https://github.com/Q-Future/Q-Refine.
△ Less
Submitted 28 April, 2024;
originally announced April 2024.
-
A Conditional Independence Test in the Presence of Discretization
Authors:
Boyang Sun,
Yu Yao,
Huangyuan Hao,
Yumou Qiu,
Kun Zhang
Abstract:
Testing conditional independence has many applications, such as in Bayesian network learning and causal discovery. Different test methods have been proposed. However, existing methods generally can not work when only discretized observations are available. Specifically, consider $X_1$, $\tilde{X}_2$ and $X_3$ are observed variables, where $\tilde{X}_2$ is a discretization of latent variables…
▽ More
Testing conditional independence has many applications, such as in Bayesian network learning and causal discovery. Different test methods have been proposed. However, existing methods generally can not work when only discretized observations are available. Specifically, consider $X_1$, $\tilde{X}_2$ and $X_3$ are observed variables, where $\tilde{X}_2$ is a discretization of latent variables $X_2$. Applying existing test methods to the observations of $X_1$, $\tilde{X}_2$ and $X_3$ can lead to a false conclusion about the underlying conditional independence of variables $X_1$, $X_2$ and $X_3$. Motivated by this, we propose a conditional independence test specifically designed to accommodate the presence of such discretization. To achieve this, we design the bridge equations to recover the parameter reflecting the statistical information of the underlying latent continuous variables. An appropriate test statistic and its asymptotic distribution under the null hypothesis of conditional independence have also been derived. Both theoretical results and empirical validation have been provided, demonstrating the effectiveness of our test methods.
△ Less
Submitted 3 May, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
-
Enhancing Q&A with Domain-Specific Fine-Tuning and Iterative Reasoning: A Comparative Study
Authors:
Zooey Nguyen,
Anthony Annunziata,
Vinh Luong,
Sang Dinh,
Quynh Le,
Anh Hai Ha,
Chanh Le,
Hong An Phan,
Shruti Raghavan,
Christopher Nguyen
Abstract:
This paper investigates the impact of domain-specific model fine-tuning and of reasoning mechanisms on the performance of question-answering (Q&A) systems powered by large language models (LLMs) and Retrieval-Augmented Generation (RAG). Using the FinanceBench SEC financial filings dataset, we observe that, for RAG, combining a fine-tuned embedding model with a fine-tuned LLM achieves better accura…
▽ More
This paper investigates the impact of domain-specific model fine-tuning and of reasoning mechanisms on the performance of question-answering (Q&A) systems powered by large language models (LLMs) and Retrieval-Augmented Generation (RAG). Using the FinanceBench SEC financial filings dataset, we observe that, for RAG, combining a fine-tuned embedding model with a fine-tuned LLM achieves better accuracy than generic models, with relatively greater gains attributable to fine-tuned embedding models. Additionally, employing reasoning iterations on top of RAG delivers an even bigger jump in performance, enabling the Q&A systems to get closer to human-expert quality. We discuss the implications of such findings, propose a structured technical design space capturing major technical components of Q&A AI, and provide recommendations for making high-impact technical choices for such components. We plan to follow up on this work with actionable guides for AI teams and further investigations into the impact of domain-specific augmentation in RAG and into agentic AI capabilities such as advanced planning and reasoning.
△ Less
Submitted 19 April, 2024; v1 submitted 17 April, 2024;
originally announced April 2024.
-
A special class of pure $O$-sequences
Authors:
Tài Huy Hà,
Takayuki Hibi,
Fabrizio Zanello
Abstract:
The pure $O$-sequences of the form $(1,a,a,\ldots)$ are classified.
The pure $O$-sequences of the form $(1,a,a,\ldots)$ are classified.
△ Less
Submitted 11 April, 2024;
originally announced April 2024.
-
Towards Accurate Binarization of Diffusion Model
Authors:
Xingyu Zheng,
Haotong Qin,
Xudong Ma,
Mingyuan Zhang,
Haojie Hao,
Jiakai Wang,
Zixiang Zhao,
**yang Guo,
Xianglong Liu
Abstract:
With the advancement of diffusion models (DMs) and the substantially increased computational requirements, quantization emerges as a practical solution to obtain compact and efficient low-bit DMs. However, the highly discrete representation leads to severe accuracy degradation, hindering the quantization of diffusion models to ultra-low bit-widths. This paper proposes a novel quantization-aware tr…
▽ More
With the advancement of diffusion models (DMs) and the substantially increased computational requirements, quantization emerges as a practical solution to obtain compact and efficient low-bit DMs. However, the highly discrete representation leads to severe accuracy degradation, hindering the quantization of diffusion models to ultra-low bit-widths. This paper proposes a novel quantization-aware training approach for DMs, namely BinaryDM. The proposed method pushes DMs' weights toward accurate and efficient binarization, considering the representation and computation properties. From the representation perspective, we present a Learnable Multi-basis Binarizer (LMB) to recover the representations generated by the binarized DM. The LMB enhances detailed information through the flexible combination of dual binary bases while applying to parameter-sparse locations of DM architectures to achieve minor burdens. From the optimization perspective, a Low-rank Representation Mimicking (LRM) is applied to assist the optimization of binarized DMs. The LRM mimics the representations of full-precision DMs in low-rank space, alleviating the direction ambiguity of the optimization process caused by fine-grained alignment. Moreover, a quick progressive warm-up is applied to BinaryDM, avoiding convergence difficulties by layerwisely progressive quantization at the beginning of training. Comprehensive experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains compared to SOTA quantization methods of DMs under ultra-low bit-widths. With 1.1-bit weight and 4-bit activation (W1.1A4), BinaryDM achieves as low as 7.11 FID and saves the performance from collapse (baseline FID 39.69). As the first binarization method for diffusion models, W1.1A4 BinaryDM achieves impressive 9.3 times OPs and 24.8 times model size savings, showcasing its substantial potential for edge deployment.
△ Less
Submitted 28 May, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
Powers of Edge Ideals of Graphs with Minimal Cellular Resolutions
Authors:
Trung Chau,
Tài Huy Hà,
Aryaman Maithani
Abstract:
Let $G$ be a connected graph and let $I(G)$ denote its edge ideal. We classify when $I(G)^n$, for $n \ge 1$, admits a minimal Lyubeznik resolution. We also give a characterization for when $I(G)^n$ is bridge-friendly, which, in turn, implies that $I(G)^n$ has a minimal Barile-Macchia cellular resolution.
Let $G$ be a connected graph and let $I(G)$ denote its edge ideal. We classify when $I(G)^n$, for $n \ge 1$, admits a minimal Lyubeznik resolution. We also give a characterization for when $I(G)^n$ is bridge-friendly, which, in turn, implies that $I(G)^n$ has a minimal Barile-Macchia cellular resolution.
△ Less
Submitted 5 April, 2024;
originally announced April 2024.
-
Resolution and Betti numbers of Vertex cover ideals
Authors:
Tài Huy Hà,
Takayuki Hibi
Abstract:
The vertex cover ideal $J(G)$ of a finite graph $G$ is studied. We characterize when a Cohen--Macaulay vertex cover ideal $J(G)$ has a Scarf minimal free resolution. Furthermore, by using both combinatorial and topological techniques, the graded Betti number $β_{i,i+j}(J(G))$, where $i$ and $j$ are the projective dimension and the regularity of $J(G)$, is computed, when $G$ is either a path or a c…
▽ More
The vertex cover ideal $J(G)$ of a finite graph $G$ is studied. We characterize when a Cohen--Macaulay vertex cover ideal $J(G)$ has a Scarf minimal free resolution. Furthermore, by using both combinatorial and topological techniques, the graded Betti number $β_{i,i+j}(J(G))$, where $i$ and $j$ are the projective dimension and the regularity of $J(G)$, is computed, when $G$ is either a path or a cycle.
△ Less
Submitted 3 April, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Authors:
Shujie Hu,
Long Zhou,
Shujie Liu,
Sanyuan Chen,
Hongkun Hao,
**g Pan,
Xunying Liu,
**yu Li,
Sunit Sivasankaran,
Linquan Liu,
Furu Wei
Abstract:
The recent advancements in large language models (LLMs) have revolutionized the field of natural language processing, progressively broadening their scope to multimodal perception and generation. However, effectively integrating listening capabilities into LLMs poses significant challenges, particularly with respect to generalizing across varied contexts and executing complex auditory tasks. In th…
▽ More
The recent advancements in large language models (LLMs) have revolutionized the field of natural language processing, progressively broadening their scope to multimodal perception and generation. However, effectively integrating listening capabilities into LLMs poses significant challenges, particularly with respect to generalizing across varied contexts and executing complex auditory tasks. In this work, we introduce WavLLM, a robust and adaptive speech large language model with dual encoders, and a prompt-aware LoRA weight adapter, optimized by a two-stage curriculum learning approach. Leveraging dual encoders, we decouple different types of speech information, utilizing a Whisper encoder to process the semantic content of speech, and a WavLM encoder to capture the unique characteristics of the speaker's identity. Within the curriculum learning framework, WavLLM first builds its foundational capabilities by optimizing on mixed elementary single tasks, followed by advanced multi-task training on more complex tasks such as combinations of the elementary tasks. To enhance the flexibility and adherence to different tasks and instructions, a prompt-aware LoRA weight adapter is introduced in the second advanced multi-task training stage. We validate the proposed model on universal speech benchmarks including tasks such as ASR, ST, SV, ER, and also apply it to specialized datasets like Gaokao English listening comprehension set for SQA, and speech Chain-of-Thought (CoT) evaluation set. Experiments demonstrate that the proposed model achieves state-of-the-art performance across a range of speech tasks on the same model size, exhibiting robust generalization capabilities in executing complex tasks using CoT approach. Furthermore, our model successfully completes Gaokao tasks without specialized training. The codes, models, audio, and Gaokao evaluation set can be accessed at \url{aka.ms/wavllm}.
△ Less
Submitted 31 March, 2024;
originally announced April 2024.
-
Natural-artificial hybrid swarm: Cyborg-insect group navigation in unknown obstructed soft terrain
Authors:
Yang Bai,
Phuoc Thanh Tran Ngoc,
Huu Duoc Nguyen,
Duc Long Le,
Quang Huy Ha,
Kazuki Kai,
Yu Xiang See To,
Yaosheng Deng,
Jie Song,
Naoki Wakamiya,
Hirotaka Sato,
Masaki Ogura
Abstract:
Navigating multi-robot systems in complex terrains has always been a challenging task. This is due to the inherent limitations of traditional robots in collision avoidance, adaptation to unknown environments, and sustained energy efficiency. In order to overcome these limitations, this research proposes a solution by integrating living insects with miniature electronic controllers to enable roboti…
▽ More
Navigating multi-robot systems in complex terrains has always been a challenging task. This is due to the inherent limitations of traditional robots in collision avoidance, adaptation to unknown environments, and sustained energy efficiency. In order to overcome these limitations, this research proposes a solution by integrating living insects with miniature electronic controllers to enable robotic-like programmable control, and proposing a novel control algorithm for swarming. Although these creatures, called cyborg insects, have the ability to instinctively avoid collisions with neighbors and obstacles while adapting to complex terrains, there is a lack of literature on the control of multi-cyborg systems. This research gap is due to the difficulty in coordinating the movements of a cyborg system under the presence of insects' inherent individual variability in their reactions to control input. In response to this issue, we propose a novel swarm navigation algorithm addressing these challenges. The effectiveness of the algorithm is demonstrated through an experimental validation in which a cyborg swarm was successfully navigated through an unknown sandy field with obstacles and hills. This research contributes to the domain of swarm robotics and showcases the potential of integrating biological organisms with robotics and control theory to create more intelligent autonomous systems with real-world applications.
△ Less
Submitted 27 March, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation
Authors:
Yihua Cheng,
Yaning Zhu,
Zongji Wang,
Hongquan Hao,
Yongwei Liu,
Shiqing Cheng,
Xi Wang,
Hyung ** Chang
Abstract:
Driver's eye gaze holds a wealth of cognitive and intentional cues crucial for intelligent vehicles. Despite its significance, research on in-vehicle gaze estimation remains limited due to the scarcity of comprehensive and well-annotated datasets in real driving scenarios. In this paper, we present three novel elements to advance in-vehicle gaze research. Firstly, we introduce IVGaze, a pioneering…
▽ More
Driver's eye gaze holds a wealth of cognitive and intentional cues crucial for intelligent vehicles. Despite its significance, research on in-vehicle gaze estimation remains limited due to the scarcity of comprehensive and well-annotated datasets in real driving scenarios. In this paper, we present three novel elements to advance in-vehicle gaze research. Firstly, we introduce IVGaze, a pioneering dataset capturing in-vehicle gaze, collected from 125 subjects and covering a large range of gaze and head poses within vehicles. Conventional gaze collection systems are inadequate for in-vehicle use. In this dataset, we propose a new vision-based solution for in-vehicle gaze collection, introducing a refined gaze target calibration method to tackle annotation challenges. Second, our research focuses on in-vehicle gaze estimation leveraging the IVGaze. In-vehicle face images often suffer from low resolution, prompting our introduction of a gaze pyramid transformer that leverages transformer-based multilevel features integration. Expanding upon this, we introduce the dual-stream gaze pyramid transformer (GazeDPTR). Employing perspective transformation, we rotate virtual cameras to normalize images, utilizing camera pose to merge normalized and original images for accurate gaze estimation. GazeDPTR shows state-of-the-art performance on the IVGaze dataset. Thirdly, we explore a novel strategy for gaze zone classification by extending the GazeDPTR. A foundational tri-plane and project gaze onto these planes are newly defined. Leveraging both positional features from the projection points and visual attributes from images, we achieve superior performance compared to relying solely on visual features, substantiating the advantage of gaze estimation. Our project is available at https://yihua.zone/work/ivgaze.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
Model Uncertainty in Evolutionary Optimization and Bayesian Optimization: A Comparative Analysis
Authors:
Hao Hao,
Xiaoqun Zhang,
Aimin Zhou
Abstract:
Black-box optimization problems, which are common in many real-world applications, require optimization through input-output interactions without access to internal workings. This often leads to significant computational resources being consumed for simulations. Bayesian Optimization (BO) and Surrogate-Assisted Evolutionary Algorithm (SAEA) are two widely used gradient-free optimization techniques…
▽ More
Black-box optimization problems, which are common in many real-world applications, require optimization through input-output interactions without access to internal workings. This often leads to significant computational resources being consumed for simulations. Bayesian Optimization (BO) and Surrogate-Assisted Evolutionary Algorithm (SAEA) are two widely used gradient-free optimization techniques employed to address such challenges. Both approaches follow a similar iterative procedure that relies on surrogate models to guide the search process. This paper aims to elucidate the similarities and differences in the utilization of model uncertainty between these two methods, as well as the impact of model inaccuracies on algorithmic performance. A novel model-assisted strategy is introduced, which utilizes unevaluated solutions to generate offspring, leveraging the population-based search capabilities of evolutionary algorithm to enhance the effectiveness of model-assisted optimization. Experimental results demonstrate that the proposed approach outperforms mainstream Bayesian optimization algorithms in terms of accuracy and efficiency.
△ Less
Submitted 22 March, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Authors:
Alexander Khazatsky,
Karl Pertsch,
Suraj Nair,
Ashwin Balakrishna,
Sudeep Dasari,
Siddharth Karamcheti,
Soroush Nasiriany,
Mohan Kumar Srirama,
Lawrence Yunliang Chen,
Kirsty Ellis,
Peter David Fagan,
Joey Hejna,
Masha Itkina,
Marion Lepert,
Yecheng Jason Ma,
Patrick Tree Miller,
Jimmy Wu,
Suneel Belkhale,
Shivin Dass,
Huy Ha,
Arhan Jain,
Abraham Lee,
Youngwoon Lee,
Marius Memmel,
Sungjae Park
, et al. (74 additional authors not shown)
Abstract:
The creation of large, diverse, high-quality robot manipulation datasets is an important step** stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu…
▽ More
The creation of large, diverse, high-quality robot manipulation datasets is an important step** stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.
△ Less
Submitted 19 March, 2024;
originally announced March 2024.
-
An Empirical Study on Developers Shared Conversations with ChatGPT in GitHub Pull Requests and Issues
Authors:
Huizi Hao,
Kazi Amit Hasan,
Hong Qin,
Marcos Macedo,
Yuan Tian,
Steven H. H. Ding,
Ahmed E. Hassan
Abstract:
ChatGPT has significantly impacted software development practices, providing substantial assistance to developers in a variety of tasks, including coding, testing, and debugging. Despite its widespread adoption, the impact of ChatGPT as an assistant in collaborative coding remains largely unexplored. In this paper, we analyze a dataset of 210 and 370 developers shared conversations with ChatGPT in…
▽ More
ChatGPT has significantly impacted software development practices, providing substantial assistance to developers in a variety of tasks, including coding, testing, and debugging. Despite its widespread adoption, the impact of ChatGPT as an assistant in collaborative coding remains largely unexplored. In this paper, we analyze a dataset of 210 and 370 developers shared conversations with ChatGPT in GitHub pull requests (PRs) and issues. We manually examined the content of the conversations and characterized the dynamics of the sharing behavior, i.e., understanding the rationale behind the sharing, identifying the locations where the conversations were shared, and determining the roles of the developers who shared them. Our main observations are: (1) Developers seek ChatGPT assistance across 16 types of software engineering inquiries. In both conversations shared in PRs and issues, the most frequently encountered inquiry categories include code generation, conceptual questions, how-to guides, issue resolution, and code review. (2) Developers frequently engage with ChatGPT via multi-turn conversations where each prompt can fulfill various roles, such as unveiling initial or new tasks, iterative follow-up, and prompt refinement. Multi-turn conversations account for 33.2% of the conversations shared in PRs and 36.9% in issues. (3) In collaborative coding, developers leverage shared conversations with ChatGPT to facilitate their role-specific contributions, whether as authors of PRs or issues, code reviewers, or collaborators on issues. Our work serves as the first step towards understanding the dynamics between developers and ChatGPT in collaborative software development and opens up new directions for future research on the topic.
△ Less
Submitted 15 March, 2024;
originally announced March 2024.
-
PaperBot: Learning to Design Real-World Tools Using Paper
Authors:
Ruoshi Liu,
Junbang Liang,
Sruthi Sudhakar,
Huy Ha,
Cheng Chi,
Shuran Song,
Carl Vondrick
Abstract:
Paper is a cheap, recyclable, and clean material that is often used to make practical tools. Traditional tool design either relies on simulation or physical analysis, which is often inaccurate and time-consuming. In this paper, we propose PaperBot, an approach that directly learns to design and use a tool in the real world using paper without human intervention. We demonstrated the effectiveness a…
▽ More
Paper is a cheap, recyclable, and clean material that is often used to make practical tools. Traditional tool design either relies on simulation or physical analysis, which is often inaccurate and time-consuming. In this paper, we propose PaperBot, an approach that directly learns to design and use a tool in the real world using paper without human intervention. We demonstrated the effectiveness and efficiency of PaperBot on two tool design tasks: 1. learning to fold and throw paper airplanes for maximum travel distance 2. learning to cut paper into grippers that exert maximum grip** force. We present a self-supervised learning framework that learns to perform a sequence of folding, cutting, and dynamic manipulation actions in order to optimize the design and use of a tool. We deploy our system to a real-world two-arm robotic system to solve challenging design tasks that involve aerodynamics (paper airplane) and friction (paper gripper) that are impossible to simulate accurately.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
Accurate Spatial Gene Expression Prediction by integrating Multi-resolution features
Authors:
Youngmin Chung,
Ji Hun Ha,
Kyeong Chan Im,
Joo Sang Lee
Abstract:
Recent advancements in Spatial Transcriptomics (ST) technology have facilitated detailed gene expression analysis within tissue contexts. However, the high costs and methodological limitations of ST necessitate a more robust predictive model. In response, this paper introduces TRIPLEX, a novel deep learning framework designed to predict spatial gene expression from Whole Slide Images (WSIs). TRIPL…
▽ More
Recent advancements in Spatial Transcriptomics (ST) technology have facilitated detailed gene expression analysis within tissue contexts. However, the high costs and methodological limitations of ST necessitate a more robust predictive model. In response, this paper introduces TRIPLEX, a novel deep learning framework designed to predict spatial gene expression from Whole Slide Images (WSIs). TRIPLEX uniquely harnesses multi-resolution features, capturing cellular morphology at individual spots, the local context around these spots, and the global tissue organization. By integrating these features through an effective fusion strategy, TRIPLEX achieves accurate gene expression prediction. Our comprehensive benchmark study, conducted on three public ST datasets and supplemented with Visium data from 10X Genomics, demonstrates that TRIPLEX outperforms current state-of-the-art models in Mean Squared Error (MSE), Mean Absolute Error (MAE), and Pearson Correlation Coefficient (PCC). The model's predictions align closely with ground truth gene expression profiles and tumor annotations, underscoring TRIPLEX's potential in advancing cancer diagnosis and treatment.
△ Less
Submitted 25 April, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Scarf complexes of graphs and their powers
Authors:
Sara Faridi,
Tài Huy Hà,
Takayuki Hibi,
Susan Morey
Abstract:
Every multigraded free resolution of a monomial ideal I contains the Scarf multidegrees of I. We say I has a Scarf resolution if the Scarf multidegrees are sufficient to describe a minimal free resolution of I. The main question of this paper is which graphs G have edge ideal I(G) with a Scarf resolution? We show that I(G) has a Scarf resolution if and only if G is a gap-free forest. We also class…
▽ More
Every multigraded free resolution of a monomial ideal I contains the Scarf multidegrees of I. We say I has a Scarf resolution if the Scarf multidegrees are sufficient to describe a minimal free resolution of I. The main question of this paper is which graphs G have edge ideal I(G) with a Scarf resolution? We show that I(G) has a Scarf resolution if and only if G is a gap-free forest. We also classify connected graphs for which all powers of I(G) have Scarf resolutions. Along the way, we give a concrete description of the Scarf complex of any forest. For a general graph, we give a recursive construction for its Scarf complex based on Scarf complexes of induced subgraphs.
△ Less
Submitted 8 March, 2024;
originally announced March 2024.
-
Competitive Facility Location under Random Utilities and Routing Constraints
Authors:
Hoang Giang Pham,
Tien Thanh Dam,
Ngan Ha Duong,
Tien Mai,
Minh Hoang Ha
Abstract:
In this paper, we study a facility location problem within a competitive market context, where customer demand is predicted by a random utility choice model. Unlike prior research, which primarily focuses on simple constraints such as a cardinality constraint on the number of selected locations, we introduce routing constraints that necessitate the selection of locations in a manner that guarantee…
▽ More
In this paper, we study a facility location problem within a competitive market context, where customer demand is predicted by a random utility choice model. Unlike prior research, which primarily focuses on simple constraints such as a cardinality constraint on the number of selected locations, we introduce routing constraints that necessitate the selection of locations in a manner that guarantees the existence of a tour visiting all chosen locations while adhering to a specified tour length upper bound. Such routing constraints find crucial applications in various real-world scenarios. The problem at hand features a non-linear objective function, resulting from the utilization of random utilities, together with complex routing constraints, making it computationally challenging. To tackle this problem, we explore three types of valid cuts, namely, outer-approximation and submodular cuts to handle the nonlinear objective function, as well as sub-tour elimination cuts to address the complex routing constraints. These lead to the development of two exact solution methods: a nested cutting plane and nested branch-and-cut algorithms, where these valid cuts are iteratively added to a master problem through two nested loops. We also prove that our nested cutting plane method always converges to optimality after a finite number of iterations. Furthermore, we develop a local search-based metaheuristic tailored for solving large-scale instances and show its pros and cons compared to exact methods. Extensive experiments are conducted on problem instances of varying sizes, demonstrating that our approach excels in terms of solution quality and computation time when compared to other baseline approaches.
△ Less
Submitted 9 March, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Pulse shape discrimination in an organic scintillation phoswich detector using machine learning techniques
Authors:
Yu** Lee,
**young Kim,
Byoung-cheol Koh,
Young Soo Yoon,
Chang Hyon Ha
Abstract:
We developed machine learning algorithms for distinguishing scintillation signals from a plastic-liquid coupled detector known as a phoswich. The challenge lies in discriminating signals from organic scintillators with similar shapes and short decay times. Using a single-readout phoswich detector, we successfully identified $γ$ radiation signals from two scintillating components. Our Boosted Decis…
▽ More
We developed machine learning algorithms for distinguishing scintillation signals from a plastic-liquid coupled detector known as a phoswich. The challenge lies in discriminating signals from organic scintillators with similar shapes and short decay times. Using a single-readout phoswich detector, we successfully identified $γ$ radiation signals from two scintillating components. Our Boosted Decision Tree algorithm demonstrated a maximum discrimination power of 3.02 $\pm$ 0.85 standard deviation in the 950 keV region, providing an efficient solution for self-shielding and enhancing radiation detection capabilities.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
Revisiting Learning-based Video Motion Magnification for Real-time Processing
Authors:
Hyunwoo Ha,
Oh Hyun-Bin,
Kim Jun-Seong,
Kwon Byung-Ki,
Kim Sung-Bin,
Linh-Tam Tran,
Ji-Yun Kim,
Sung-Ho Bae,
Tae-Hyun Oh
Abstract:
Video motion magnification is a technique to capture and amplify subtle motion in a video that is invisible to the naked eye. The deep learning-based prior work successfully demonstrates the modelling of the motion magnification problem with outstanding quality compared to conventional signal processing-based ones. However, it still lags behind real-time performance, which prevents it from being e…
▽ More
Video motion magnification is a technique to capture and amplify subtle motion in a video that is invisible to the naked eye. The deep learning-based prior work successfully demonstrates the modelling of the motion magnification problem with outstanding quality compared to conventional signal processing-based ones. However, it still lags behind real-time performance, which prevents it from being extended to various online applications. In this paper, we investigate an efficient deep learning-based motion magnification model that runs in real time for full-HD resolution videos. Due to the specified network design of the prior art, i.e. inhomogeneous architecture, the direct application of existing neural architecture search methods is complicated. Instead of automatic search, we carefully investigate the architecture module by module for its role and importance in the motion magnification task. Two key findings are 1) Reducing the spatial resolution of the latent motion representation in the decoder provides a good trade-off between computational efficiency and task quality, and 2) surprisingly, only a single linear layer and a single branch in the encoder are sufficient for the motion magnification task. Based on these findings, we introduce a real-time deep learning-based motion magnification model with4.2X fewer FLOPs and is 2.7X faster than the prior art while maintaining comparable quality.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
Improving Open-Ended Text Generation via Adaptive Decoding
Authors:
Wenhong Zhu,
Hongkun Hao,
Zhiwei He,
Yiming Ai,
Rui Wang
Abstract:
Current language models decode text token by token according to probabilistic distribution, and determining the appropriate candidates for the next token is crucial to ensure generation quality. This study introduces adaptive decoding, a mechanism that dynamically empowers language models to ascertain a sensible candidate set during generation. Specifically, we introduce an entropy-based metric ca…
▽ More
Current language models decode text token by token according to probabilistic distribution, and determining the appropriate candidates for the next token is crucial to ensure generation quality. This study introduces adaptive decoding, a mechanism that dynamically empowers language models to ascertain a sensible candidate set during generation. Specifically, we introduce an entropy-based metric called confidence and conceptualize determining the optimal candidate set as a confidence-increasing process. The rationality of including a token in the candidate set is assessed by leveraging the increment of confidence. Experimental results reveal that our method balances diversity and coherence well. The human evaluation shows that our method can generate human-preferred text. Additionally, our method can potentially improve the reasoning ability of language models.
△ Less
Submitted 2 June, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Dynamics-Guided Diffusion Model for Robot Manipulator Design
Authors:
Xiaomeng Xu,
Huy Ha,
Shuran Song
Abstract:
We present Dynamics-Guided Diffusion Model, a data-driven framework for generating manipulator geometry designs for a given manipulation task. Instead of training different design models for each task, our approach employs a learned dynamics network shared across tasks. For a new manipulation task, we first decompose it into a collection of individual motion targets which we call target interactio…
▽ More
We present Dynamics-Guided Diffusion Model, a data-driven framework for generating manipulator geometry designs for a given manipulation task. Instead of training different design models for each task, our approach employs a learned dynamics network shared across tasks. For a new manipulation task, we first decompose it into a collection of individual motion targets which we call target interaction profile, where each individual motion can be modeled by the shared dynamics network. The design objective constructed from the target and predicted interaction profiles provides a gradient to guide the refinement of finger geometry for the task. This refinement process is executed as a classifier-guided diffusion process, where the design objective acts as the classifier guidance. We evaluate our framework on various manipulation tasks, under the sensor-less setting using only an open-loop parallel jaw motion. Our generated designs outperform optimization-based and unguided diffusion baselines relatively by 31.5% and 45.3% on average manipulation success rate. With the ability to generate a design within 0.8 seconds, our framework could facilitate rapid design iteration and enhance the adoption of data-driven approaches for robotic mechanism design.
△ Less
Submitted 22 February, 2024;
originally announced February 2024.
-
Is Cognition and Action Consistent or Not: Investigating Large Language Model's Personality
Authors:
Yiming Ai,
Zhiwei He,
Ziyin Zhang,
Wenhong Zhu,
Hongkun Hao,
Kai Yu,
Lingjun Chen,
Rui Wang
Abstract:
In this study, we investigate the reliability of Large Language Models (LLMs) in professing human-like personality traits through responses to personality questionnaires. Our goal is to evaluate the consistency between LLMs' professed personality inclinations and their actual "behavior", examining the extent to which these models can emulate human-like personality patterns. Through a comprehensive…
▽ More
In this study, we investigate the reliability of Large Language Models (LLMs) in professing human-like personality traits through responses to personality questionnaires. Our goal is to evaluate the consistency between LLMs' professed personality inclinations and their actual "behavior", examining the extent to which these models can emulate human-like personality patterns. Through a comprehensive analysis of LLM outputs against established human benchmarks, we seek to understand the cognition-action divergence in LLMs and propose hypotheses for the observed results based on psychological theories and metrics.
△ Less
Submitted 22 February, 2024;
originally announced February 2024.
-
Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models
Authors:
Zhiwei He,
Binglin Zhou,
Hongkun Hao,
Aiwei Liu,
Xing Wang,
Zhaopeng Tu,
Zhuosheng Zhang,
Rui Wang
Abstract:
Text watermarking technology aims to tag and identify content produced by large language models (LLMs) to prevent misuse. In this study, we introduce the concept of cross-lingual consistency in text watermarking, which assesses the ability of text watermarks to maintain their effectiveness after being translated into other languages. Preliminary empirical results from two LLMs and three watermarki…
▽ More
Text watermarking technology aims to tag and identify content produced by large language models (LLMs) to prevent misuse. In this study, we introduce the concept of cross-lingual consistency in text watermarking, which assesses the ability of text watermarks to maintain their effectiveness after being translated into other languages. Preliminary empirical results from two LLMs and three watermarking methods reveal that current text watermarking technologies lack consistency when texts are translated into various languages. Based on this observation, we propose a Cross-lingual Watermark Removal Attack (CWRA) to bypass watermarking by first obtaining a response from an LLM in a pivot language, which is then translated into the target language. CWRA can effectively remove watermarks, decreasing the AUCs to a random-guessing level without performance loss. Furthermore, we analyze two key factors that contribute to the cross-lingual consistency in text watermarking and propose X-SIR as a defense method against CWRA. Code: https://github.com/zwhe99/X-SIR.
△ Less
Submitted 4 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Rational powers, invariant ideals, and the summation formula
Authors:
Sankhaneel Bisui,
Sudipta Das,
Tài Huy Hà,
Jonathan Montaño
Abstract:
We study the Rees valuations and rational powers of several classes of invariant ideals, providing explicit descriptions in terms of certain polyhedra. This allows us to show that a version of Mustaţă-Takagi's summation formula for rational powers holds for these ideals. Moreover, for arbitrary ideals over the complex numbers, we prove a weaker version of this formula that holds for high enough ra…
▽ More
We study the Rees valuations and rational powers of several classes of invariant ideals, providing explicit descriptions in terms of certain polyhedra. This allows us to show that a version of Mustaţă-Takagi's summation formula for rational powers holds for these ideals. Moreover, for arbitrary ideals over the complex numbers, we prove a weaker version of this formula that holds for high enough rational numbers. Our methods lead to a definition of rational powers of ideal sheaves in normal complex varieties inspired by work of Lejeune-Jalabert-Teissier and Lazarsfeld.
△ Less
Submitted 19 February, 2024;
originally announced February 2024.
-
High-dimensional Bayesian Optimization via Covariance Matrix Adaptation Strategy
Authors:
Lam Ngo,
Huong Ha,
Jeffrey Chan,
Vu Nguyen,
Hongyu Zhang
Abstract:
Bayesian Optimization (BO) is an effective method for finding the global optimum of expensive black-box functions. However, it is well known that applying BO to high-dimensional optimization problems is challenging. To address this issue, a promising solution is to use a local search strategy that partitions the search domain into local regions with high likelihood of containing the global optimum…
▽ More
Bayesian Optimization (BO) is an effective method for finding the global optimum of expensive black-box functions. However, it is well known that applying BO to high-dimensional optimization problems is challenging. To address this issue, a promising solution is to use a local search strategy that partitions the search domain into local regions with high likelihood of containing the global optimum, and then use BO to optimize the objective function within these regions. In this paper, we propose a novel technique for defining the local regions using the Covariance Matrix Adaptation (CMA) strategy. Specifically, we use CMA to learn a search distribution that can estimate the probabilities of data points being the global optimum of the objective function. Based on this search distribution, we then define the local regions consisting of data points with high probabilities of being the global optimum. Our approach serves as a meta-algorithm as it can incorporate existing black-box BO optimizers, such as BO, TuRBO, and BAxUS, to find the global optimum of the objective function within our derived local regions. We evaluate our proposed method on various benchmark synthetic and real-world problems. The results demonstrate that our method outperforms existing state-of-the-art techniques.
△ Less
Submitted 5 February, 2024;
originally announced February 2024.
-
Background study of the AMoRE-pilot experiment
Authors:
A. Agrawal,
V. V. Alenkov,
P. Aryal,
J. Beyer,
B. Bhandari,
R. S. Boiko,
K. Boonin,
O. Buzanov,
C. R. Byeon,
N. Chanthima,
M. K. Cheoun,
J. S. Choe,
Seonho Choi,
S. Choudhury,
J. S. Chung,
F. A. Danevich,
M. Djamal,
D. Drung,
C. Enss,
A. Fleischmann,
A. M. Gangapshev,
L. Gastaldo,
Yu. M. Gavrilyuk,
A. M. Gezhaev,
O. Gileva
, et al. (83 additional authors not shown)
Abstract:
We report a study on the background of the Advanced Molybdenum-Based Rare process Experiment (AMoRE), a search for neutrinoless double beta decay (\znbb) of $^{100}$Mo. The pilot stage of the experiment was conducted using $\sim$1.9 kg of \CAMOO~ crystals at the Yangyang Underground Laboratory, South Korea, from 2015 to 2018. We compared the measured $β/γ$ energy spectra in three experimental conf…
▽ More
We report a study on the background of the Advanced Molybdenum-Based Rare process Experiment (AMoRE), a search for neutrinoless double beta decay (\znbb) of $^{100}$Mo. The pilot stage of the experiment was conducted using $\sim$1.9 kg of \CAMOO~ crystals at the Yangyang Underground Laboratory, South Korea, from 2015 to 2018. We compared the measured $β/γ$ energy spectra in three experimental configurations with the results of Monte Carlo simulations and identified the background sources in each configuration. We replaced several detector components and enhanced the neutron shielding to lower the background level between configurations. A limit on the half-life of $0νββ$ decay of $^{100}$Mo was found at $T_{1/2}^{0ν} \ge 3.0\times 10^{23}$ years at 90\% confidence level, based on the measured background and its modeling. Further reduction of the background rate in the AMoRE-I and AMoRE-II are discussed.
△ Less
Submitted 7 April, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
ORGANA: A Robotic Assistant for Automated Chemistry Experimentation and Characterization
Authors:
Kourosh Darvish,
Marta Skreta,
Yuchi Zhao,
Naruki Yoshikawa,
Sagnik Som,
Miroslav Bogdanovic,
Yang Cao,
Han Hao,
Hao** Xu,
Alán Aspuru-Guzik,
Animesh Garg,
Florian Shkurti
Abstract:
Chemistry experimentation is often resource- and labor-intensive. Despite the many benefits incurred by the integration of advanced and special-purpose lab equipment, many aspects of experimentation are still manually conducted by chemists, for example, polishing an electrode in electrochemistry experiments. Traditional lab automation infrastructure faces challenges when it comes to flexibly adapt…
▽ More
Chemistry experimentation is often resource- and labor-intensive. Despite the many benefits incurred by the integration of advanced and special-purpose lab equipment, many aspects of experimentation are still manually conducted by chemists, for example, polishing an electrode in electrochemistry experiments. Traditional lab automation infrastructure faces challenges when it comes to flexibly adapting to new chemistry experiments. To address this issue, we propose a human-friendly and flexible robotic system, ORGANA, that automates a diverse set of chemistry experiments. It is capable of interacting with chemists in the lab through natural language, using Large Language Models (LLMs). ORGANA keeps scientists informed by providing timely reports that incorporate statistical analyses. Additionally, it actively engages with users when necessary for disambiguation or troubleshooting. ORGANA can reason over user input to derive experiment goals, and plan long sequences of both high-level tasks and low-level robot actions while using feedback from the visual perception of the environment. It also supports scheduling and parallel execution for experiments that require resource allocation and coordination between multiple robots and experiment stations. We show that ORGANA successfully conducts a diverse set of chemistry experiments, including solubility assessment, pH measurement, recrystallization, and electrochemistry experiments. For the latter, we show that ORGANA robustly executes a long-horizon plan, comprising 19 steps executed in parallel, to characterize the electrochemical properties of quinone derivatives, a class of molecules used in rechargeable flow batteries. Our user study indicates that ORGANA significantly improves many aspects of user experience while reducing their physical workload. More details about ORGANA can be found at https://ac-rad.github.io/organa/.
△ Less
Submitted 12 January, 2024;
originally announced January 2024.
-
Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning
Authors:
Kaiyi Zhang,
Ang Lv,
Yuhan Chen,
Hansen Ha,
Tao Xu,
Rui Yan
Abstract:
In this paper, by treating in-context learning (ICL) as a meta-optimization process, we explain why LLMs are sensitive to the order of ICL examples. This understanding leads us to the development of Batch-ICL, an effective, efficient, and order-agnostic inference algorithm for ICL. Differing from the standard N-shot learning approach, Batch-ICL employs $N$ separate 1-shot forward computations and…
▽ More
In this paper, by treating in-context learning (ICL) as a meta-optimization process, we explain why LLMs are sensitive to the order of ICL examples. This understanding leads us to the development of Batch-ICL, an effective, efficient, and order-agnostic inference algorithm for ICL. Differing from the standard N-shot learning approach, Batch-ICL employs $N$ separate 1-shot forward computations and aggregates the resulting meta-gradients. These aggregated meta-gradients are then applied to the forward computation of a zero-shot query to generate the final prediction. This batch processing approach renders the LLM agnostic to the order of ICL examples. Through extensive experiments and analysis, we demonstrate that Batch-ICL consistently outperforms most permutations of ICL examples. In some cases, it even exceeds the performance of the best order for standard ICL, all while reducing the computational resources required. Furthermore, we develop a novel variant of Batch-ICL featuring multiple "epochs" of meta-optimization. This variant implicitly explores permutations of ICL examples, further enhancing ICL performance.
△ Less
Submitted 5 June, 2024; v1 submitted 12 January, 2024;
originally announced January 2024.
-
A Deep Learning Representation of Spatial Interaction Model for Resilient Spatial Planning of Community Business Clusters
Authors:
Haiyan Hao,
Yan Wang
Abstract:
Existing Spatial Interaction Models (SIMs) are limited in capturing the complex and context-aware interactions between business clusters and trade areas. To address the limitation, we propose a SIM-GAT model to predict spatiotemporal visitation flows between community business clusters and their trade areas. The model innovatively represents the integrated system of business clusters, trade areas,…
▽ More
Existing Spatial Interaction Models (SIMs) are limited in capturing the complex and context-aware interactions between business clusters and trade areas. To address the limitation, we propose a SIM-GAT model to predict spatiotemporal visitation flows between community business clusters and their trade areas. The model innovatively represents the integrated system of business clusters, trade areas, and transportation infrastructure within an urban region using a connected graph. Then, a graph-based deep learning model, i.e., Graph AttenTion network (GAT), is used to capture the complexity and interdependencies of business clusters. We developed this model with data collected from the Miami metropolitan area in Florida. We then demonstrated its effectiveness in capturing varying attractiveness of business clusters to different residential neighborhoods and across scenarios with an eXplainable AI approach. We contribute a novel method supplementing conventional SIMs to predict and analyze the dynamics of inter-connected community business clusters. The analysis results can inform data-evidenced and place-specific planning strategies hel** community business clusters better accommodate their customers across scenarios, and hence improve the resilience of community businesses.
△ Less
Submitted 9 January, 2024;
originally announced January 2024.
-
Q-Refine: A Perceptual Quality Refiner for AI-Generated Image
Authors:
Chunyi Li,
Haoning Wu,
Zicheng Zhang,
Hongkun Hao,
Kaiwei Zhang,
Lei Bai,
Xiaohong Liu,
Xiongkuo Min,
Weisi Lin,
Guangtao Zhai
Abstract:
With the rapid evolution of the Text-to-Image (T2I) model in recent years, their unsatisfactory generation result has become a challenge. However, uniformly refining AI-Generated Images (AIGIs) of different qualities not only limited optimization capabilities for low-quality AIGIs but also brought negative optimization to high-quality AIGIs. To address this issue, a quality-award refiner named Q-R…
▽ More
With the rapid evolution of the Text-to-Image (T2I) model in recent years, their unsatisfactory generation result has become a challenge. However, uniformly refining AI-Generated Images (AIGIs) of different qualities not only limited optimization capabilities for low-quality AIGIs but also brought negative optimization to high-quality AIGIs. To address this issue, a quality-award refiner named Q-Refine is proposed. Based on the preference of the Human Visual System (HVS), Q-Refine uses the Image Quality Assessment (IQA) metric to guide the refining process for the first time, and modify images of different qualities through three adaptive pipelines. Experimental shows that for mainstream T2I models, Q-Refine can perform effective optimization to AIGIs of different qualities. It can be a general refiner to optimize AIGIs from both fidelity and aesthetic quality levels, thus expanding the application of the T2I generation models.
△ Less
Submitted 2 January, 2024;
originally announced January 2024.
-
Boosting Large Language Model for Speech Synthesis: An Empirical Study
Authors:
Hongkun Hao,
Long Zhou,
Shujie Liu,
**yu Li,
Shujie Hu,
Rui Wang,
Furu Wei
Abstract:
Large language models (LLMs) have made significant advancements in natural language processing and are concurrently extending the language ability to other modalities, such as speech and vision. Nevertheless, most of the previous work focuses on prompting LLMs with perception abilities like auditory comprehension, and the effective approach for augmenting LLMs with speech synthesis capabilities re…
▽ More
Large language models (LLMs) have made significant advancements in natural language processing and are concurrently extending the language ability to other modalities, such as speech and vision. Nevertheless, most of the previous work focuses on prompting LLMs with perception abilities like auditory comprehension, and the effective approach for augmenting LLMs with speech synthesis capabilities remains ambiguous. In this paper, we conduct a comprehensive empirical exploration of boosting LLMs with the ability to generate speech, by combining pre-trained LLM LLaMA/OPT and text-to-speech synthesis model VALL-E. We compare three integration methods between LLMs and speech synthesis models, including directly fine-tuned LLMs, superposed layers of LLMs and VALL-E, and coupled LLMs and VALL-E using LLMs as a powerful text encoder. Experimental results show that, using LoRA method to fine-tune LLMs directly to boost the speech synthesis capability does not work well, and superposed LLMs and VALL-E can improve the quality of generated speech both in speaker similarity and word error rate (WER). Among these three methods, coupled methods leveraging LLMs as the text encoder can achieve the best performance, making it outperform original speech synthesis models with a consistently better speaker similarity and a significant (10.9%) WER reduction.
△ Less
Submitted 30 December, 2023;
originally announced January 2024.
-
Learning-based Axial Video Motion Magnification
Authors:
Kwon Byung-Ki,
Oh Hyun-Bin,
Kim Jun-Seong,
Hyunwoo Ha,
Tae-Hyun Oh
Abstract:
Video motion magnification amplifies invisible small motions to be perceptible, which provides humans with a spatially dense and holistic understanding of small motions in the scene of interest. This is based on the premise that magnifying small motions enhances the legibility of motions. In the real world, however, vibrating objects often possess convoluted systems that have complex natural frequ…
▽ More
Video motion magnification amplifies invisible small motions to be perceptible, which provides humans with a spatially dense and holistic understanding of small motions in the scene of interest. This is based on the premise that magnifying small motions enhances the legibility of motions. In the real world, however, vibrating objects often possess convoluted systems that have complex natural frequencies, modes, and directions. Existing motion magnification often fails to improve legibility since the intricate motions still retain complex characteristics even after being magnified, which may distract us from analyzing them. In this work, we focus on improving legibility by proposing a new concept, axial motion magnification, which magnifies decomposed motions along the user-specified direction. Axial motion magnification can be applied to various applications where motions of specific axes are critical, by providing simplified and easily readable motion information. To achieve this, we propose a novel Motion Separation Module that enables to disentangle and magnify the motion representation along axes of interest. Furthermore, we build a new synthetic training dataset for the axial motion magnification task. Our proposed method improves the legibility of resulting motions along certain axes by adding a new feature: user controllability. Axial motion magnification is a more generalized concept; thus, our method can be directly adapted to the generic motion magnification and achieves favorable performance against competing methods.
△ Less
Submitted 26 March, 2024; v1 submitted 15 December, 2023;
originally announced December 2023.
-
MatterGen: a generative model for inorganic materials design
Authors:
Claudio Zeni,
Robert Pinsler,
Daniel Zügner,
Andrew Fowler,
Matthew Horton,
Xiang Fu,
Sasha Shysheya,
Jonathan Crabbé,
Lixin Sun,
Jake Smith,
Bichlien Nguyen,
Hannes Schulz,
Sarah Lewis,
Chin-Wei Huang,
Ziheng Lu,
Yichi Zhou,
Han Yang,
Hongxia Hao,
Jielan Li,
Ryota Tomioka,
Tian Xie
Abstract:
The design of functional materials with desired properties is essential in driving technological advances in areas like energy storage, catalysis, and carbon capture. Generative models provide a new paradigm for materials design by directly generating entirely novel materials given desired property constraints. Despite recent progress, current generative models have low success rate in proposing s…
▽ More
The design of functional materials with desired properties is essential in driving technological advances in areas like energy storage, catalysis, and carbon capture. Generative models provide a new paradigm for materials design by directly generating entirely novel materials given desired property constraints. Despite recent progress, current generative models have low success rate in proposing stable crystals, or can only satisfy a very limited set of property constraints. Here, we present MatterGen, a model that generates stable, diverse inorganic materials across the periodic table and can further be fine-tuned to steer the generation towards a broad range of property constraints. To enable this, we introduce a new diffusion-based generative process that produces crystalline structures by gradually refining atom types, coordinates, and the periodic lattice. We further introduce adapter modules to enable fine-tuning towards any given property constraints with a labeled dataset. Compared to prior generative models, structures produced by MatterGen are more than twice as likely to be novel and stable, and more than 15 times closer to the local energy minimum. After fine-tuning, MatterGen successfully generates stable, novel materials with desired chemistry, symmetry, as well as mechanical, electronic and magnetic properties. Finally, we demonstrate multi-property materials design capabilities by proposing structures that have both high magnetic density and a chemical composition with low supply-chain risk. We believe that the quality of generated materials and the breadth of MatterGen's capabilities represent a major advancement towards creating a universal generative model for materials design.
△ Less
Submitted 29 January, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.