-
TCC in the interior of moduli space and its implications for the string landscape and cosmology
Authors:
Alek Bedroya,
Qianshu Lu,
Paul Steinhardt
Abstract:
We consider the classical Friedmann-Robertson-Walker solutions that describe a universe undergoing a transition from an accelerating expansion phase in the past to an eternal decelerating expansion phase in the future, driven by a scalar field evolving in a potential energy landscape. We show that any solution for which the accelerating phase violates the Trans-Planckian Censorship Conjecture (TCC), even in the interior of moduli space, never approaches the asymptotic vacuum with zero particles. Based on the assumption that the effective field theory must be valid for the vacuum on the asymptotic boundary, as motivated by holography and string theory, we argue that (multi-field) scalar potentials with such solutions are disallowed, thus strengthening the case for TCC. In particular, the results imply a new set of complex and highly-nonlinear constraints across the entire string landscape which may make realizing inflation impossible.
Submitted 11 July, 2024;
originally announced July 2024.
-
Timing Recovery for Non-Orthogonal Multiple Access with Asynchronous Clock
Authors:
Qingxin Lu,
Haide Wang,
Wenxuan Mo,
Ji Zhou,
Wei** Liu,
Changyuan Yu
Abstract:
A passive optical network (PON) based on non-orthogonal multiple access (NOMA) can meet the demands for low latency and high capacity. In the NOMA-PON, the asynchronous clock between the strong and weak optical network units (ONUs) causes timing error and phase noise on the signal of the weak ONU. The theoretical derivation shows that the timing error and phase noise can be compensated independently. In this Letter, we propose a timing recovery (TR) algorithm based on an absolute timing error detector (Abs TED) and a pilot-based carrier phase recovery (CPR) to eliminate the timing error and phase noise separately. An experiment for 25G NOMA-PON is set up to verify the feasibility of the proposed algorithms. The weak ONU can achieve the 20% soft-decision forward error correction limit after compensating for timing error and phase noise. In conclusion, the proposed TR and the pilot-based CPR show great potential for the NOMA-PON.
Submitted 10 July, 2024;
originally announced July 2024.
-
Superconductivity up to 14.2 K in MnB$_4$ under pressure
Authors:
Zhe-Ning Xiang,
Ying-Jie Zhang,
Qing Lu,
Qing Li,
Yiwen Li,
Tianheng Huang,
Yijie Zhu,
Yongze Ye,
Jian Sun,
Hai-Hu Wen
Abstract:
The discovery of superconductivity in 3$d$-transition metal compounds with strong magnetism is interesting but rare. For Mn-based compounds in particular, only a very limited number of materials show superconductivity. Here, we report the discovery of superconductivity up to 14.2 K in a Mn-based material, MnB$_4$. By applying high pressure, we found the continuous suppression of a weak insulating behavior and the occurrence of superconductivity above about 30 GPa. With further increasing pressure, $T_\text{c}$ is gradually enhanced and reaches a maximum value of about 14.2 K at 150 GPa, with Fermi-liquid behavior in the normal state. The synchrotron X-ray diffraction data reveal an unchanged monoclinic (S.G: $P2_1/c$) symmetry but an unusual crossover of the lattice parameters $b$ and $c$. Theoretical calculations based on the electron-phonon coupling picture yield a very low $T_\text{c}$ (less than 1 K), suggesting an exotic pairing mechanism beyond the Bardeen-Cooper-Schrieffer (BCS) theory. Our findings show a promising way to explore high-$T_\text{c}$ superconductivity by combining 3$d$-transition metal magnetic elements and light elements.
Submitted 8 July, 2024;
originally announced July 2024.
-
Analysis of short range interactions between $u/d$ quarks in the $NN$, $D_{03}$, and $D_{30}$ systems
Authors:
Qi-Fang Lü,
Yu-Bing Dong,
Peng-Nian Shen,
Zong-Ye Zhang
Abstract:
The dynamic mechanism of short range interaction between $u/d$ quarks is still an open and challenging problem. In order to reveal this quark dynamics, we perform a systematic analysis of $NN$, $D_{03}$, and $D_{30}$ systems in the (extended) chiral SU(3) constituent quark models. By comparing results calculated with different models and different parameter sets, the effects of one gluon exchange and vector meson exchange terms are carefully examined. The results indicate that the vector meson exchange interactions dominate the short range interactions between $u/d$ quarks, while the small residual one gluon exchange coupling strength is also allowed.
Submitted 2 July, 2024;
originally announced July 2024.
-
Fairpriori: Improving Biased Subgroup Discovery for Deep Neural Network Fairness
Authors:
Kacy Zhou,
Jiawen Wen,
Nan Yang,
Dong Yuan,
Qinghua Lu,
Huaming Chen
Abstract:
While deep learning has become a core functional module of most software systems, concerns regarding the fairness of ML predictions have emerged as a significant issue that affects prediction results due to discrimination. Intersectional bias, which disproportionately affects members of subgroups, is a prime example of this. For instance, a machine learning model might exhibit bias against darker-skinned women, while not showing bias against individuals with darker skin or women. This problem calls for effective fairness testing before the deployment of such deep learning models in real-world scenarios. However, research into detecting such bias is currently limited compared to research on individual and group fairness. Existing tools to investigate intersectional bias lack important features such as support for multiple fairness metrics, fast and efficient computation, and user-friendly interpretation. This paper introduces Fairpriori, a novel biased subgroup discovery method, which aims to address these limitations. Fairpriori incorporates the frequent itemset generation algorithm to facilitate effective and efficient investigation of intersectional bias by producing fast fairness metric calculations on subgroups of a dataset. Through comparison with the state-of-the-art methods (e.g., Themis, FairFictPlay, and TestSGD) under similar conditions, Fairpriori demonstrates superior effectiveness and efficiency when identifying intersectional bias. Specifically, Fairpriori is easier to use and interpret, supports a wider range of use cases by accommodating multiple fairness metrics, and exhibits higher efficiency in computing fairness metrics. These findings showcase Fairpriori's potential for effectively uncovering subgroups affected by intersectional bias, supported by its open-source tooling at https://anonymous.4open.science/r/Fairpriori-0320.
Submitted 24 June, 2024;
originally announced July 2024.
-
Large Language Models Struggle in Token-Level Clinical Named Entity Recognition
Authors:
Qiuhao Lu,
Rui Li,
Andrew Wen,
**lian Wang,
Liwei Wang,
Hongfang Liu
Abstract:
Large Language Models (LLMs) have revolutionized various sectors, including healthcare where they are employed in diverse applications. Their utility is particularly significant in the context of rare diseases, where data scarcity, complexity, and specificity pose considerable challenges. In the clinical domain, Named Entity Recognition (NER) stands out as an essential task and it plays a crucial role in extracting relevant information from clinical texts. Despite the promise of LLMs, current research mostly concentrates on document-level NER, identifying entities in a more general context across entire documents, without extracting their precise location. Additionally, efforts have been directed towards adapting ChatGPT for token-level NER. However, there is a significant research gap when it comes to employing token-level NER for clinical texts, especially with the use of local open-source LLMs. This study aims to bridge this gap by investigating the effectiveness of both proprietary and local LLMs in token-level clinical NER. Essentially, we delve into the capabilities of these models through a series of experiments involving zero-shot prompting, few-shot prompting, retrieval-augmented generation (RAG), and instruction-fine-tuning. Our exploration reveals the inherent challenges LLMs face in token-level NER, particularly in the context of rare diseases, and suggests possible improvements for their application in healthcare. This research contributes to narrowing a significant gap in healthcare informatics and offers insights that could lead to a more refined application of LLMs in the healthcare sector.
Submitted 30 June, 2024;
originally announced July 2024.
-
Adaptive Safe Reinforcement Learning-Enabled Optimization of Battery Fast-Charging Protocols
Authors:
Myisha A. Chowdhury,
Saif S. S. Al-Wahaibi,
Qiugang Lu
Abstract:
Optimizing charging protocols is critical for reducing battery charging time and decelerating battery degradation in applications such as electric vehicles. Recently, reinforcement learning (RL) methods have been adopted for such purposes. However, RL-based methods may not ensure that system (safety) constraints are satisfied, which can cause irreversible damage to batteries and reduce their lifetime. To this end, this work proposes an adaptive and safe RL framework to optimize fast charging strategies while respecting safety constraints with a high probability. In our method, any unsafe action that the RL agent selects is projected into a safety region by solving a constrained optimization problem. The safety region is constructed using adaptive Gaussian process (GP) models, consisting of static and dynamic GPs, that learn from online experience to adaptively account for any changes in battery dynamics. Simulation results show that our method can charge the batteries rapidly with constraint satisfaction under varying operating conditions.
Submitted 18 June, 2024;
originally announced June 2024.
-
PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models
Authors:
Fanqing Meng,
Wenqi Shao,
Lixin Luo,
Yahong Wang,
Yiran Chen,
Quanfeng Lu,
Yue Yang,
Tianshuo Yang,
Kaipeng Zhang,
Yu Qiao,
** Luo
Abstract:
Text-to-image (T2I) models have made substantial progress in generating images from textual prompts. However, they frequently fail to produce images consistent with physical commonsense, a vital capability for applications in world simulation and everyday tasks. Current T2I evaluation benchmarks focus on metrics such as accuracy, bias, and safety, neglecting the evaluation of models' internal knowledge, particularly physical commonsense. To address this issue, we introduce PhyBench, a comprehensive T2I evaluation dataset comprising 700 prompts across 4 primary categories: mechanics, optics, thermodynamics, and material properties, encompassing 31 distinct physical scenarios. We assess 6 prominent T2I models, including proprietary models DALLE3 and Gemini, and demonstrate that incorporating physical principles into prompts enhances the models' ability to generate physically accurate images. Our findings reveal that: (1) even advanced models frequently err in various physical scenarios, except for optics; (2) GPT-4o, with item-specific scoring instructions, effectively evaluates the models' understanding of physical commonsense, closely aligning with human assessments; and (3) current T2I models are primarily focused on text-to-image translation, lacking profound reasoning regarding physical commonsense. We advocate for increased attention to the inherent knowledge within T2I models, beyond their utility as mere image generation tools. The code and data are available at https://github.com/OpenGVLab/PhyBench.
Submitted 17 June, 2024;
originally announced June 2024.
-
Triaxial shape of the one-proton emitter $^{149}$Lu
Authors:
Qi Lu,
Kai-Yuan Zhang,
Shi-Sheng Zhang
Abstract:
We revisit the proton emitter $^{149}$Lu utilizing the recently developed triaxial relativistic Hartree-Bogoliubov theory in continuum (TRHBc). By incorporating the microscopic nuclear structure properties from the TRHBc theory into the WKB approximation, we successfully reproduce the measured proton-emission half-life of $^{149}$Lu within experimental uncertainties. A triaxial ground state characterized by ($β=0.17,γ=31^\circ$) has been clarified for $^{149}$Lu. The inclusion of triaxiality significantly changes nuclear density distributions and potentials, which results in enhanced binding of both the nuclear system and the proton-emitting orbital. As a result, a slightly extended half-life for the proton emission of $^{149}$Lu is achieved after considering triaxial deformation degrees of freedom.
Submitted 14 June, 2024;
originally announced June 2024.
-
Exploration on $1n$ halo nucleus $^{19}$C from D-RHFB structure to reaction observables
Authors:
Jia-Lin An,
Qi Lu,
Wen Hui Long,
Shi-Sheng Zhang
Abstract:
We utilize the axially deformed relativistic Hartree-Fock-Bogoliubov (D-RHFB) model to describe the structure of neutron-rich carbon isotopes, taking into account the continuum, pairing correlations, tensor force and their interplay. In this scheme, one- and two-neutron separation energies of neutron-rich carbon isotopes agree well with measured data, as does the spin and parity $J^π=1/2^+$ for the ground state of $^{19}$C, which is a long-standing problem for theoretical structure models. With the structure input extracted from the microscopic D-RHFB model, the reaction observables are well described by the Glauber model. In particular, this unified approach accurately reproduces the inclusive longitudinal momentum distributions of the breakup reaction $^{19}$C + $^{12}$C at 240 MeV/nucleon, which rule out the possibility of the ground state of $^{19}$C being $J^π=3/2^+$. Moreover, the continuum plays a crucial role in the formation of the halo, which is further confirmed by the reaction cross sections and longitudinal momentum distributions. However, the tensor force components carried by the $π$-coupling are not as significant as anticipated. Consequently, the D-RHFB + Glauber approach turns out to be a promising tool to search for halo candidates from the structure to the reaction.
Submitted 14 June, 2024;
originally announced June 2024.
-
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Authors:
Quanfeng Lu,
Wenqi Shao,
Zitao Liu,
Fanqing Meng,
Boxuan Li,
Botong Chen,
Siyuan Huang,
Kaipeng Zhang,
Yu Qiao,
** Luo
Abstract:
Smartphone users often navigate across multiple applications (apps) to complete tasks such as sharing content between social media platforms. Autonomous Graphical User Interface (GUI) navigation agents can enhance user experience in communication, entertainment, and productivity by streamlining workflows and reducing manual intervention. However, prior GUI agents were often trained with datasets comprising simple tasks that can be completed within a single app, leading to poor performance in cross-app navigation. To address this problem, we introduce GUI Odyssey, a comprehensive dataset for training and evaluating cross-app navigation agents. GUI Odyssey consists of 7,735 episodes from 6 mobile devices, spanning 6 types of cross-app tasks, 201 apps, and 1.4K app combos. Leveraging GUI Odyssey, we developed OdysseyAgent, a multimodal cross-app navigation agent, by fine-tuning the Qwen-VL model with a history resampling module. Extensive experiments demonstrate OdysseyAgent's superior accuracy compared to existing models. For instance, OdysseyAgent surpasses fine-tuned Qwen-VL and zero-shot GPT-4V by 1.44% and 55.49% in in-domain accuracy, and by 2.29% and 48.14% in out-of-domain accuracy, on average. The dataset and code will be released at https://github.com/OpenGVLab/GUI-Odyssey.
Submitted 12 June, 2024;
originally announced June 2024.
-
Modeling fibrous tissue in vascular fluid-structure interaction: a morphology-based pipeline and biomechanical significance
Authors:
Yujie Sun,
Jiayi Huang,
Qingshuang Lu,
Xinhai Yue,
Xuanming Huang,
Wei He,
Yun Shi,
Ju Liu
Abstract:
We propose a suite of technologies for analyzing the interaction between anisotropic arterial walls and blood flow for subject-specific geometries. Utilizing an established lumen modeling strategy, we present a comprehensive pipeline for generating the thick-walled artery models. Through a specialized mesh generation procedure, we obtain the meshes for the arterial lumen and wall with mesh continuity across the interface ensured. Exploiting the centerline information, a series of procedures is introduced for generating local basis vectors within the arterial wall. The procedures are tailored to handle thick-walled and, in particular, aneurysmatic tissues in which the basis vectors may exhibit transmural variations. Additionally, we propose methods to accurately identify the centerline in multi-branched vessels and bifurcating regions. The developed fiber generation method is evaluated against the strategy using linear elastic analysis, demonstrating that the proposed approach yields satisfactory fiber definitions in the considered benchmark. Finally, we examine the impact of anisotropic arterial wall models on the vascular fluid-structure interaction analysis through numerical examples. For comparison purposes, the neo-Hookean model is considered. The first case involves an idealized curved geometry, while the second case studies an image-based abdominal aorta model. The numerical results reveal that the deformation and stress distribution are critically related to the constitutive model of the wall, while the hemodynamic factors are less sensitive to the wall model. This work paves the way for more accurate image-based vascular modeling and enhances the prediction of arterial behavior under physiologically realistic conditions.
Submitted 20 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
Global Surface Warming Caused by Shorter-term Radiative Forcings of Aerosols and Ozone in the Last Two Decades
Authors:
Qing-Bin Lu
Abstract:
Conventional climate models have predicted continuous warming on the Earth's surface and cooling in the upper stratosphere. Here we report observations of regional and global upper stratosphere temperature (UST) and surface temperature and of various climate drivers including greenhouse gases (GHGs), ozone, aerosols, solar variability, snow cover extent, and sea ice extent (SIE), combined with calculations of global mean surface temperature (GMST) by a conceptual physics model. Strikingly, we found warming trends of 0.8(+/-0.6) and 0.7(+/-0.2) K/decade in UST at altitudes of 35-40 km in the Arctic and Antarctic respectively and no significant trends over non-polar regions since 2002. According to the well-recognized climate models, these UST trends provide fingerprints of a decreasing trend (no significant trend) in the total GHG effect in polar (non-polar) regions. Correspondingly, we made the first observation of surface cooling trends in both the Antarctic since 2002 and the Arctic since 2016, once the SIE started to recover. But surface warming remains at mid-latitudes, which caused the recent rise in GMST. The latter is quantitatively explained by the positive short-term radiative forcings of aerosols and ozone due to improved air quality. The observed GMST changes agree well with results calculated by the physics model based on halogen-containing GHGs, whose destruction is consistent with the characteristics of the cosmic-ray-driven reaction mechanism with larger rates at higher latitudes. Given the observations of rapidly lowered aerosol loading, the projected halogenated GHGs, and the halted Arctic amplification, we predict that an emerging long-term GMST reversal, which started at the end of 2023, will be observed.
Submitted 7 June, 2024;
originally announced June 2024.
-
Stoichiometry-induced ferromagnetism in altermagnetic candidate MnTe
Authors:
Michael Chilcote,
Alessandro R. Mazza,
Qiangsheng Lu,
Isaiah Gray,
Qi Tian,
Qinwen Deng,
Duncan Moseley,
An-Hsi Chen,
Jason Lapano,
Jason S. Gardner,
Gyula Eres,
T. Zac Ward,
Erxi Feng,
Huibo Cao,
Valeria Lauter,
Michael A. McGuire,
Raphael Hermann,
David Parker,
Myung-Geun Han,
Asghar Kayani,
Gaurab Rimal,
Liang Wu,
Timothy R. Charlton,
Robert G. Moore,
Matthew Brahlek
Abstract:
The field of spintronics has seen a surge of interest in altermagnetism due to novel predictions and many possible applications. MnTe is a leading altermagnetic candidate that is of significant interest across spintronics due to its layered antiferromagnetic structure, high Néel temperature (TN ~ 310 K) and semiconducting properties. We present results on molecular beam epitaxy (MBE) grown MnTe/InP(111) films. Here, it is found that the electronic and magnetic properties are driven by the natural stoichiometry of MnTe. Electronic transport and in situ angle-resolved photoemission spectroscopy show that the films are natively metallic with the Fermi level in the valence band, and that the band structure is in good agreement with first-principles calculations for altermagnetic spin-splitting. Neutron diffraction confirms that the film is antiferromagnetic with planar anisotropy, and polarized neutron reflectometry indicates weak ferromagnetism, which is linked to a slight Mn-richness that is intrinsic to the MBE-grown samples. When combined with the anomalous Hall effect, this work shows that the electronic response is strongly affected by the ferromagnetic moment. Altogether, this highlights potential mechanisms for controlling altermagnetic ordering for diverse spintronic applications.
Submitted 6 June, 2024;
originally announced June 2024.
-
Refactoring to Pythonic Idioms: A Hybrid Knowledge-Driven Approach Leveraging Large Language Models
Authors:
Zejun Zhang,
Zhenchang Xing,
Xiaoxue Ren,
Qinghua Lu,
Xiwei Xu
Abstract:
Pythonic idioms are highly valued and widely used in the Python programming community. However, many Python users find it challenging to use Pythonic idioms. Adopting a rule-based approach or LLM-only approach is not sufficient to overcome three persistent challenges of code idiomatization including code miss, wrong detection and wrong refactoring. Motivated by the determinism of rules and adaptability of LLMs, we propose a hybrid approach consisting of three modules. We not only write prompts to instruct LLMs to complete tasks, but we also invoke Analytic Rule Interfaces (ARIs) to accomplish tasks. The ARIs are Python code generated by prompting LLMs to generate code. We first construct a knowledge module with three elements including ASTscenario, ASTcomponent and Condition, and prompt LLMs to generate Python code for incorporation into an ARI library for subsequent use. After that, for any syntax-error-free Python code, we invoke ARIs from the ARI library to extract ASTcomponent from the ASTscenario, and then filter out ASTcomponent that does not meet the condition. Finally, we design prompts to instruct LLMs to abstract and idiomatize code, and then invoke ARIs from the ARI library to rewrite non-idiomatic code into the idiomatic code. Next, we conduct a comprehensive evaluation of our approach, RIdiom, and Prompt-LLM on nine established Pythonic idioms in RIdiom. Our approach exhibits superior accuracy, F1-score, and recall, while maintaining precision levels comparable to RIdiom, all of which consistently exceed or come close to 90% for each metric of each idiom. Lastly, we extend our evaluation to encompass four new Pythonic idioms. Our approach consistently outperforms Prompt-LLM, achieving metrics with values consistently exceeding 90% for accuracy, F1-score, precision, and recall.
Submitted 5 June, 2024;
originally announced June 2024.
-
Searching Priors Makes Text-to-Video Synthesis Better
Authors:
Haoran Cheng,
Liang Peng,
Linxuan Xia,
Yuepeng Hu,
Hengjia Li,
Qinglin Lu,
Xiaofei He,
Boxi Wu
Abstract:
Significant advancements in video diffusion models have brought substantial progress to the field of text-to-video (T2V) synthesis. However, existing T2V synthesis models struggle to accurately generate complex motion dynamics, leading to a reduction in video realism. One possible solution is to collect massive data and train the model on it, but this would be extremely expensive. To alleviate this problem, in this paper, we reformulate the typical T2V generation process as a search-based generation pipeline. Instead of scaling up the model training, we employ existing videos as the motion prior database. Specifically, we divide the T2V generation process into two steps: (i) For a given prompt input, we search existing text-video datasets to find videos with text labels that closely match the prompt motions. We propose a tailored search algorithm that emphasizes object motion features. (ii) Retrieved videos are processed and distilled into motion priors to fine-tune a pre-trained base T2V model, followed by generating the desired videos using the input prompt. By utilizing the priors gleaned from the searched videos, we enhance the realism of the generated videos' motion. All operations can be finished on a single NVIDIA RTX 4090 GPU. We validate our method against state-of-the-art T2V models across diverse prompt inputs. The code will be made public.
Submitted 5 June, 2024;
originally announced June 2024.
-
DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency
Authors:
Haoran Ma,
Yifan Qiao,
Shi Liu,
Shan Yu,
Yuanjiang Ni,
Qingda Lu,
Jiesheng Wu,
Yiying Zhang,
Miryung Kim,
Harry Xu
Abstract:
Despite being a powerful concept, distributed shared memory (DSM) has not been made practical due to the extensive synchronization needed between servers to implement memory coherence. This paper shows a practical DSM implementation based on the insight that the ownership model embedded in programming languages such as Rust automatically constrains the order of read and write, providing opportunities for significantly simplifying the coherence implementation if the ownership semantics can be exposed to and leveraged by the runtime. This paper discusses the design and implementation of DistR, a Rust-based DSM system that outperforms the two state-of-the-art DSM systems GAM and Grappa by up to 2.64x and 29.16x in throughput, and scales much better with the number of servers.
Submitted 27 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay
Authors:
Daya Bay collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
J. Cheng,
Y. -C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng
, et al. (177 additional authors not shown)
Abstract:
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overline{ν}_{e}$ rates and energy spectra variation among the near and far detectors gives $\mathrm{sin}^22θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\mathrm{sin}^22θ_{13}= 0.0833\pm0.0022$, which represents an 8% relative improvement in precision regarding the Daya Bay full 3158-day capture-on-gadolinium result.
Submitted 3 June, 2024;
originally announced June 2024.
-
Task-Agnostic Machine Learning-Assisted Inference
Authors:
Jiacheng Miao,
Qiongshi Lu
Abstract:
Machine learning (ML) is playing an increasingly important role in scientific research. In conjunction with classical statistical approaches, ML-assisted analytical strategies have shown great promise in accelerating research findings. This has also opened up a whole new field of methodological research focusing on integrative approaches that leverage both ML and statistics to tackle data science challenges. One type of study that has quickly gained popularity employs ML to predict unobserved outcomes in massive samples and then uses the predicted outcomes in downstream statistical inference. However, existing methods designed to ensure the validity of this type of post-prediction inference are limited to very basic tasks such as linear regression analysis. This is because any extension of these approaches to new, more sophisticated statistical tasks requires task-specific algebraic derivations and software implementations, which ignores the massive library of existing software tools already developed for complex inference tasks and severely constrains the scope of post-prediction inference in real applications. To address this challenge, we propose a novel statistical framework for task-agnostic ML-assisted inference. It provides a post-prediction inference solution that can be easily plugged into almost any established data analysis routine. It delivers valid and efficient inference that is robust to arbitrary choices of ML models, while allowing nearly all existing analytical frameworks to be incorporated into the analysis of ML-predicted outcomes. Through extensive experiments, we showcase the validity, versatility, and superiority of our method compared to existing approaches.
Submitted 30 May, 2024;
originally announced May 2024.
-
Enabling Generative Design Tools with LLM Agents for Building Novel Devices: A Case Study on Fluidic Computation Interfaces
Authors:
Qiuyu Lu,
Jiawei Fang,
Zhihao Yao,
Yue Yang,
Shiqing Lyu,
Haipeng Mi,
Lining Yao
Abstract:
In the field of Human-Computer Interaction (HCI), the development of interactive devices represents a significant area of focus. The advent of novel hardware and advanced fabrication techniques has underscored the demand for specialized design tools that democratize the prototyping process for such cutting-edge devices. While these tools simplify the process through parametric design and simulation, they typically require a certain learning curve and often fall short in facilitating creative ideation. In this study, we employ the fluidic computation interface as a case study to investigate the potential of augmenting design tools of physical devices with Large Language Model (LLM) agents. Enhanced by LLM agents, the Generative Design Tool (GDT) can comprehend the capabilities and limitations of newly developed devices; it can propose varied, insightful, and practical application scenarios, and recommend device designs that are technically and contextually appropriate. Furthermore, it generates the necessary design parameters for the traditional part of the design tool to visualize results and produce support files for fabrication. This paper outlines the GDT's framework, implementation, and performance, while also contemplating its prospects and the obstacles encountered.
Submitted 28 May, 2024;
originally announced May 2024.
-
Interfacially enhanced superconductivity in Fe(Te,Se)/Bi4Te3 heterostructures
Authors:
An-Hsi Chen,
Qiangsheng Lu,
Eitan Hershkovitz,
Miguel L. Crespillo,
Alessandro R. Mazza,
Tyler Smith,
T. Zac Ward,
Gyula Eres,
Shornam Gandhi,
Meer Muhtasim Mahfuz,
Vitalii Starchenko,
Khalid Hattar,
Joon Sue Lee,
Honggyu Kim,
Robert G. Moore,
Matthew Brahlek
Abstract:
Realizing topological superconductivity by integrating high-transition-temperature ($T_C$) superconductors with topological insulators can open new paths for quantum computing applications. Here, we report a new approach for increasing the superconducting transition temperature ($T_{C}^{onset}$) by interfacing the unconventional superconductor Fe(Te,Se) with the topological insulator Bi-Te system in the low-Se doping regime, near where superconductivity vanishes in the bulk. The critical finding is that the $T_{C}^{onset}$ of Fe(Te,Se) increases from nominally non-superconducting to as high as 12.5 K when $Bi_2Te_3$ is replaced with the topological phase $Bi_4Te_3$. Interfacing Fe(Te,Se) with $Bi_4Te_3$ is also found to be critical for stabilizing superconductivity in monolayer films where $T_{C}^{onset}$ can be as high as 6 K. Measurements of the electronic and crystalline structure of the $Bi_4Te_3$ layer reveal that a large electron transfer, epitaxial strain, and novel chemical reduction processes are critical factors for the enhancement of superconductivity. This novel route for enhancing $T_C$ in an important epitaxial system provides new insight on the nature of interfacial superconductivity and a platform to identify and utilize new electronic phases.
Submitted 24 May, 2024;
originally announced May 2024.
-
Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents
Authors:
Yue Liu,
Sin Kit Lo,
Qinghua Lu,
Liming Zhu,
Dehai Zhao,
Xiwei Xu,
Stefan Harrer,
Jon Whittle
Abstract:
Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage distinguished reasoning and language processing capabilities to take a proactive, autonomous role to pursue users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing the agents considering challenges of goal-seeking (including generating instrumental goals and plans), such as hallucinations inherent in foundation models, explainability of reasoning process, complex accountability, etc. To address this issue, we have performed a systematic literature review to understand the state-of-the-art foundation model-based agents and the broader ecosystem. In this paper, we present a pattern catalogue consisting of 17 architectural patterns with analyses of the context, forces, and trade-offs as the outcomes from the previous literature review. The proposed catalogue can provide holistic guidance for the effective use of patterns, and support the architecture design of foundation model-based agents by facilitating goal-seeking and plan generation.
Submitted 24 June, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Authors:
Zhimin Li,
Jianwei Zhang,
Qin Lin,
Jiangfeng Xiong,
Yanxin Long,
Xinchi Deng,
Yingfang Zhang,
Xingchao Liu,
Minbin Huang,
Zedong Xiao,
Dayou Chen,
Jiajun He,
Jiahao Li,
Wenyue Li,
Chen Zhang,
Rongwei Quan,
Jianxiang Lu,
Jiabin Huang,
Xiaoyan Yuan,
Xiaoxiao Zheng,
Yixuan Li,
Jihong Zhang,
Chao Zhang,
Meng Chen,
Jie Liu
, et al. (20 additional authors not shown)
Abstract:
We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images. Finally, Hunyuan-DiT can perform multi-turn multimodal dialogue with users, generating and refining images according to the context. Through our holistic human evaluation protocol with more than 50 professional human evaluators, Hunyuan-DiT sets a new state-of-the-art in Chinese-to-image generation compared with other open-source models. Code and pretrained models are publicly available at github.com/Tencent/HunyuanDiT
Submitted 14 May, 2024;
originally announced May 2024.
-
The coupled-channel analysis of $B^{(*)}_{(s)}\bar{B}^{(*)}_{(s)}$ systems within complex scaling method
Authors:
Qing-Fu Song,
Qi-Fang Lü,
Dian-Yong Chen,
Yu-Bing Dong
Abstract:
In the present work, we perform a coupled-channel analysis of $B^{(*)}_{(s)}\bar{B}^{(*)}_{(s)}$ systems with the one-boson-exchange potentials. We first study the $I(J^{PC})=1(1^{+-})$ $B\bar{B}^{*}/B^{*}\bar{B}^{*}$ system to describe the $Z_{b}(10610)$ and $Z_{b}(10650)$ particles as molecular states and determine the reasonable range of the cutoff parameter $Λ$. Then, other $B^{(*)}_{(s)}\bar{B}^{(*)}_{(s)}$ combinations with different quantum numbers are systematically investigated. Some bound states and resonances appear in the isoscalar systems, while only several shallow bound states exist for isovector systems. Far away from the excited conventional $P$-wave bottomonium, these predicted states can be easily identified as exotic particles both theoretically and experimentally. Moreover, the $η_b(nS)/Υ(nS)$ plus light mesons are excellent final states to search for the bound states, while the $B\bar B^*+h.c.$ and $B^* \bar B^*$ channels are suitable for the resonances. We highly recommend that the LHCb and Belle II Collaborations search for these bottomonium-like states in the future.
Submitted 13 May, 2024;
originally announced May 2024.
-
Innovative Integration of Visual Foundation Model with a Robotic Arm on a Mobile Platform
Authors:
Shimian Zhang,
Qiuhong Lu
Abstract:
In the rapidly advancing field of robotics, the fusion of state-of-the-art visual technologies with mobile robotic arms has emerged as a critical integration. This paper introduces a novel system that combines the Segment Anything model (SAM) -- a transformer-based visual foundation model -- with a robotic arm on a mobile platform. The design of integrating a depth camera on the robotic arm's end-effector ensures continuous object tracking, significantly mitigating environmental uncertainties. By deploying on a mobile platform, our grasping system has enhanced mobility, playing a key role in dynamic environments where adaptability is critical. This synthesis enables dynamic object segmentation, tracking, and grasping. It also elevates user interaction, allowing the robot to intuitively respond to various modalities such as clicks, drawings, or voice commands, beyond traditional robotic systems. Empirical assessments in both simulated and real-world settings demonstrate the system's capabilities. This configuration opens avenues for wide-ranging applications, from industrial settings, agriculture, and household tasks, to specialized assignments and beyond.
Submitted 29 April, 2024;
originally announced April 2024.
-
Canonical interpretation of the newly observed $J^P =1^+$ structure $X(2085)$
Authors:
Tian-Ge Li,
Sheng-Chao Zhang,
Guan-Ying Wang,
Qi-Fang Lü
Abstract:
Inspired by the newly observed $X(2085)$ by the BESIII Collaboration, we study the strong decay behaviors of excited axial-vector strange mesons within the quark pair creation model. Our results indicate that the $K_1(1793)/K_1(1861)$ can be regarded as the same $K_1(2P)$ state, and the $K_1(1911)$ is assigned as the $K_1(2P^\prime)$ state. Considering the mass, spin-parity, and decay behaviors, we interpret the newly observed $X(2085)$ as the radially excited $K_1(3P)$ state, which mainly decays into the $ρ(1450) K$, $ω(1420)K$, $πK^*(1410)$, $ρK_1(1270)$, and $ρK^*(892)$ final states. Also, the width of the $K_1(3P^\prime)$ state is predicted to be about 300 MeV, which can be searched for by future experiments. We expect that the present calculations can help us to better understand the nature of the $X(2085)$ structure.
Submitted 26 April, 2024;
originally announced April 2024.
-
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Authors:
Kaining Ying,
Fanqing Meng,
** Wang,
Zhiqian Li,
Han Lin,
Yue Yang,
Hao Zhang,
Wenbo Zhang,
Yuqi Lin,
Shuo Liu,
Jiayi Lei,
Quanfeng Lu,
Runjian Chen,
Peng Xu,
Renrui Zhang,
Haozhe Zhang,
Peng Gao,
Yali Wang,
Yu Qiao,
** Luo,
Kaipeng Zhang,
Wenqi Shao
Abstract:
Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks testing rudimentary capabilities, falling short in tracking LVLM development. In this study, we present MMT-Bench, a comprehensive benchmark designed to assess LVLMs across massive multimodal tasks requiring expert knowledge and deliberate visual recognition, localization, reasoning, and planning. MMT-Bench comprises $31,325$ meticulously curated multi-choice visual questions from various multimodal scenarios such as vehicle driving and embodied navigation, covering $32$ core meta-tasks and $162$ subtasks in multimodal understanding. Due to its extensive task coverage, MMT-Bench enables the evaluation of LVLMs using a task map, facilitating the discovery of in- and out-of-domain tasks. Evaluation results involving $30$ LVLMs such as the proprietary GPT-4V, GeminiProVision, and open-sourced InternVL-Chat, underscore the significant challenges posed by MMT-Bench. We anticipate that MMT-Bench will inspire the community to develop next-generation multimodal foundation models aimed at achieving general-purpose multimodal intelligence.
Submitted 24 April, 2024;
originally announced April 2024.
-
QCD topology and axion properties in an isotropic hot and dense medium
Authors:
Hong-Fang Gong,
Qi Lu,
Zhen-Yan Lu,
Lu-Meng Liu,
Xun Chen,
Shu-Peng Wang
Abstract:
We study the QCD topology and axion properties at finite temperature and chemical potential in the framework of the two-flavor Nambu$-$Jona-Lasinio model. We find that the behaviors of the two lowest cumulants of the QCD topological charge distribution and axion properties are highly sensitive to the critical behavior of the chiral phase transition. In particular, the topological susceptibility and the axion mass follow the response of the chiral condensate to temperature and chemical potential, showing that both quantities decrease monotonically with the increment of temperature and/or chemical potential. However, it is important to note that the normalized fourth cumulant behaves differently depending on the temperature. At low temperatures, it is a non-monotonic function of the chemical potential, while at high temperatures, it monotonically decreases. Additionally, its value invariably approaches the asymptotic value of $b_2^{\text {inst }}=-1/12$, predicted by the dilute instanton gas model. We also observe that with the increase in chemical potential at relatively low temperatures, the axion self-coupling constant exhibits a sharp peak around the critical point, which can even be more than twice its vacuum value. After that, the self-coupling drops sharply to a much lower value than its vacuum value, eventually approaching zero in the high chemical potential limit. The finding that the axion self-coupling constant is significantly enhanced in high-density environments near the chiral phase transition could lead to the creation or enhancement of an axion Bose-Einstein condensate in compact astrophysical objects.
Submitted 23 April, 2024;
originally announced April 2024.
-
GCEPNet: Graph Convolution-Enhanced Expectation Propagation for Massive MIMO Detection
Authors:
Qincheng Lu,
Sitao Luan,
Xiao-Wen Chang
Abstract:
Massive MIMO (multiple-input multiple-output) detection is an important topic in wireless communication and various machine learning based methods have been developed recently for this task. Expectation propagation (EP) and its variants are widely used for MIMO detection and have achieved the best performance. However, EP-based solvers fail to capture the correlation between unknown variables, leading to loss of information, and in addition, they are computationally expensive. In this paper, we show that the real-valued system can be modeled as spectral signal convolution on a graph, through which the correlation between unknown variables can be captured. Based on this analysis, we propose graph convolution-enhanced expectation propagation (GCEPNet), a graph convolution-enhanced EP detector. GCEPNet incorporates data-dependent attention scores into the Chebyshev polynomial for powerful graph convolution with better generalization capacity. It enables a better estimation of the cavity distribution for EP and empirically achieves the state-of-the-art (SOTA) MIMO detection performance with much faster inference speed. To our knowledge, we are the first to shed light on the connection between the system model and graph convolution, and the first to design the data-dependent attention scores for graph convolution.
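For readers unfamiliar with Chebyshev polynomial graph convolution, the following is a minimal sketch of the standard (attention-free) Chebyshev filter, not the GCEPNet detector itself. It assumes a graph operator `L_scaled` whose eigenvalues lie in [-1, 1]; the function names and the fixed coefficients `theta` are illustrative, and the data-dependent attention scores described in the abstract are not modeled here.

```python
def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def chebyshev_filter(L_scaled, x, theta):
    """Return y = sum_k theta[k] * T_k(L_scaled) x.

    Uses the Chebyshev recurrence T_0 = I, T_1 = L,
    T_k = 2 L T_{k-1} - T_{k-2}, applied directly to the signal x
    so that no matrix power is ever formed explicitly.
    """
    if not theta:
        raise ValueError("theta must contain at least one coefficient")
    t_prev = list(x)                      # T_0 x
    y = [theta[0] * v for v in t_prev]
    if len(theta) == 1:
        return y
    t_curr = matvec(L_scaled, x)          # T_1 x
    y = [yi + theta[1] * v for yi, v in zip(y, t_curr)]
    for k in range(2, len(theta)):
        t_next = [2 * a - b for a, b in zip(matvec(L_scaled, t_curr), t_prev)]
        y = [yi + theta[k] * v for yi, v in zip(y, t_next)]
        t_prev, t_curr = t_curr, t_next
    return y


# Toy two-node example (hypothetical values, chosen for easy hand-checking).
L_toy = [[0, 1], [1, 0]]
y_toy = chebyshev_filter(L_toy, [1, 0], [1, 2, 3])
```

Each additional coefficient in `theta` extends the filter's reach by one hop on the graph, which is why a low-order polynomial can capture correlation between neighbouring unknowns at modest cost.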
Submitted 23 April, 2024;
originally announced April 2024.
-
Prove Symbolic Regression is NP-hard by Symbol Graph
Authors:
**glu Song,
Qiang Lu,
Bozhou Tian,
**gwen Zhang,
Jake Luo,
Zhiguang Wang
Abstract:
Symbolic regression (SR) is the task of discovering a symbolic expression that fits a given data set from the space of mathematical expressions. Despite the abundance of research surrounding the SR problem, there's a scarcity of works that confirm its NP-hard nature. Therefore, this paper introduces the concept of a symbol graph as a comprehensive representation of the entire mathematical expression space, effectively illustrating the NP-hard characteristics of the SR problem. Leveraging the symbol graph, we establish a connection between the SR problem and the task of identifying an optimally fitted degree-constrained Steiner Arborescence (DCSAP). The complexity of DCSAP, which is proven to be NP-hard, directly implies the NP-hard nature of the SR problem.
Submitted 21 April, 2024;
originally announced April 2024.
-
An Origami-Inspired Variable Friction Surface for Increasing the Dexterity of Robotic Grippers
Authors:
Qiujie Lu,
Angus B. Clark,
Matthew Shen,
Nicolas Rojas
Abstract:
While the grasping capability of robotic grippers has shown significant development, the ability to manipulate objects within the hand is still limited. One explanation for this limitation is the lack of controlled contact variation between the grasped object and the gripper. For instance, human hands have the ability to firmly grip object surfaces, as well as slide over object faces, an aspect that aids the enhanced manipulation of objects within the hand without losing contact. In this letter, we present a parametric, origami-inspired thin surface capable of transitioning between a high friction and a low friction state, suitable for implementation as an epidermis in robotic fingers. A numerical analysis of the proposed surface based on its design parameters, force analysis, and performance in in-hand manipulation tasks is presented. Through the development of a simple two-fingered two-degree-of-freedom gripper utilizing the proposed variable-friction surfaces with different parameters, we experimentally demonstrate the improved manipulation capabilities of the hand when compared to the same gripper without changeable friction. Results show that the pattern density and valley gap are the main parameters that affect the in-hand manipulation performance. The origami-inspired thin surface with a higher pattern density generated a smaller valley gap and smaller height change, producing a more stable improvement of the manipulation capabilities of the hand.
Submitted 15 April, 2024;
originally announced April 2024.
-
Demand Private Coded Caching: the Two-File Case
Authors:
Qinyi Lu,
Nan Liu,
Wei Kang
Abstract:
We investigate the demand private coded caching problem, which is an $(N,K)$ coded caching problem with $N$ files, $K$ users, each equipped with a cache of size $M$, and an additional privacy constraint on user demands. We first present a new virtual-user-based achievable scheme for an arbitrary number of users and files. Then, for the case of 2 files and an arbitrary number of users, we derive some new converse bounds. As a result, we obtain the exact memory-rate tradeoff of the demand private coded caching problem for 2 files and 3 users. As for the case of 2 files and an arbitrary number of users, the exact memory-rate tradeoff is characterized for $M\in [0,\frac{2}{K}] \cup [\frac{2(K-1)}{K+1},2]$.
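To make the stated cache-size region concrete, here is a minimal sketch that checks whether a given $M$ falls inside $[0,\frac{2}{K}] \cup [\frac{2(K-1)}{K+1},2]$ for the two-file case. The function name is hypothetical, and the check encodes only the interval quoted in the abstract; it does not reflect the separate result that the tradeoff is fully known for 2 files and 3 users.

```python
from fractions import Fraction


def in_characterized_region(M, K):
    """Return True if M lies in [0, 2/K] or [2(K-1)/(K+1), 2] (N = 2 files).

    Exact rational arithmetic avoids floating-point trouble at the
    interval endpoints.
    """
    if K < 1:
        raise ValueError("K must be a positive integer")
    M = Fraction(M)
    low_end = Fraction(2, K)                   # right end of the first interval
    high_start = Fraction(2 * (K - 1), K + 1)  # left end of the second interval
    return (0 <= M <= low_end) or (high_start <= M <= 2)


# For K = 4 the region is [0, 1/2] together with [6/5, 2].
inside_low = in_characterized_region(Fraction(1, 2), 4)
in_gap = in_characterized_region(1, 4)
inside_high = in_characterized_region(Fraction(6, 5), 4)
```

For $K=2$ the two intervals overlap and cover all of $[0,2]$; as $K$ grows, the first interval shrinks toward $0$ and the second toward the single point $M=2$.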
Submitted 6 May, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
An AI System Evaluation Framework for Advancing AI Safety: Terminology, Taxonomy, Lifecycle Mapping
Authors:
Boming Xia,
Qinghua Lu,
Liming Zhu,
Zhenchang Xing
Abstract:
The advent of advanced AI underscores the urgent need for comprehensive safety evaluations, necessitating collaboration across communities (i.e., AI, software engineering, and governance). However, divergent practices and terminologies across these communities, combined with the complexity of AI systems (of which models are only a part) and environmental affordances (e.g., access to tools), obstruct effective communication and comprehensive evaluation. This paper proposes a framework for AI system evaluation comprising three components: 1) harmonised terminology to facilitate communication across communities involved in AI safety evaluation; 2) a taxonomy identifying essential elements for AI system evaluation; 3) a mapping between the AI lifecycle, stakeholders, and requisite evaluations for an accountable AI supply chain. This framework catalyses a deeper discourse on AI system evaluation beyond model-centric approaches.
Submitted 15 May, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
Search for a sub-eV sterile neutrino using Daya Bay's full dataset
Authors:
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
Y. C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng,
X. Y. Ding,
Y. Y. Ding
, et al. (176 additional authors not shown)
Abstract:
This Letter presents results of a search for the mixing of a sub-eV sterile neutrino with three active neutrinos based on the full data sample of the Daya Bay Reactor Neutrino Experiment, collected during 3158 days of detector operation, which contains $5.55 \times 10^{6}$ reactor $\bar{\nu}_e$ candidates identified as inverse beta-decay interactions followed by neutron-capture on gadolinium. The analysis benefits from a doubling of the statistics of our previous result and from improvements of several important systematic uncertainties. No significant oscillation due to mixing of a sub-eV sterile neutrino with active neutrinos was found. Exclusion limits are set by both Feldman-Cousins and CLs methods. Light sterile neutrino mixing with $\sin^2 2θ_{14} \gtrsim 0.01$ can be excluded at 95\% confidence level in the region of $0.01$ eV$^2 \lesssim |Δm^{2}_{41}| \lesssim 0.1 $ eV$^2$. This result represents the world-leading constraints in the region of $2 \times 10^{-4}$ eV$^2 \lesssim |Δm^{2}_{41}| \lesssim 0.2 $ eV$^2$.
Submitted 15 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Shaping a Surface Microdroplet by Marangoni Forces along a Moving Contact Line of Four Immiscible Phases
Authors:
Haichang Yang,
Binglin Zeng,
Qiuyun Lu,
Yaowen Xing,
Xiahui Gui,
Yijun Cao,
Ben Bin Xu,
Xuehua Zhang
Abstract:
The ability to transfer microdroplets between fluid phases offers numerous advantages in various fields, enabling better control, manipulation, and utilization of small volumes of fluids in pharmaceutical formulations, microfluidics, and lab-on-a-chip devices, single-cell analysis or droplet-based techniques for nanomaterial synthesis. This study focuses on the stability and morphology of a sessile oil microdroplet at the four-phase contact line of solid-water-oil-air during the droplet transfer from underwater to air. We observed a distinct transition in microdroplet dynamics, characterized by a shift from a scenario dominated by Marangoni forces to one dominated by capillary forces. In the regime dominated by Marangoni forces, the oil microdroplets spread in response to the contact between the water-air interface and the water-oil interface and the emergence of an oil concentration gradient along the water-air interface. The spreading distance along the four-phase contact line follows a power law relationship of $t^{3/4}$, reflecting the balance between Marangoni forces and viscous forces. On the other hand, in the capillarity-dominated regime, the oil microdroplets remain stable at the contact line and after being transferred into the air. We identify the crossover between these two regimes in the parameter space defined by three factors: the approaching velocity of the solid-water-air contact line ($v_{cl}$), the radius of the oil microdroplet ($r_o$), and the radius of the water drop ($r_w$). Furthermore, we demonstrate how to use the four-phase contact line for shaping oil microdroplets using a full liquid process by the contact line lithography. The findings in this study may also be applied to materials synthesis where nanoparticles, microspheres, or nanocapsules are produced by microdroplet-based techniques.
Submitted 24 March, 2024;
originally announced March 2024.
-
Accurately Predicting Probabilities of Safety-Critical Rare Events for Intelligent Systems
Authors:
Ruoxuan Bai,
**gxuan Yang,
Weiduo Gong,
Yi Zhang,
Qiu**g Lu,
Shuo Feng
Abstract:
Intelligent systems are increasingly integral to our daily lives, yet rare safety-critical events present significant latent threats to their practical deployment. Addressing this challenge hinges on accurately predicting the probability of safety-critical events occurring within a given time step from the current state, a metric we define as 'criticality'. The complexity of predicting criticality arises from the extreme data imbalance caused by rare events in high dimensional variables associated with the rare events, a challenge we refer to as the curse of rarity. Existing methods tend to be either overly conservative or prone to overlooking safety-critical events, thus struggling to achieve both high precision and recall rates, which severely limits their applicability. This study endeavors to develop a criticality prediction model that excels in both precision and recall rates for evaluating the criticality of safety-critical autonomous systems. We propose a multi-stage learning framework designed to progressively densify the dataset, mitigating the curse of rarity across stages. To validate our approach, we evaluate it in two cases: lunar lander and bipedal walker scenarios. The results demonstrate that our method surpasses traditional approaches, providing a more accurate and dependable assessment of criticality in intelligent systems.
Submitted 5 April, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
Factorized Learning Assisted with Large Language Model for Gloss-free Sign Language Translation
Authors:
Zhigang Chen,
Benjia Zhou,
Jun Li,
Jun Wan,
Zhen Lei,
Ning Jiang,
Quan Lu,
Guoqing Zhao
Abstract:
Previous Sign Language Translation (SLT) methods achieve superior performance by relying on gloss annotations. However, labeling high-quality glosses is a labor-intensive task, which limits the further development of SLT. Although some approaches work towards gloss-free SLT through jointly training the visual encoder and translation network, these efforts still suffer from poor performance and inefficient use of the powerful Large Language Model (LLM). Most seriously, we find that directly introducing LLM into SLT will lead to insufficient learning of visual representations as LLM dominates the learning curve. To address these problems, we propose Factorized Learning assisted with Large Language Model (FLa-LLM) for gloss-free SLT. Concretely, we factorize the training process into two stages. In the visual initialing stage, we employ a lightweight translation model after the visual encoder to pre-train the visual encoder. In the LLM fine-tuning stage, we freeze the acquired knowledge in the visual encoder and integrate it with a pre-trained LLM to inspire the LLM's translation potential. This factorized training strategy proves to be highly effective as evidenced by significant improvements achieved across three SLT datasets which are all conducted under the gloss-free setting.
Submitted 19 March, 2024;
originally announced March 2024.
-
LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models
Authors:
Yang Yang,
Wen Wang,
Liang Peng,
Chaotian Song,
Yao Chen,
Hengjia Li,
Xiaolong Yang,
Qinglin Lu,
Deng Cai,
Boxi Wu,
Wei Liu
Abstract:
Customization generation techniques have significantly advanced the synthesis of specific concepts across varied contexts. Multi-concept customization emerges as the challenging task within this domain. Existing approaches often rely on training a fusion matrix of multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image. However, we find that this straightforward method faces two major challenges: 1) concept confusion, where the model struggles to preserve distinct individual characteristics, and 2) concept vanishing, where the model fails to generate the intended subjects. To address these issues, we introduce LoRA-Composer, a training-free framework designed for seamlessly integrating multiple LoRAs, thereby enhancing the harmony among different concepts within generated images. LoRA-Composer addresses concept vanishing through concept injection constraints, enhancing concept visibility via an expanded cross-attention mechanism. To combat concept confusion, concept isolation constraints are introduced, refining the self-attention computation. Furthermore, latent re-initialization is proposed to effectively stimulate concept-specific latent within designated regions. Our extensive testing showcases a notable enhancement in LoRA-Composer's performance compared to standard baselines, especially when eliminating the image-based conditions like canny edge or pose estimations. Code is released at https://github.com/Young98CN/LoRA_Composer.
Submitted 10 July, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
Authors:
Minbin Huang,
Yanxin Long,
Xinchi Deng,
Ruihang Chu,
Jiangfeng Xiong,
Xiaodan Liang,
Hong Cheng,
Qinglin Lu,
Wei Liu
Abstract:
Text-to-image (T2I) generation models have significantly advanced in recent years. However, effective interaction with these models is challenging for average users due to the need for specialized prompt engineering knowledge and the inability to perform multi-turn image generation, hindering a dynamic and iterative creation process. Recent attempts have tried to equip Multi-modal Large Language Models (MLLMs) with T2I models to bring the user's natural language instructions into reality. Hence, the output modality of MLLMs is extended, and the multi-turn generation quality of T2I models is enhanced thanks to the strong multi-modal comprehension ability of MLLMs. However, many of these works face challenges in identifying correct output modalities and generating coherent images accordingly as the number of output modalities increases and the conversations go deeper. Therefore, we propose DialogGen, an effective pipeline to align off-the-shelf MLLMs and T2I models to build a Multi-modal Interactive Dialogue System (MIDS) for multi-turn Text-to-Image generation. It is composed of drawing prompt alignment, careful training data curation, and error correction. Moreover, as the field of MIDS flourishes, comprehensive benchmarks are urgently needed to evaluate MIDS fairly in terms of output modality correctness and multi-modal output coherence. To address this issue, we introduce the Multi-modal Dialogue Benchmark (DialogBen), a comprehensive bilingual benchmark designed to assess the ability of MLLMs to generate accurate and coherent multi-modal content that supports image editing. It contains two evaluation metrics to measure the model's ability to switch modalities and the coherence of the output images. Our extensive experiments on DialogBen and user study demonstrate the effectiveness of DialogGen compared with other State-of-the-Art models.
Submitted 3 July, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
UDCR: Unsupervised Aortic DSA/CTA Rigid Registration Using Deep Reinforcement Learning and Overlap Degree Calculation
Authors:
Wentao Liu,
Bowen Liang,
Wei** Xu,
Tong Tian,
Qingsheng Lu,
Xipeng Pan,
Haoyuan Li,
Siyu Tian,
Huihua Yang,
Ruisheng Su
Abstract:
The rigid registration of aortic Digital Subtraction Angiography (DSA) and Computed Tomography Angiography (CTA) can provide 3D anatomical details of the vasculature for the interventional surgical treatment of conditions such as aortic dissection and aortic aneurysms, holding significant value for clinical research. However, the current methods for 2D/3D image registration are dependent on manual annotations or synthetic data, as well as the extraction of landmarks, which is not suitable for cross-modal registration of aortic DSA/CTA. In this paper, we propose an unsupervised method, UDCR, for aortic DSA/CTA rigid registration based on deep reinforcement learning. Leveraging the imaging principles and characteristics of DSA and CTA, we have constructed a cross-dimensional registration environment based on spatial transformations. Specifically, we propose an overlap degree calculation reward function that measures the intensity difference between the foreground and background, aimed at assessing the accuracy of registration between segmentation maps and DSA images. This method is highly flexible, allowing for the loading of pre-trained models to perform registration directly or to seek the optimal spatial transformation parameters through online learning. We manually annotated 61 pairs of aortic DSA/CTA for algorithm evaluation. The results indicate that the proposed UDCR achieved a Mean Absolute Error (MAE) of 2.85 mm in translation and 4.35° in rotation, showing significant potential for clinical applications.
Submitted 8 March, 2024;
originally announced March 2024.
-
Image-Guided Autonomous Guidewire Navigation in Robot-Assisted Endovascular Interventions using Reinforcement Learning
Authors:
Wentao Liu,
Tong Tian,
Wei** Xu,
Bowen Liang,
Qingsheng Lu,
Xipeng Pan,
Wenyi Zhao,
Huihua Yang,
Ruisheng Su
Abstract:
Autonomous robots in endovascular interventions possess the potential to navigate guidewires with safety and reliability, while reducing human error and shortening surgical time. However, current methods of guidewire navigation based on Reinforcement Learning (RL) depend on manual demonstration data or magnetic guidance. In this work, we propose an Image-guided Autonomous Guidewire Navigation (IAGN) method. Specifically, we introduce BDA-star, a path planning algorithm with boundary distance constraints, for the trajectory planning of guidewire navigation. We established an IAGN-RL environment where the observations are real-time guidewire feeding images highlighting the position of the guidewire tip and the planned path. We proposed a reward function based on the distances from both the guidewire tip to the planned path and the target to evaluate the agent's actions. Furthermore, in the policy network, we employ a pre-trained convolutional neural network to extract features, mitigating stability issues and slow convergence rates associated with direct learning from raw pixels. Experiments conducted on the aortic simulation IAGN platform demonstrated that the proposed method, targeting the left subclavian artery and the brachiocephalic artery, achieved a 100% guidewire navigation success rate, along with reduced movement and retraction distances and trajectories that tend toward the center of the vessels.
Submitted 8 March, 2024;
originally announced March 2024.
-
Representation Learning on Heterophilic Graph with Directional Neighborhood Attention
Authors:
Qincheng Lu,
Jiaqi Zhu,
Sitao Luan,
Xiao-Wen Chang
Abstract:
Graph Attention Network (GAT) is one of the most popular Graph Neural Network (GNN) architectures, which employs the attention mechanism to learn edge weights and has demonstrated promising performance in various applications. However, since it only incorporates information from the immediate neighborhood, it lacks the ability to capture long-range and global graph information, leading to unsatisfactory performance on some datasets, particularly on heterophilic graphs. To address this limitation, we propose the Directional Graph Attention Network (DGAT) in this paper. DGAT is able to combine the feature-based attention with the global directional information extracted from the graph topology. To this end, a new class of Laplacian matrices is proposed which can provably reduce the diffusion distance between nodes. Based on the new Laplacian, topology-guided neighbour pruning and edge adding mechanisms are proposed to remove noisy and capture helpful long-range neighborhood information. Besides, a global directional attention is designed to enable a topological-aware information propagation. The superiority of the proposed DGAT over the baseline GAT has also been verified through experiments on real-world benchmarks and synthetic data sets. It also outperforms the state-of-the-art (SOTA) models on 6 out of 7 real-world benchmark datasets.
Submitted 3 March, 2024;
originally announced March 2024.
-
Two mass-imbalanced atoms in a hard-wall trap: Deep learning integrability of many-body systems
Authors:
Liheng Lang,
Qichen Lu,
C. M. Dai,
Xingbo Wei,
Yanxia Liu,
Yunbo Zhang
Abstract:
The study of integrable systems has led to significant advancements in our understanding of many-body physics. We design a series of numerical experiments to analyze the integrability of a mass-imbalanced two-body system through energy level statistics and deep learning of wavefunctions. The level spacing distributions are fitted by a Brody distribution and the fitting parameter $ω$ is found to separate the integrable and non-integrable mass ratios by a critical line $ω=0$. The convolutional neural network built from the probability density images could identify the transition points between integrable and non-integrable systems with high accuracy, yet in a much shorter computation time. A brilliant example of the network's ability is to identify a new integrable mass ratio $1/3$ by learning from the known integrable case of equal mass, with a remarkable network confidence of $98.78\%$. The robustness of our neural networks is further enhanced by adversarial learning, where samples are generated by standard and quantum perturbations mixed in the probability density images and the wavefunctions, respectively.
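For readers unfamiliar with the fit mentioned in this abstract, the Brody distribution is commonly written as below. This is the standard textbook parameterization, given here on the assumption that the paper uses it; the abstract itself does not state the form, and $s$ (the level spacing normalized to unit mean) and $b$ are symbols introduced for illustration.

```latex
P_\omega(s) = (\omega + 1)\, b\, s^{\omega} \exp\!\left(-b\, s^{\omega + 1}\right),
\qquad
b = \left[\Gamma\!\left(\frac{\omega + 2}{\omega + 1}\right)\right]^{\omega + 1}
```

In this form $ω=0$ reduces to the Poisson distribution $e^{-s}$ usually associated with integrable spectra, while $ω=1$ gives the Wigner surmise $(π/2)\, s\, e^{-π s^2/4}$ usually associated with chaotic spectra.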
Submitted 25 February, 2024;
originally announced February 2024.
-
A StrongREJECT for Empty Jailbreaks
Authors:
Alexandra Souly,
Qingyuan Lu,
Dillon Bowen,
Tu Trinh,
Elvis Hsieh,
Sana Pandey,
Pieter Abbeel,
Justin Svegliato,
Scott Emmons,
Olivia Watkins,
Sam Toyer
Abstract:
The rise of large language models (LLMs) has drawn attention to the existence of "jailbreaks" that allow the models to be used maliciously. However, there is no standard benchmark for measuring the severity of a jailbreak, leaving authors of jailbreak papers to create their own. We show that these benchmarks often include vague or unanswerable questions and use grading criteria that are biased towards overestimating the misuse potential of low-quality model responses. Some jailbreak techniques make the problem worse by decreasing the quality of model responses even on benign questions: we show that several jailbreaking techniques substantially reduce the zero-shot performance of GPT-4 on MMLU. Jailbreaks can also make it harder to elicit harmful responses from an "uncensored" open-source model. We present a new benchmark, StrongREJECT, which better discriminates between effective and ineffective jailbreaks by using a higher-quality question set and a more accurate response grading algorithm. We show that our new grading scheme better accords with human judgment of response quality and overall jailbreak effectiveness, especially on the sort of low-quality responses that contribute the most to over-estimation of jailbreak performance on existing benchmarks. We release our code and data at https://github.com/alexandrasouly/strongreject.
Submitted 15 February, 2024;
originally announced February 2024.
-
Bidirectional Generative Pre-training for Improving Time Series Representation Learning
Authors:
Ziyang Song,
Qincheng Lu,
He Zhu,
Yue Li
Abstract:
Learning time-series representations for discriminative tasks has been a long-standing challenge. Current pre-training methods are limited to either unidirectional next-token prediction or randomly masked token prediction. We propose a novel architecture called Bidirectional Timely Generative Pre-trained Transformer (BiTimelyGPT), which pre-trains on time-series data by both next-token and previous-token predictions in alternating transformer layers. This pre-training task preserves the original distribution and data shapes of the time-series. Additionally, the full-rank forward and backward attention matrices exhibit more expressive representation capabilities. Using biosignal data, BiTimelyGPT demonstrates superior performance in predicting neurological functionality, disease diagnosis, and physiological signs. By visualizing the attention heatmap, we observe that the pre-trained BiTimelyGPT can identify discriminative segments from time-series sequences, even more so after fine-tuning on the task.
Submitted 14 February, 2024;
originally announced February 2024.
-
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
Authors:
Yutao Hu,
Tianbin Li,
Quanfeng Lu,
Wenqi Shao,
Junjun He,
Yu Qiao,
** Luo
Abstract:
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in various multimodal tasks. However, their potential in the medical domain remains largely unexplored. A significant challenge arises from the scarcity of diverse medical images spanning various modalities and anatomical regions, which are essential in real-world medical applications. To solve this problem, in this paper, we introduce OmniMedVQA, a novel comprehensive medical Visual Question Answering (VQA) benchmark. This benchmark is collected from 73 different medical datasets, including 12 different modalities and covering more than 20 distinct anatomical regions. Importantly, all images in this benchmark are sourced from authentic medical scenarios, ensuring alignment with the requirements of the medical field and suitability for evaluating LVLMs. Through our extensive experiments, we have found that existing LVLMs struggle to address these medical VQA problems effectively. Moreover, what surprises us is that medical-specialized LVLMs even exhibit inferior performance to those general-domain models, calling for a more versatile and robust LVLM in the biomedical field. The evaluation results not only reveal the current limitations of LVLM in understanding real medical images but also highlight our dataset's significance. Our code and dataset are available at https://github.com/OpenGVLab/Multi-Modality-Arena.
Submitted 21 April, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Syllable based DNN-HMM Cantonese Speech to Text System
Authors:
Timothy Wong,
Claire Li,
Sam Lam,
Billy Chiu,
Qin Lu,
Minglei Li,
Dan Xiong,
Roy Shing Yu,
Vincent T. Y. Ng
Abstract:
This paper reports our work on building up a Cantonese Speech-to-Text (STT) system with a syllable based acoustic model. This is a part of an effort in building an STT system to aid dyslexic students who have cognitive deficiency in writing skills but have no problem expressing their ideas through speech. For Cantonese speech recognition, the basic unit of acoustic models can either be the conventional Initial-Final (IF) syllables, or the Onset-Nucleus-Coda (ONC) syllables where finals are further split into nucleus and coda to reflect the intra-syllable variations in Cantonese. By using the Kaldi toolkit, our system is trained using the stochastic gradient descent optimization model with the aid of GPUs for the hybrid Deep Neural Network and Hidden Markov Model (DNN-HMM) with and without I-vector based speaker adaptive training technique. The input features of the same Gaussian Mixture Model with speaker adaptive training (GMM-SAT) to DNN are used in all cases. Experiments show that the ONC-based syllable acoustic modeling with I-vector based DNN-HMM achieves the best performance with the word error rate (WER) of 9.66% and the real time factor (RTF) of 1.38812.
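The two metrics quoted in this abstract are conventionally computed as in the following minimal sketch. It assumes the usual definitions (WER as word-level edit distance divided by reference length, RTF as processing time divided by audio duration) and whitespace-separated tokens; the function names are hypothetical, and this is not the authors' or Kaldi's scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    # Levenshtein distance over tokens, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = decoding time / audio duration; above 1 is slower than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds
```

Under these definitions an RTF above 1 means decoding takes longer than the audio itself lasts.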
Submitted 13 February, 2024;
originally announced February 2024.
-
First measurement of the yield of $^8$He isotopes produced in liquid scintillator by cosmic-ray muons at Daya Bay
Authors:
Daya Bay Collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
Y. C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng,
X. Y. Ding
, et al. (177 additional authors not shown)
Abstract:
Daya Bay presents the first measurement of cosmogenic $^8$He isotope production in liquid scintillator, using an innovative method for identifying cascade decays of $^8$He and its child isotope, $^8$Li. We also measure the production yield of $^9$Li isotopes using well-established methodology. The results, in units of 10$^{-8}μ^{-1}$g$^{-1}$cm$^{2}$, are 0.307$\pm$0.042, 0.341$\pm$0.040, and 0.546$\pm$0.076 for $^8$He, and 6.73$\pm$0.73, 6.75$\pm$0.70, and 13.74$\pm$0.82 for $^9$Li at average muon energies of 63.9~GeV, 64.7~GeV, and 143.0~GeV, respectively. The measured production rate of $^8$He isotopes is more than an order of magnitude lower than any other measurement of cosmogenic isotope production. It replaces the results of previous attempts to determine the ratio of $^8$He to $^9$Li production that yielded a wide range of limits from 0 to 30\%. The results provide future liquid-scintillator-based experiments with improved ability to predict cosmogenic backgrounds.
Submitted 7 February, 2024;
originally announced February 2024.
-
AccessLens: Auto-detecting Inaccessibility of Everyday Objects
Authors:
Nahyun Kwon,
Qian Lu,
Muhammad Hasham Qazi,
Joanne Liu,
Changhoon Oh,
Shu Kong,
Jeeeun Kim
Abstract:
In our increasingly diverse society, everyday physical interfaces often present barriers, impacting individuals across various contexts. This oversight, from small cabinet knobs to identical wall switches that can pose different contextual challenges, highlights an imperative need for solutions. Leveraging low-cost 3D-printed augmentations such as knob magnifiers and tactile labels seems promising, yet the process of discovering unrecognized barriers remains challenging because disability is context-dependent. We introduce AccessLens, an end-to-end system designed to identify inaccessible interfaces in daily objects, and recommend 3D-printable augmentations for accessibility enhancement. Our approach involves training a detector using the novel AccessDB dataset designed to automatically recognize 21 distinct Inaccessibility Classes (e.g., bar-small and round-rotate) within 6 common object categories (e.g., handle and knob). AccessMeta serves as a robust way to build a comprehensive dictionary linking these accessibility classes to open-source 3D augmentation designs. Experiments demonstrate our detector's performance in detecting inaccessible objects.
Submitted 23 February, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Observation of periodic optical spectra and soliton molecules in a novel passively mode-locked fiber laser
Authors:
Xiang Zhang,
Haobin Zheng,
Kangrui Chang,
Yong Shen,
Yongzhuang Zhou,
Qiao Lu,
Hongxin Zou
Abstract:
Due to the necessity of making a series of random adjustments after mode-locking in most experiments for preparing soliton molecules, the repeatability of the preparations remains a challenge. Here, we introduce a novel all-polarization-maintaining erbium-doped fiber laser that utilizes a nonlinear amplifying loop mirror for mode-locking and features a linear shape. This laser can stably output soliton molecules without any additional adjustment once the mode-locking self-starts. Moreover, it can achieve the transition from soliton molecule state to soliton state, and then to multi-pulse state by reducing the pumping power. The unconventional method of generating multi-pulses, combined with a wide pumping power range of 200--640 mW for maintaining mode-locking, allowed us to observe periodic optical spectra with two complete cycles for the first time. Based on the experimental facts, we develop a multistability model to explain this phenomenon. With its ability to switch between three stable states, this flexible laser can serve as a versatile toolbox for studying soliton dynamics.
Submitted 6 March, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.