-
Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce
Authors:
Zhe Lin,
Jiwei Tan,
Dan Ou,
Xi Chen,
Shaowei Yao,
Bo Zheng
Abstract:
Text relevance or text matching of query and product is an essential technique for the e-commerce search system to ensure that the displayed products match the intent of the query. Many studies focus on improving the performance of the relevance model in search systems. Recently, pre-trained language models like BERT have achieved promising performance on the text relevance task. While these models perform well on the offline test dataset, there are still obstacles to deploying the pre-trained language model to the online system because of their high latency. The two-tower model is extensively employed in industrial scenarios, owing to its ability to balance performance with computational efficiency. Regrettably, such models have an opaque "black box" nature, which prevents developers from making special optimizations. In this paper, we propose the deep Bag-of-Words (DeepBoW) model, an efficient and interpretable relevance architecture for Chinese e-commerce. Our approach encodes the query and the product into a sparse BoW representation, which is a set of word-weight pairs. The weight is the importance or relevance score between the corresponding word and the raw text. The relevance score is measured by accumulating over the matched words between the sparse BoW representations of the query and the product. Compared to the popular dense distributed representation, which usually suffers from the black-box drawback, the main advantage of the proposed representation model is that it is highly explainable and open to intervention, which is a major advantage for the deployment and operation of online search engines. Moreover, the online efficiency of the proposed model is even better than the most efficient inner product form of dense representation ...
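The matching step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `SparseBoW` type, the function names, and in particular the choice to multiply the two weights of each matched word are assumptions, since the abstract only states that the score accumulates over matched words.

```python
from typing import Dict, List, Tuple

# Hypothetical sparse BoW representation: word -> weight.
SparseBoW = Dict[str, float]


def relevance_score(query: SparseBoW, product: SparseBoW) -> float:
    """Accumulate a contribution for every word present on both sides.

    Assumption: each matched word contributes the product of its two weights.
    """
    matched = query.keys() & product.keys()
    return sum(query[w] * product[w] for w in matched)


def explain(query: SparseBoW, product: SparseBoW) -> List[Tuple[str, float]]:
    """List per-word contributions, largest first, ties broken alphabetically."""
    matched = query.keys() & product.keys()
    contributions = [(w, query[w] * product[w]) for w in matched]
    return sorted(contributions, key=lambda item: (-item[1], item[0]))


# Toy weights chosen for illustration only.
query = {"red": 0.5, "dress": 1.0}
product = {"dress": 0.5, "cotton": 0.25, "red": 1.0}
print(relevance_score(query, product))
print(explain(query, product))
```

Because the score is a sum of per-word terms, each term can be inspected or overridden individually, which is one way to read the explainability and intervention claims in the abstract.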
Submitted 12 July, 2024;
originally announced July 2024.
-
WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment
Authors:
Jiefu Ou,
Arda Uzunoglu,
Benjamin Van Durme,
Daniel Khashabi
Abstract:
AI systems make decisions in physical environments through primitive actions or affordances that are accessed via API calls. While deploying AI agents in the real world involves numerous high-level actions, existing embodied simulators offer a limited set of domain-salient APIs. This naturally brings up the questions: how many primitive actions (APIs) are needed for a versatile embodied agent, and what should they look like? We explore this via a thought experiment: assuming that wikiHow tutorials cover a wide variety of human-written tasks, what is the space of APIs needed to cover these instructions? We propose a framework to iteratively induce new APIs by grounding wikiHow instruction to situated agent policies. Inspired by recent successes in large language models (LLMs) for embodied planning, we propose a few-shot prompting to steer GPT-4 to generate Pythonic programs as agent policies and bootstrap a universe of APIs by 1) reusing a seed set of APIs; and then 2) fabricate new API calls when necessary. The focus of this thought experiment is on defining these APIs rather than their executability. We apply the proposed pipeline on instructions from wikiHow tutorials. On a small fraction (0.5%) of tutorials, we induce an action space of 300+ APIs necessary for capturing the rich variety of tasks in the physical world. A detailed automatic and human analysis of the induction output reveals that the proposed pipeline enables effective reuse and creation of APIs. Moreover, a manual review revealed that existing simulators support only a small subset of the induced APIs (9 of the top 50 frequent APIs), motivating the development of action-rich embodied environments.
Submitted 10 July, 2024;
originally announced July 2024.
-
Atomistic modeling of bulk and grain boundary diffusion in solid electrolyte Li$_6$PS$_5$Cl using machine-learning interatomic potentials
Authors:
Yongliang Ou,
Yuji Ikeda,
Lena Scholz,
Sergiy Divinski,
Felix Fritzen,
Blazej Grabowski
Abstract:
Li$_6$PS$_5$Cl is a promising candidate for the solid electrolyte in all-solid-state Li-ion batteries. In applications, this material is in a polycrystalline state with grain boundaries (GBs) that can affect ionic conductivity. While atomistic modeling provides valuable information on the impact of GBs on Li diffusion, such studies face either high computational cost (ab initio methods) or accuracy limitations (classical potentials) as challenges. Here, we develop a quality-level-based active learning scheme for efficient and systematic development of ab initio-based machine-learning interatomic potentials, specifically moment tensor potentials (MTPs), for large-scale, long-time, and high-accuracy simulations of complex atomic structures and diffusion mechanisms as encountered in solid electrolytes. Based on this scheme, we obtain MTPs for Li$_6$PS$_5$Cl and investigate two tilt GBs, $\Sigma3(1\bar{1}2)[110]$, $\Sigma3(\bar{1}11)[110]$, and one twist GB, $\Sigma5(001)[001]$. All three GBs exhibit low formation energies of less than 20 meV/Å$^2$, indicating their high stability in polycrystalline Li$_6$PS$_5$Cl. Using the MTPs, diffusion coefficients of the anion-ordered and anion-disordered bulk, as well as the three GBs, are obtained from molecular dynamics simulations of atomistic models. At 300 K, the GB diffusion coefficients fall between the ones of the anion-ordered bulk structure ($0.012 \times 10^{-7}$ cm$^2$/s, corresponding ionic conductivity about 0.2 mS/cm) and the anion-disordered bulk structure (50% Cl/S-anion disorder; $2.203 \times 10^{-7}$ cm$^2$/s, about 29.8 mS/cm) of Li$_6$PS$_5$Cl. Experimental data fall between the Arrhenius-extrapolated diffusion coefficients of the investigated atomic structures.
Submitted 4 July, 2024;
originally announced July 2024.
-
Low-Complexity SVM Signal Recovery in Bandwidth-Limited 100Gb/s PAM4 PON Upstream
Authors:
Liyan Wu,
Yanlu Huang,
Kai Jin,
Shangya Han,
Kun Xu,
Yanni Ou
Abstract:
We proposed a low-complexity SVM-based signal recovery algorithm and evaluated it in 100G-PON with 25G-class devices. For the first time, it experimentally achieved a 24 dB power budget at an FEC threshold of 1E-3 over 40 km SMF, improving receiver sensitivity by over 2 dB compared to FFE&DFE.
Submitted 4 July, 2024;
originally announced July 2024.
-
Commissioning results from the Robo-AO-2 facility for rapid visible and near-infrared AO imaging
Authors:
Christoph Baranec,
James Ou,
Reed Riddle,
Ruihan Zhang,
Luke Mckay,
Rachel Rampy,
Morgan Bonnet,
Iven Hamilton,
Greg Ching,
Jessica Young,
Maïssa Salama,
Paul Barnes,
Shane Jacobson,
Peter Onaka,
Mark Chun,
Zachary Werber,
Keith Powell,
Marcos A. van Dam,
Benjamin Shappee
Abstract:
We installed the next-generation automated laser adaptive optics system, Robo-AO-2, on the University of Hawaii 2.2-m telescope on Maunakea in 2023. We engineered Robo-AO-2 to deliver robotic, diffraction-limited observations at visible and near-infrared wavelengths in unprecedented numbers. This new instrument takes advantage of upgraded components, manufacturing techniques and control; and includes a parallel reconfigurable natural guide star wavefront sensor with which to explore hybrid wavefront sensing techniques. We present the results of commissioning in 2023 and 2024.
Submitted 30 June, 2024;
originally announced July 2024.
-
Neural Network-Assisted End-to-End Design for Dispersive Full-Parameter Control of Meta-Optics
Authors:
Hanbin Chi,
Yueqiang Hu,
Xiangnian Ou,
Yuting Jiang,
Dian Yu,
Shaozhen Lou,
Quan Wang,
Qiong Xie,
Cheng-Wei Qiu,
Huigao Duan
Abstract:
Flexible control of the light field across multiple parameters is the cornerstone of versatile and miniaturized optical devices. Metasurfaces, comprising subwavelength scatterers, offer a potent platform for executing such precise manipulations. However, the inherent mutual constraints between parameters of metasurfaces make it challenging for traditional approaches to achieve full-parameter control across multiple wavelengths. Here, we propose a universal end-to-end inverse design framework to directly optimize the geometric parameter layout of meta-optics based on the target functionality of full-parameter control across multiple wavelengths. This framework employs a differentiable forward simulator integrating a neural network-based dispersive full-parameter Jones matrix and Fourier propagation to facilitate gradient-based optimization. We showcase its superiority over sequential forward designs in dual-polarization channel color holography with higher quality and in tri-polarization three-dimensional color holography with higher multiplexed capacity. To highlight the universality, we further present polarized spectral multi-information processing with six arbitrary polarizations and three wavelengths. This versatile, differentiable, system-level design framework is poised to expedite the advancement of meta-optics in integrated multi-information display, imaging, and communication, extending to multi-modal sensing applications.
Submitted 29 June, 2024;
originally announced July 2024.
-
LUT-boosted CDR and Equalization for Burst-mode 50/100 Gbit/s Bandwidth-limited Flexible PON
Authors:
Yanlu Huang,
Liyan Wu,
Shangya Han,
Kai Jin,
Kun Xu,
Yanni Ou
Abstract:
We proposed and experimentally demonstrated a look-up table boosted fast CDR and equalization scheme for the burst-mode 50/100 Gbps bandwidth-limited flexible PON, which requires no preamble for convergence and achieves the same bit error rate performance as in the case of long preambles.
Submitted 28 June, 2024;
originally announced June 2024.
-
Integrated Triply Resonant Electro-Optic Frequency Comb in Lithium Tantalate
Authors:
Junyin Zhang,
Chengli Wang,
Connor Denney,
Grigory Lihachev,
Jianqi Hu,
Wil Kao,
Terence Blésin,
Nikolai Kuznetsov,
Zihan Li,
Mikhail Churaev,
Xin Ou,
Johann Riemensberger,
Gabriel Santamaria-Botello,
Tobias J. Kippenberg
Abstract:
Integrated frequency comb generators based on Kerr parametric oscillation have led to chip-scale, gigahertz-spaced combs with new applications spanning hyperscale telecommunications, low-noise microwave synthesis, LiDAR, and astrophysical spectrometer calibration. Recent progress in lithium niobate (LN) photonic integrated circuits (PICs) has resulted in chip-scale electro-optic (EO) frequency combs, offering precise comb-line positioning and simple operation without relying on the formation of dissipative Kerr solitons. However, current integrated EO combs face limited spectral coverage due to the large microwave power required to drive the non-resonant capacitive electrodes and the strong intrinsic birefringence of Lithium Niobate. Here, we overcome both challenges with an integrated triply resonant architecture, combining monolithic microwave integrated circuits (MMICs) with PICs based on the recently emerged thin-film lithium tantalate. With resonantly enhanced EO interaction and reduced birefringence in Lithium Tantalate, we achieve a four-fold comb span extension and a 16-fold power reduction compared to the conventional non-resonant microwave design. Driven by a hybrid-integrated laser diode, the comb spans over 450nm (60THz) with >2000 lines, and the generator fits within a compact 1cm^2 footprint. We additionally observe that the strong EO coupling leads to an increased comb existence range approaching the full free spectral range of the optical microresonator. The ultra-broadband comb generator, combined with detuning-agnostic operation, could advance chip-scale spectrometry and ultra-low-noise millimeter wave synthesis and unlock octave-spanning EO combs. The methodology of co-designing microwave and optical resonators can be extended to a wide range of integrated electro-optics applications.
Submitted 27 June, 2024;
originally announced June 2024.
-
Symbolic Learning Enables Self-Evolving Agents
Authors:
Wangchunshu Zhou,
Yixin Ou,
Shengwei Ding,
Long Li,
Jialong Wu,
Tiannan Wang,
Jiamin Chen,
Shuai Wang,
Xiaohua Xu,
Ningyu Zhang,
Huajun Chen,
Yuchen Eleanor Jiang
Abstract:
The AI community has been exploring a pathway to artificial general intelligence (AGI) by developing "language agents", which are complex large language model (LLM) pipelines involving both prompting techniques and tool usage methods. While language agents have demonstrated impressive capabilities for many real-world tasks, a fundamental limitation of current language agent research is that it is model-centric, or engineering-centric. That is to say, progress on the prompts, tools, and pipelines of language agents requires substantial manual engineering effort from human experts rather than automatic learning from data. We believe the transition from model-centric, or engineering-centric, to data-centric, i.e., the ability of language agents to autonomously learn and evolve in environments, is the key for them to possibly achieve AGI.
In this work, we introduce agent symbolic learning, a systematic framework that enables language agents to optimize themselves on their own in a data-centric way using symbolic optimizers. Specifically, we consider agents as symbolic networks where learnable weights are defined by prompts, tools, and the way they are stacked together. Agent symbolic learning is designed to optimize the symbolic network within language agents by mimicking two fundamental algorithms in connectionist learning: back-propagation and gradient descent. Instead of dealing with numeric weights, agent symbolic learning works with natural language simulacrums of weights, loss, and gradients. We conduct proof-of-concept experiments on both standard benchmarks and complex real-world tasks and show that agent symbolic learning enables language agents to update themselves after being created and deployed in the wild, resulting in "self-evolving agents".
Submitted 26 June, 2024;
originally announced June 2024.
-
Dependence on parameters of solutions for a generalized poly-Laplacian system on weighted graphs
Authors:
Xiaoyu Wang,
Xingyong Zhang,
Xin Ou
Abstract:
We mainly investigate the continuous dependence on parameters of nontrivial solutions for a generalized poly-Laplacian system on the weighted finite graph $G=(V, E)$. We firstly present an existence result of mountain pass type nontrivial solutions when the nonlinear term $F$ satisfies the super-$(p, q)$ linear growth condition which is a simple generalization of those results in [28]. Then we mainly show that the mountain pass type nontrivial solutions of the poly-Laplacian system are uniformly bounded for parameters and the concrete upper and lower bounds are given, and are continuously dependent on parameters. Similarly, we also present the existence result, the concrete upper and lower bounds, uniqueness, and dependence on parameters for the locally minimum type nontrivial solutions. Subsequently, we present an example on optimal control as an application of our results. Finally, we give a nonexistence result and some results for the corresponding scalar equation.
Submitted 26 June, 2024;
originally announced June 2024.
-
Flat bands and distinct density wave orders in correlated Kagome superconductor CsCr$_3$Sb$_5$
Authors:
Shuting Peng,
Yulei Han,
Yongkai Li,
Jianchang Shen,
Yu Miao,
Yang Luo,
Linwei Huai,
Zhipeng Ou,
Hongyu Li,
Ziji Xiang,
Zhengtai Liu,
Dawei Shen,
Makoto Hashimoto,
Donghui Lu,
Yugui Yao,
Zhenhua Qiao,
Zhiwei Wang,
Junfeng He
Abstract:
Kagome metal CsV$_3$Sb$_5$ has attracted much recent attention due to the coexistence of multiple exotic orders and the associated proposals to mimic unconventional high temperature superconductors. Nevertheless, magnetism and strong electronic correlations, two essential ingredients for unconventional superconductivity, are absent in this V-based Kagome metal. CsCr$_3$Sb$_5$ is a newly discovered Cr-based parallel of CsV$_3$Sb$_5$, in which magnetism appears with charge density wave and superconductivity at different temperature and pressure regions. Enhanced electronic correlations are also suggested by theoretical proposals due to the calculated flat bands. Here, we report angle-resolved photoemission measurements and first-principles calculations on this new material system. Electron energy bands and the associated orbitals are resolved. Flat bands are observed near the Fermi level. Doping dependent measurements on Cs(Cr$_x$V$_{1-x}$)$_3$Sb$_5$ reveal a gradually enhanced band renormalization from CsV$_3$Sb$_5$ to CsCr$_3$Sb$_5$, accompanied by distinct spatial symmetry breaking states in the phase diagram.
Submitted 26 June, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
Entropy-Based Decoding for Retrieval-Augmented Large Language Models
Authors:
Zexuan Qiu,
Zijing Ou,
Bin Wu,
Jingjing Li,
Aiwei Liu,
Irwin King
Abstract:
Augmenting Large Language Models (LLMs) with retrieved external knowledge has proven effective for improving the factual accuracy of generated responses. Despite their success, retrieval-augmented LLMs still face the distractibility issue, where the generated responses are negatively influenced by noise from both external and internal knowledge sources. In this paper, we introduce a novel, training-free decoding method guided by entropy considerations to mitigate this issue. Our approach utilizes entropy-based document-parallel ensemble decoding to prioritize low-entropy distributions from retrieved documents, thereby enhancing the extraction of relevant information of context. Additionally, it incorporates a contrastive decoding mechanism that contrasts the obtained low-entropy ensemble distribution with the high-entropy distribution derived from the model's internal knowledge across layers, which ensures a greater emphasis on reliable external information. Extensive experiments on open-domain question answering datasets demonstrate the superiority of our method.
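The idea of prioritizing low-entropy distributions can be illustrated with a small sketch. It is not the paper's decoding method: the weighting rule (a softmax over negative entropies) and both function names are assumptions made for illustration, and the contrastive step against the model's internal knowledge is omitted.

```python
import math
from typing import List


def entropy(dist: List[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def entropy_weighted_ensemble(dists: List[List[float]]) -> List[float]:
    """Mix per-document next-token distributions, favoring low entropy.

    Assumption: document i gets weight proportional to exp(-entropy_i).
    """
    raw = [math.exp(-entropy(d)) for d in dists]
    total = sum(raw)
    weights = [r / total for r in raw]
    vocab_size = len(dists[0])
    return [
        sum(w * d[i] for w, d in zip(weights, dists))
        for i in range(vocab_size)
    ]


# Toy next-token distributions over a three-token vocabulary.
confident = [0.9, 0.05, 0.05]   # conditioned on a relevant document
uninformative = [1 / 3, 1 / 3, 1 / 3]  # conditioned on a distracting document
mixed = entropy_weighted_ensemble([confident, uninformative])
print(mixed)
```

In this toy case the confident distribution receives the larger weight, so the mixture leans toward its top token more than a plain average would.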
Submitted 25 June, 2024;
originally announced June 2024.
-
Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other
Authors:
Yifei Gao,
Jie Ou,
Lei Wang,
Yuting Xiao,
Zhiyuan Xiang,
Ruiting Dai,
Jun Cheng
Abstract:
Emergent Large Language Models (LLMs) distinguish themselves from traditional language models through their extraordinary performance and powerful deduction capacity. However, the computational and storage costs of these LLMs are substantial, which has made quantization an active research topic. To address the accuracy decay caused by quantization, two streams of work in post-training quantization stand out. One uses other weights to compensate for existing quantization error, while the other transfers the quantization difficulty to other parts of the model. Combining the merits of both, we introduce Learnable Singular value Increment (LSI) as an advanced solution. LSI uses Singular Value Decomposition to extract the singular values of the weights and makes them learnable, helping weights compensate for each other conditioned on activation. Incorporating LSI with existing techniques, we achieve state-of-the-art performance in diverse quantization settings, whether weight-only, weight-activation, or extremely low-bit scenarios. By unleashing the potential of LSI, efficient finetuning of quantized models is no longer a prohibitive problem.
Submitted 23 June, 2024;
originally announced June 2024.
-
Tri-VQA: Triangular Reasoning Medical Visual Question Answering for Multi-Attribute Analysis
Authors:
Lin Fan,
Xun Gong,
Cenyang Zheng,
Yafei Ou
Abstract:
Medical Visual Question Answering (Med-VQA) is a challenging research topic with advantages including patient engagement and clinical expert involvement for second opinions. However, existing Med-VQA methods based on joint embedding fail to explain whether their results are based on correct reasoning or coincidental answers, which undermines the credibility of VQA answers. In this paper, we investigate the construction of a more cohesive and stable Med-VQA structure. Motivated by causal effect, we propose a novel Triangular Reasoning VQA (Tri-VQA) framework, which constructs reverse causal questions from the perspective of "Why this answer?" to elucidate the source of the answer and stimulate more reasonable forward reasoning processes. We evaluate our method on the Endoscopic Ultrasound (EUS) multi-attribute annotated dataset from five centers, and test it on medical VQA datasets. Experimental results demonstrate the superiority of our approach over existing methods. Our code and pre-trained models are available at https://anonymous.4open.science/r/Tri_VQA.
Submitted 21 June, 2024;
originally announced June 2024.
-
Diffusion Model With Optimal Covariance Matching
Authors:
Zijing Ou,
Mingtian Zhang,
Andi Zhang,
Tim Z. Xiao,
Yingzhen Li,
David Barber
Abstract:
The probabilistic diffusion model has become highly effective across various domains. Typically, sampling from a diffusion model involves using a denoising distribution characterized by a Gaussian with a learned mean and either fixed or learned covariances. In this paper, we leverage the recently proposed full covariance moment matching technique and introduce a novel method for learning covariances. Unlike traditional data-driven covariance approximation approaches, our method involves directly regressing the optimal analytic covariance using a new, unbiased objective named Optimal Covariance Matching (OCM). This approach can significantly reduce the approximation error in covariance prediction. We demonstrate how our method can substantially enhance the sampling efficiency of both Markovian (DDPM) and non-Markovian (DDIM) diffusion model families.
Submitted 16 June, 2024;
originally announced June 2024.
-
MDA: An Interpretable Multi-Modal Fusion with Missing Modalities and Intrinsic Noise
Authors:
Lin Fan,
Yafei Ou,
Cenyang Zheng,
Pengyu Dai,
Tamotsu Kamishima,
Masayuki Ikebe,
Kenji Suzuki,
Xun Gong
Abstract:
Multi-modal fusion is crucial in medical data research, enabling a comprehensive understanding of diseases and improving diagnostic performance by combining diverse modalities. However, multi-modal fusion faces challenges, including capturing interactions between modalities, addressing missing modalities, handling erroneous modal information, and ensuring interpretability. Many existing studies design different solutions for these problems, often overlooking the commonalities among them. This paper proposes a novel multi-modal fusion framework that adaptively adjusts the weights of each modality by introducing the Modal-Domain Attention (MDA). It aims to facilitate the fusion of multi-modal information while allowing for missing modalities or intrinsic noise, thereby enhancing the representation of multi-modal data. We provide visualizations of accuracy changes and MDA weights by observing the process of modal fusion, offering a comprehensive analysis of its interpretability. In extensive experiments on various gastrointestinal disease benchmarks, the proposed MDA maintains high accuracy even in the presence of missing modalities and intrinsic noise. Notably, the visualization of MDA is highly consistent with the conclusions of existing clinical studies on the dependence of different diseases on various modalities. Code and dataset will be made available.
Submitted 15 June, 2024;
originally announced June 2024.
-
A Zeroth-Order Proximal Algorithm for Consensus Optimization
Authors:
Chengan Wang,
Zichong Ou,
Jie Lu
Abstract:
This paper considers a consensus optimization problem, where all the nodes in a network, with access to the zeroth-order information of its local objective function only, attempt to cooperatively achieve a common minimizer of the sum of their local objectives. To address this problem, we develop ZoPro, a zeroth-order proximal algorithm, which incorporates a zeroth-order oracle for approximating Hessian and gradient into a recently proposed, high-performance distributed second-order proximal algorithm. We show that the proposed ZoPro algorithm, equipped with a dynamic stepsize, converges linearly to a neighborhood of the optimum in expectation, provided that each local objective function is strongly convex and smooth. Extensive simulations demonstrate that ZoPro converges faster than several state-of-the-art distributed zeroth-order algorithms and outperforms a few distributed second-order algorithms in terms of running time for reaching given accuracy.
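For readers unfamiliar with zeroth-order information, the sketch below shows one generic way to approximate a gradient from function values alone, using coordinate-wise central differences. It is not the ZoPro oracle: the function name `zo_gradient`, the step size `h`, and the toy objective are assumptions for illustration, and the Hessian approximation and the distributed proximal update are not shown.

```python
from typing import Callable, List


def zo_gradient(
    f: Callable[[List[float]], float], x: List[float], h: float = 1e-5
) -> List[float]:
    """Estimate the gradient of f at x using only function evaluations.

    Each coordinate uses a central difference, costing two evaluations.
    """
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad


def local_objective(x: List[float]) -> float:
    """Toy strongly convex, smooth objective: the squared Euclidean norm."""
    return sum(v * v for v in x)


estimate = zo_gradient(local_objective, [1.0, -2.0])
print(estimate)  # close to the exact gradient [2.0, -4.0]
```

The estimate costs two function evaluations per coordinate, which is the kind of overhead that makes running time a natural comparison metric for zeroth-order methods.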
Submitted 14 June, 2024;
originally announced June 2024.
-
Discovery and Extensive Follow-Up of SN 2024ggi, a nearby type IIP supernova in NGC 3621
Authors:
Ting-Wan Chen,
Sheng Yang,
Shubham Srivastav,
Takashi J. Moriya,
Stephen J. Smartt,
Sofia Rest,
Armin Rest,
Hsing Wen Lin,
Hao-Yu Miao,
Yu-Chi Cheng,
Amar Aryan,
Chia-Yu Cheng,
Morgan Fraser,
Li-Ching Huang,
Meng-Han Lee,
Cheng-Han Lai,
Yu Hsuan Liu,
Aiswarya Sankar. K,
Ken W. Smith,
Heloise F. Stevance,
Ze-Ning Wang,
Joseph P. Anderson,
Charlotte R. Angus,
Thomas de Boer,
Kenneth Chambers
, et al. (23 additional authors not shown)
Abstract:
We present the discovery and early observations of the nearby Type II supernova (SN) 2024ggi in NGC 3621 at 6.64 +/- 0.3 Mpc. The SN was caught 5.8 (+1.9 -2.9) hours after its explosion by the ATLAS survey. Early-phase, high-cadence, and multi-band photometric follow-up was performed by the Kinder (Kilonova Finder) project, collecting over 1000 photometric data points within a week. The combined o- and r-band light curves show a rapid rise of 3.3 magnitudes in 13.7 hours, much faster than SN 2023ixf (another recent, nearby, and well-observed SN II). Between 13.8 and 18.8 hours after explosion SN 2024ggi became bluer, with u-g colour dropping from 0.53 to 0.15 mag. The rapid blueward evolution indicates a wind shock breakout (SBO) scenario. No hour-long brightening expected for the SBO from a bare stellar surface was detected during our observations. The classification spectrum, taken 17 hours after the SN explosion, shows flash features of high-ionization species such as Balmer lines, He I, C III, and N III. Detailed light curve modeling reveals critical insights into the properties of the circumstellar material (CSM). Our favoured model has an explosion energy of 2 x 10^51 erg, a mass-loss rate of 10^-3 solar_mass/yr (with an assumed 10 km/s wind), and a confined CSM radius of 6 x 10^14 cm. The corresponding CSM mass is 0.4 solar_mass. Comparisons with SN 2023ixf highlight that SN 2024ggi has a smaller CSM density, resulting in a faster rise and fainter UV flux. The extensive dataset and the involvement of citizen astronomers underscore that a collaborative network is essential for SBO searches, leading to more precise and comprehensive SN characterizations.
Submitted 13 June, 2024;
originally announced June 2024.
-
Dense Outflowing Molecular Gas in Massive Star-forming Regions
Authors:
Yani Xu,
Junzhi Wang,
Shu Liu,
Juan Li,
Yuqiang LI,
Rui Luo,
Chao Ou,
Siqi Zheng,
Yijia Liu
Abstract:
Dense outflowing gas, traced by transitions of molecules with large dipole moments, is important for understanding mass loss and feedback in massive star formation. HCN 3-2 and HCO$^+$ 3-2 are good tracers of dense outflowing molecular gas, which is closely related to active star formation. In this study, we present on-the-fly (OTF) mapping observations of HCN 3-2 and HCO$^+$ 3-2 toward a sample of 33 massive star-forming regions using the 10-m Submillimeter Telescope (SMT). With the spatial distribution of line wings of HCO$^+$ 3-2 and HCN 3-2, outflows are detected in 25 sources, resulting in a detection rate of 76$\%$. The optically thin H$^{13}$CN and H$^{13}$CO$^+$ 3-2 lines are used to identify line wings as outflows and estimate core mass. The outflow mass $M_{out}$, momentum $P_{out}$, kinetic energy $E_{K}$, force $F_{out}$ and mass loss rate $\dot M_{out}$, as well as the core mass, are obtained for each source. A tight sublinear correlation is found between the mass of the dense molecular outflow and the core mass, with an index of $\sim$ 0.8 and a correlation coefficient of 0.88.
Submitted 13 June, 2024;
originally announced June 2024.
-
Growth and characterization of the La$_{3}$Ni$_{2}$O$_{7-δ}$ thin films: dominant contribution of the $d_{x^{2}-y^{2}}$ orbital at ambient pressure
Authors:
Yuecong Liu,
Mengjun Ou,
Haifeng Chu,
Huan Yang,
Qing Li,
Yingjie Zhang,
Hai-Hu Wen
Abstract:
By using the pulsed-laser-ablation technique, we have successfully grown La$_{3}$Ni$_{2}$O$_{7-δ}$ thin films with the $c$-axis oriented perpendicular to the film surface. X-ray diffraction shows that the (00l) peaks can be well indexed to the La$_{3}$Ni$_{2}$O$_{7-δ}$ phase. Resistive measurements show that the samples can be tuned from weakly insulating to metallic behavior by adjusting the growth conditions. Surprisingly, none of the $ρ-T$ curves in the temperature region of 2$\sim$300~K shows the anomalies corresponding to either the spin density wave or the charge density wave orders seen in bulk samples. Hall effect measurements show a linear field dependence with holes as the dominant charge carriers, but the Hall coefficient $R_{H}=ρ_{xy}/H$ exhibits strong temperature dependence. The magnetoresistance above about 50~K is positive but very weak, indicating the absence of a multiband effect. However, a negative magnetoresistance is observed at low temperatures, which shows the delocalization effect. Detailed analysis of the magnetoresistance suggests that the delocalization effect at low temperatures is due to a Kondo-like effect, rather than Anderson weak localization. Our transport results suggest that the electronic conduction is carried by the $d_{x^{2}-y^{2}}$ orbital with holes as the dominant charge carriers, while the interaction through Hund's coupling with the localized $d_{z^{2}}$ orbital plays an important role in the charge dynamics.
Submitted 12 June, 2024;
originally announced June 2024.
-
Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
In this work we search for signals generated by ultra-heavy dark matter in data from the Large High Altitude Air Shower Observatory (LHAASO). We look for possible gamma rays from dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter, as they have low fluxes of astrophysical $γ$-ray background while containing a large amount of dark matter. Analyzing more than 700 days of observational data from LHAASO, we detect no significant dark matter signal from 1 TeV to 1 EeV. Accordingly, we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.
Submitted 12 June, 2024;
originally announced June 2024.
-
Quantum Reductive Perturbation Method for Photon Propagations in a Cold Atomic Gas
Authors:
Ou Yao,
Huang Guoxiang
Abstract:
We develop a quantum reductive perturbation method (RPM), a generalization of the classical RPM widely used in nonlinear wave theory, to derive a simplified model (i.e., a quantum nonlinear Schrödinger equation) from the fully quantum Heisenberg-Langevin-Maxwell equations describing photon propagation in a coherent cold atomic gas. The result is used to discuss two-photon bound states and optical solitons in the gas. Though a specific system is considered, the quantum RPM established here is very general and can be applied to other complex quantum nonlinear problems.
Submitted 12 June, 2024;
originally announced June 2024.
-
DR-RAG: Applying Dynamic Document Relevance to Retrieval-Augmented Generation for Question-Answering
Authors:
Zijian Hei,
Weiling Liu,
Wenjie Ou,
Juyi Qiao,
Junming Jiao,
Guowen Song,
Ting Tian,
Yi Lin
Abstract:
Retrieval-Augmented Generation (RAG) has recently demonstrated strong performance for Large Language Models (LLMs) on knowledge-intensive tasks such as Question-Answering (QA). RAG expands the query context by incorporating external knowledge bases to enhance response accuracy. However, it would be inefficient to access LLMs multiple times for each query and unreliable to retrieve all the relevant documents with a single query. We have found that even though there is low relevance between some critical documents and the query, it is possible to retrieve the remaining documents by combining parts of the documents with the query. To mine this relevance, a two-stage retrieval framework called Dynamic-Relevant Retrieval-Augmented Generation (DR-RAG) is proposed to improve document retrieval recall and the accuracy of answers while maintaining efficiency. Additionally, a compact classifier is applied to two different selection strategies to determine the contribution of the retrieved documents to answering the query and to retrieve the relatively relevant documents. Meanwhile, DR-RAG calls the LLM only once, which significantly improves the efficiency of the experiment. The experimental results on multi-hop QA datasets show that DR-RAG can significantly improve the accuracy of the answers and achieve new progress in QA systems.
Submitted 16 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
TIM: Temporal Interaction Model in Notification System
Authors:
Huxiao Ji,
Haitao Yang,
Linchuan Li,
Shunyu Zhang,
Cunyi Zhang,
Xuanping Li,
Wenwu Ou
Abstract:
Modern mobile applications rely heavily on the notification system to acquire daily active users and enhance user engagement. Being able to proactively reach users, the system has to decide when to send notifications to users. Although many researchers have studied optimizing the timing of sending notifications, they only utilized users' contextual features, without modeling users' behavior patterns. Additionally, these efforts focus only on individual notifications, and there is a lack of studies on optimizing the holistic timing of multiple notifications within a period. To bridge these gaps, we propose the Temporal Interaction Model (TIM), which models users' behavior patterns by estimating CTR in every time slot over a day in our short video application Kuaishou. TIM leverages long-term user historical interaction sequence features such as notification receipts, clicks, watch time and effective views, and employs a temporal attention unit (TAU) to extract user behavior patterns. Moreover, we provide an elegant strategy for holistic control of notification send times to improve user engagement while minimizing disruption. We evaluate the effectiveness of TIM through offline experiments and online A/B tests. The results indicate that TIM is a reliable tool for forecasting user behavior, leading to a remarkable enhancement in user engagement without causing undue disturbance.
Submitted 11 June, 2024;
originally announced June 2024.
-
Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
Authors:
Jingyang Ou,
Shen Nie,
Kaiwen Xue,
Fengqi Zhu,
Jiacheng Sun,
Zhenguo Li,
Chongxuan Li
Abstract:
Discrete diffusion models with absorbing processes have shown promise in language modeling. The key quantities to be estimated are the ratios between the marginal probabilities of two transitive states at all timesteps, called the concrete score. In this paper, we reveal that the concrete score in absorbing diffusion can be expressed as conditional probabilities of clean data, multiplied by a time-dependent scalar in an analytic form. Motivated by this finding, we propose reparameterized absorbing discrete diffusion (RADD), a dedicated diffusion model without time-condition that characterizes the time-independent conditional probabilities. Besides its simplicity, RADD can reduce the number of function evaluations (NFEs) by caching the output of the time-independent network when the noisy sample remains unchanged in a sampling interval. Empirically, RADD is up to 3.5 times faster while achieving similar performance with the strongest baseline. Built upon the new perspective of conditional distributions, we further unify absorbing discrete diffusion and any-order autoregressive models (AO-ARMs), showing that the upper bound on the negative log-likelihood for the diffusion model can be interpreted as an expected negative log-likelihood for AO-ARMs. Further, our RADD models achieve SOTA performance among diffusion models on 5 zero-shot language modeling benchmarks (measured by perplexity) at the GPT-2 scale. Our code is available at https://github.com/ML-GSAI/RADD.
Submitted 6 July, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
A Hybrid Deep Learning Classification of Perimetric Glaucoma Using Peripapillary Nerve Fiber Layer Reflectance and Other OCT Parameters from Three Anatomy Regions
Authors:
Ou Tan,
David S. Greenfield,
Brian A. Francis,
Rohit Varma,
Joel S. Schuman,
David Huang,
Dongseok Choi
Abstract:
Precis: A hybrid deep-learning model combines NFL reflectance and other OCT parameters to improve glaucoma diagnosis. Objective: To investigate whether a deep learning model could be used to combine nerve fiber layer (NFL) reflectance and other OCT parameters for glaucoma diagnosis. Patients and Methods: This is a prospective observational study of 106 normal subjects and 164 perimetric glaucoma (PG) patients. Peripapillary NFL reflectance map, NFL thickness map, optic nerve head (ONH) analysis of the disc, and macular ganglion cell complex (GCC) thickness were obtained using spectral-domain OCT. A hybrid deep learning model combined a fully connected network (FCN) and a convolutional neural network (CNN) to develop and combine those OCT maps and parameters to distinguish normal and PG eyes. Two deep learning models were compared based on whether the NFL reflectance map was used as part of the input or not. Results: The hybrid deep learning model with reflectance achieved 0.909 sensitivity at 99% specificity and 0.926 at 95%. The overall accuracy was 0.948 with 0.893 sensitivity and 1.000 specificity, and the AROC was 0.979, which is significantly better than the logistic regression models (p < 0.001). The second-best model was the hybrid deep learning model without reflectance, which also had significantly higher AROC than the logistic regression models (p < 0.001). The logistic regression model with reflectance had slightly higher AROC or sensitivity than the logistic regression model without reflectance (p = 0.024). Conclusions: The hybrid deep learning model significantly improved the diagnostic accuracy, with or without NFL reflectance. A hybrid deep learning model combining reflectance, NFL thickness, GCC thickness, and ONH parameters may be a practical model for glaucoma screening.
Submitted 5 June, 2024;
originally announced June 2024.
-
Is Data Valuation Learnable and Interpretable?
Authors:
Ou Wu,
Weiyao Zhu,
Mengyang Li
Abstract:
Measuring the value of individual samples is critical for many data-driven tasks, e.g., the training of a deep learning model. Recent literature has seen substantial efforts in developing data valuation methods. The primary data valuation methodology is based on the Shapley value from game theory, and various methods have been proposed along this path. Even though Shapley value-based valuation has a solid theoretical basis, it is entirely an experiment-based approach and no valuation model has been constructed so far. In addition, current data valuation methods ignore the interpretability of the output values, even though an interpretable data valuation method would be of great help for applications such as data pricing. This study aims to answer an important question: is data valuation learnable and interpretable? A learned valuation model has several desirable merits, such as a fixed number of parameters and knowledge reusability. An interpretable data valuation model can explain why a sample is valuable or not. To this end, two new data value modeling frameworks are proposed, in which a multi-layer perceptron (MLP) and a new regression tree are utilized as specific base models for model training and interpretability, respectively. Extensive experiments are conducted on benchmark datasets. The experimental results provide a positive answer to the question. Our study opens up a new technical path for the assessment of data values. Large data valuation models can be built across many different data-driven tasks, which can promote the widespread application of data valuation.
Submitted 3 June, 2024;
originally announced June 2024.
-
A general design method for ultra-long optical path length multipass matrix cells
Authors:
Yiyun Gai,
Wenping Li,
Kaihao Yi,
Xue Ou,
Peng Liu,
Xin Zhou
Abstract:
For the first time, we propose a general design method for ultra-long optical path length (OPL) multipass matrix cells (MMCs) based on multi-cycle mode of two-sided field mirrors. The design idea of the dual circulation mode with two-sided field mirrors is elaborated in detail with the example of MMC based on dual Pickett Bradley White cell (PBWC), and the simple design methods of the other three MMCs based on the dual circulation mode of PBWC and Bernstein Herzberg White cell (BHWC) are given. Further, we propose a general design method for ultra-long OPL MMCs with multi-cycle mode by adding cyclic elements. The OPL of the MMCs designed by this method can reach the order of kilometers or even tens of kilometers. The novel MMCs have the advantages of simple structure, strong spot formation regularity, easy expansion, high mirror utilization ratio, high reuse times of spot spatial position, good stability and extremely high ratio of the optical path length to the volume (RLV). In order to evaluate the performance of the new MMCs, an open-path methane gas sensor with the MMC based on triple PBWC was constructed, which was used to continuously measure the methane in the laboratory, and the feasibility, effectiveness and practicability of the new design method were verified. The design method proposed in this paper provides a new idea for the design of multipass cell (MPC), and the new MMCs designed have great potential application value in the field of high-precision trace gas monitoring.
Submitted 4 June, 2024;
originally announced June 2024.
-
Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
Authors:
Saierdaer Yusuyin,
Te Ma,
Hao Huang,
Wenbo Zhao,
Zhijian Ou
Abstract:
There exist three approaches for multilingual and crosslingual automatic speech recognition (MCL-ASR): supervised pre-training with phonetic or graphemic transcription, and self-supervised pre-training. We find that pre-training with phonetic supervision has been underappreciated so far for MCL-ASR, while conceptually it is more advantageous for information sharing between different languages. This paper explores the approach of pre-training with weakly phonetic supervision towards data-efficient MCL-ASR, which is called Whistle. We relax the requirement of gold-standard human-validated phonetic transcripts, and obtain International Phonetic Alphabet (IPA) based transcription by leveraging the LanguageNet grapheme-to-phoneme (G2P) models. We construct a common experimental setup based on the CommonVoice dataset, called CV-Lang10, with 10 seen languages and 2 unseen languages. A set of experiments is conducted on CV-Lang10 to compare, as fairly as possible, the three approaches under the common setup for MCL-ASR. Experiments demonstrate the advantages of phoneme-based models (Whistle) for MCL-ASR, in terms of speech recognition for seen languages, crosslingual performance for unseen languages with different amounts of few-shot data, overcoming catastrophic forgetting, and training efficiency. It is found that when training data is more limited, phoneme supervision can achieve better results compared to subword supervision and self-supervision, thereby providing higher data-efficiency. To support reproducibility and promote future research along this direction, we will release the code, models and data for the whole pipeline of Whistle at https://github.com/thu-spmi/CAT upon publication.
Submitted 4 June, 2024;
originally announced June 2024.
-
CP asymmetry from the effect of the isospin symmetry breaking during B-meson decay
Authors:
Wan-Ying Yao,
Gang Lü,
Hai-Feng Ou
Abstract:
The direct CP asymmetry in quasi-two-body decays of $B \rightarrow (V\rightarrow π^{+}π^{-})P $ is investigated in the perturbative QCD (PQCD) method, where P represents a pseudoscalar meson and V refers to $ρ$, $ω$ and $φ$ mesons. We present the amplitude of the quasi-two-body decay process and investigate the effects of mixed resonance involving $ρ^{0}-ω$, $ρ^{0}-φ$ and $ω-φ$, while considering the impact of isospin symmetry breaking. We observe a significant CP asymmetry when the invariant mass of the $π^{+}π^{-}$ pair is within the resonance ranges of $ρ$, $ω$ and $φ$. Consequently, we proceed to quantify the regional CP asymmetry in these resonance regions. A significant difference is observed when comparing results obtained with and without interference and isospin conservation. The CP asymmetry results obtained from the three-body decay process, without interference due to isospin conservation by the PQCD method, are in agreement with the newly updated data acquired by the LHCb experiment.
Submitted 31 May, 2024;
originally announced June 2024.
-
Focal Loss Analysis of Peripapillary Nerve Fiber Layer Reflectance for Glaucoma Diagnosis
Authors:
Ou Tan,
Dongseok Choi,
Aiyin Chen,
David S. Greenfield,
Brian A. Francis,
Rohit Varma,
Joel S. Schuman,
David Huang,
Advanced Imaging for Glaucoma Study Group
Abstract:
Purpose: To evaluate nerve fiber layer (NFL) reflectance for glaucoma diagnosis using a large dataset. Methods: Participants were imaged with 4.9 mm optic nerve head (ONH) scans using spectral-domain optical coherence tomography (OCT). The NFL reflectance map was reconstructed from 13 concentric rings of the ONH scan, then processed by an azimuthal filter to reduce directional reflectance bias due to variation of beam incidence angle. The peripapillary thickness and reflectance maps were both divided into 96 superpixels. Low-reflectance and low-thickness superpixels were defined as those with values below the 5th percentile normative reference for that location. Focal reflectance loss was measured by summing loss, relative to the normal reference average, in low-reflectance superpixels. Focal thickness loss was calculated in a similar fashion. The area under the receiver operating characteristic curve (AROC) was used to assess diagnostic accuracy. Results: Fifty-three normal, 196 pre-perimetric, 132 early perimetric, and 59 moderate and advanced perimetric glaucoma participants were included from the Advanced Imaging for Glaucoma Study. Sixty-seven percent of glaucomatous reflectance maps showed characteristic contiguous wedge or diffuse defects. Focal NFL reflectance loss had significantly higher diagnostic accuracy than the best NFL thickness parameters (both map-based and profile-based): AROC 0.80 vs. 0.75 (p<0.004) for distinguishing glaucoma eyes from healthy control eyes. The diagnostic sensitivity was also significantly higher at both the 99% and 95% specificity operating points. Conclusions: Focal NFL reflectance loss improved glaucoma diagnostic accuracy compared to the standard NFL thickness parameters.
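The focal loss computation described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `focal_loss`, the plain-list inputs, and the definition of per-superpixel loss as the normative average minus the measured value are hypothetical, since the abstract does not specify units or normalization.

```python
from typing import Sequence


def focal_loss(values: Sequence[float],
               ref_mean: Sequence[float],
               ref_p5: Sequence[float]) -> float:
    """Sum the loss over superpixels that fall below the normative 5th percentile.

    values   -- measured value (reflectance or thickness) per superpixel
    ref_mean -- normative reference average at the same location
    ref_p5   -- normative 5th-percentile cutoff at the same location

    Assumption: loss at a flagged superpixel is (ref_mean - value).
    """
    total = 0.0
    for value, mean, cutoff in zip(values, ref_mean, ref_p5):
        if value < cutoff:  # superpixel counts as "low" at this location
            total += mean - value
    return total


if __name__ == "__main__":
    # Three superpixels; only the last two fall below the cutoff of 6.
    print(focal_loss([10.0, 4.0, 3.0], [10.0, 10.0, 10.0], [6.0, 6.0, 6.0]))
```

Restricting the sum to flagged superpixels is what makes the measure focal: values that are within the normal range contribute nothing, even if they sit slightly below the reference average.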
Submitted 31 May, 2024;
originally announced June 2024.
-
Reliability for Nerve Fiber Layer Reflectance Using Spectral Domain Optical Coherence Tomography
Authors:
Kabir Hossain,
Ou Tan,
Po-Han Yeh,
Jie Wang,
Elizabeth White,
Dongseok Choi,
David Huang
Abstract:
Purpose: To assess the reliability of nerve fiber layer (NFL) reflectance measured with spectral-domain optical coherence tomography (OCT). Methods: Participants were scanned with a cubic 6x6 mm disc OCT scan. NFL reflectance was normalized by the average of reference bands below the NFL and summarized. We selected several reference bands, including the pigment epithelium complex (PPEC), the band between the NFL and Bruch's membrane (Post-NFL), and Post-NFL-Bright, the top 50% of pixels with higher values in the Post-NFL band. We also included the NFL attenuation coefficient (AC), which is equivalent to NFL reflectance normalized by all pixels below the NFL. An experiment was designed to test NFL reflectance against different levels of attenuation using a neutral density filter (NDF). We also evaluated within-visit and between-visit repeatability using a clinical dataset with normal and glaucoma eyes. Results: The experiment enrolled 20 healthy participants. The clinical dataset comprised 22 normal and 55 glaucoma eyes with at least two visits from the Functional and Structural OCT (FSOCT) study. The experiment showed that NFL reflectance normalized by PPEC Max and by Post-NFL-Bright had the lowest dependence on NDF level, with slopes of -0.77 and -1.34 dB/optical density, respectively. The clinical data showed that NFL reflectance metrics normalized by Post-NFL-Bright or Post-NFL-Mean tended to have better repeatability and reproducibility than the others, but the trend was not significant. All metrics demonstrated similar diagnostic accuracy (0.82-0.87), with Post-NFL-Bright providing the best result. Conclusions: NFL reflectance normalized by the maximum in PPEC had the least dependence on global attenuation, followed by Post-NFL-Bright, PPEC/Mean, Post-NFL-Mean and NFL/AC. However, NFL reflectance normalized by Post-NFL-Bright gave better results in the two datasets.
Submitted 31 May, 2024;
originally announced June 2024.
-
Data Valuation by Leveraging Global and Local Statistical Information
Authors:
Xiaoling Zhou,
Ou Wu,
Michael K. Ng,
Hao Jiang
Abstract:
Data valuation has garnered increasing attention in recent years, given the critical role of high-quality data in various applications, particularly in machine learning tasks. There are diverse technical avenues to quantify the value of data within a corpus. While Shapley value-based methods are among the most widely used techniques in the literature due to their solid theoretical foundation, the accurate calculation of Shapley values is often intractable, leading to the proposal of numerous approximated calculation methods. Despite significant progress, nearly all existing methods overlook the utilization of distribution information of values within a data corpus. In this paper, we demonstrate that both global and local statistical information of value distributions hold significant potential for data valuation within the context of machine learning. Firstly, we explore the characteristics of both global and local value distributions across several simulated and real data corpora. Useful observations and clues are obtained. Secondly, we propose a new data valuation method that estimates Shapley values by incorporating the explored distribution characteristics into an existing method, AME. Thirdly, we present a new path to address the dynamic data valuation problem by formulating an optimization problem that integrates information of both global and local value distributions. Extensive experiments are conducted on Shapley value estimation, value-based data removal/adding, mislabeled data detection, and incremental/decremental data valuation. The results showcase the effectiveness and efficiency of our proposed methodologies, affirming the significant potential of global and local value distributions in data valuation.
Submitted 23 May, 2024;
originally announced May 2024.
-
Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference
Authors:
Shengyuan Ye,
Jiangsu Du,
Liekang Zeng,
Wenzhong Ou,
Xiaowen Chu,
Yutong Lu,
Xu Chen
Abstract:
Transformer-based models have unlocked a plethora of powerful intelligent applications at the edge, such as voice assistants in smart homes. Traditional deployment approaches offload the inference workloads to the remote cloud server, which would induce substantial pressure on the backbone network as well as raise users' privacy concerns. To address that, in-situ inference has been recently recognized for edge intelligence, but it still confronts significant challenges stemming from the conflict between intensive workloads and limited on-device computing resources. In this paper, we leverage our observation that many edge environments usually comprise a rich set of accompanying trusted edge devices with idle resources and propose Galaxy, a collaborative edge AI system that breaks the resource walls across heterogeneous edge devices for efficient Transformer inference acceleration. Galaxy introduces a novel hybrid model parallelism to orchestrate collaborative inference, along with a heterogeneity-aware parallelism planning for fully exploiting the resource potential. Furthermore, Galaxy devises a tile-based fine-grained overlapping of communication and computation to mitigate the impact of tensor synchronizations on inference latency under bandwidth-constrained edge environments. Extensive evaluation based on prototype implementation demonstrates that Galaxy remarkably outperforms state-of-the-art approaches under various edge environment setups, achieving up to 2.5x end-to-end latency reduction.
Submitted 27 May, 2024;
originally announced May 2024.
-
InsigHTable: Insight-driven Hierarchical Table Visualization with Reinforcement Learning
Authors:
Guozheng Li,
Peng He,
Xinyu Wang,
Runfei Li,
Chi Harold Liu,
Chuangxin Ou,
Dong He,
Guoren Wang
Abstract:
Embedding visual representations within original hierarchical tables can mitigate additional cognitive load stemming from the division of users' attention. The created hierarchical table visualizations can help users understand and explore complex data with multi-level attributes. However, because of many options available for transforming hierarchical tables and selecting subsets for embedding, the design space of hierarchical table visualizations becomes vast, and the construction process turns out to be tedious, hindering users from constructing hierarchical table visualizations with many data insights efficiently. We propose InsigHTable, a mixed-initiative and insight-driven hierarchical table transformation and visualization system. We first define data insights within hierarchical tables, which consider the hierarchical structure in the table headers. Since hierarchical table visualization construction is a sequential decision-making process, InsigHTable integrates a deep reinforcement learning framework incorporating an auxiliary rewards mechanism. This mechanism addresses the challenge of sparse rewards in constructing hierarchical table visualizations. Within the deep reinforcement learning framework, the agent continuously optimizes its decision-making process to create hierarchical table visualizations to uncover more insights by collaborating with analysts. We demonstrate the usability and effectiveness of InsigHTable through two case studies and sets of experiments. The results validate the effectiveness of the deep reinforcement learning framework and show that InsigHTable can facilitate users to construct hierarchical table visualizations and understand underlying data insights.
Submitted 27 May, 2024;
originally announced May 2024.
-
The 2nd FutureDial Challenge: Dialog Systems with Retrieval Augmented Generation (FutureDial-RAG)
Authors:
Yucheng Cai,
Si Chen,
Yi Huang,
Junlan Feng,
Zhijian Ou
Abstract:
The 2nd FutureDial Challenge: Dialog Systems with Retrieval Augmented Generation (FutureDial-RAG), Co-located with SLT 2024
Submitted 21 May, 2024;
originally announced May 2024.
-
Pick-and-place transfer of arbitrary-metal electrodes for van der Waals device fabrication
Authors:
Kaijian Xing,
Daniel McEwen,
Weiyao Zhao,
Abdulhakim Bake,
David Cortie,
Jingying Liu,
Thi-Hai-Yen Vu,
James Hone,
Alastair Stacey,
Mark T. Edmonds,
Kenji Watanabe,
Takashi Taniguchi,
Qingdong Ou,
Dong-Chen Qi,
Michael S. Fuhrer
Abstract:
Van der Waals electrode integration is a promising strategy to create near-perfect interfaces between metals and two-dimensional materials, with advantages such as eliminating Fermi-level pinning and reducing contact resistance. However, the lack of a simple, generalizable pick-and-place transfer technology has greatly hampered the wide use of this technique. We demonstrate the pick-and-place transfer of pre-fabricated electrodes from reusable polished hydrogenated diamond substrates without the use of any surface treatments or sacrificial layers. The technique enables transfer of large-scale arbitrary metal electrodes, as demonstrated by successful transfer of eight different elemental metals with work functions ranging from 4.22 to 5.65 eV. The mechanical transfer of metal electrodes from diamond onto van der Waals materials creates atomically smooth interfaces with no interstitial impurities or disorder, as observed with cross-sectional high-resolution transmission electron microscopy and energy-dispersive X-ray spectroscopy. As a demonstration of its device application, we use the diamond-transfer technique to create metal contacts to monolayer transition metal dichalcogenide semiconductors with high-work-function Pd, low-work-function Ti, and semi metal Bi to create n- and p-type field-effect transistors with low Schottky barrier heights. We also extend this technology to other applications such as ambipolar transistor and optoelectronics, paving the way for new device architectures and high-performance devices.
Submitted 21 May, 2024;
originally announced May 2024.
-
Data quality control system and long-term performance monitor of the LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (263 additional authors not shown)
Abstract:
The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.
Submitted 13 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts
Authors:
Xinru Zhang,
Ni Ou,
Berke Doga Basaran,
Marco Visentin,
Mengyun Qiao,
Renyang Gu,
Cheng Ouyang,
Yaou Liu,
Paul M. Matthew,
Chuyang Ye,
Wenjia Bai
Abstract:
Brain lesion segmentation plays an essential role in neurological research and diagnosis. As brain lesions can be caused by various pathological alterations, different types of brain lesions tend to manifest with different characteristics on different imaging modalities. Due to this complexity, brain lesion segmentation methods are often developed in a task-specific manner. A specific segmentation model is developed for a particular lesion type and imaging modality. However, the use of task-specific models requires predetermination of the lesion type and imaging modality, which complicates their deployment in real-world scenarios. In this work, we propose a universal foundation model for 3D brain lesion segmentation, which can automatically segment different types of brain lesions for input data of various imaging modalities. We formulate a novel Mixture of Modality Experts (MoME) framework with multiple expert networks attending to different imaging modalities. A hierarchical gating network combines the expert predictions and fosters expertise collaboration. Furthermore, we introduce a curriculum learning strategy during training to avoid the degeneration of each expert network and preserve their specialization. We evaluated the proposed method on nine brain lesion datasets, encompassing five imaging modalities and eight lesion types. The results show that our model outperforms state-of-the-art universal models and provides promising generalization to unseen datasets.
Submitted 16 May, 2024;
originally announced May 2024.
-
Optical Imaging of Flavor Order in Flat Band Graphene
Authors:
Tian Xie,
Tobias M. Wolf,
Siyuan Xu,
Zhiyuan Cui,
Richen Xiong,
Yunbo Ou,
Patrick Hays,
Ludwig F Holleis,
Yi Guo,
Owen I Sheekey,
Caitlin Patterson,
Trevor Arp,
Kenji Watanabe,
Takashi Taniguchi,
Seth Ariel Tongay,
Andrea F Young,
Allan H. MacDonald,
Chenhao Jin
Abstract:
Spin and valley flavor polarization plays a central role in the many-body physics of flat band graphene, with Fermi surface reconstructions often accompanied by quantized anomalous Hall and superconducting states observed in a variety of experimental systems. Here we describe an optical technique that sensitively and selectively detects flavor textures via the exciton response of a proximal transition metal dichalcogenide layer. Through a systematic study of rhombohedral and rotationally faulted graphene bilayers and trilayers, we show that when the semiconducting dichalcogenide is in direct contact with the graphene, the exciton response is most sensitive to the large momentum rearrangement of the Fermi surface, providing information that is distinct from and complementary to electrical compressibility measurements. The wide-field imaging capability of optical probes allows us to obtain spatial maps of flavor orders with high throughput, and with broad temperature and device compatibility. Our work paves the way for optical probing and imaging of flavor orders in flat band graphene systems.
Submitted 13 May, 2024;
originally announced May 2024.
-
Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper, a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of variability at the level of a few months in the TeV band, which is consistent with low frequency observations. Based on these observations, we report the detection of TeV $\gamma$-ray emissions from this low-luminosity AGN NGC 4278. The observations by LHAASO-WCDA during the active period have a significance level of 8.8\,$\sigma$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.
Submitted 13 May, 2024;
originally announced May 2024.
-
Oedipus: LLM-enchanced Reasoning CAPTCHA Solver
Authors:
Gelei Deng,
Haoran Ou,
Yi Liu,
Jie Zhang,
Tianwei Zhang,
Yang Liu
Abstract:
CAPTCHAs have become a ubiquitous tool in safeguarding applications from automated bots. Over time, the arms race between CAPTCHA development and evasion techniques has led to increasingly sophisticated and diverse designs. The latest iteration, reasoning CAPTCHAs, exploits tasks that are intuitively simple for humans but challenging for conventional AI technologies, thereby enhancing security measures.
Driven by the evolving AI capabilities, particularly the advancements in Large Language Models (LLMs), we investigate the potential of multimodal LLMs to solve modern reasoning CAPTCHAs. Our empirical analysis reveals that, despite their advanced reasoning capabilities, LLMs struggle to solve these CAPTCHAs effectively. In response, we introduce Oedipus, an innovative end-to-end framework for automated reasoning CAPTCHA solving. Central to this framework is a novel strategy that dissects the complex and human-easy-AI-hard tasks into a sequence of simpler and AI-easy steps. This is achieved through the development of a Domain Specific Language (DSL) for CAPTCHAs that guides LLMs in generating actionable sub-steps for each CAPTCHA challenge. The DSL is customized to ensure that each unit operation is a highly solvable subtask revealed in our previous empirical study. These sub-steps are then tackled sequentially using the Chain-of-Thought (CoT) methodology.
Our evaluation shows that Oedipus effectively resolves the studied CAPTCHAs, achieving an average success rate of 63.5\%. Remarkably, it also shows adaptability to the most recent CAPTCHA designs introduced in late 2023, which are not included in our initial study. This prompts a discussion on future strategies for designing reasoning CAPTCHAs that can effectively counter advanced AI solutions.
Submitted 13 May, 2024;
originally announced May 2024.
-
Towards Accurate and Robust Architectures via Neural Architecture Search
Authors:
Yuwei Ou,
Yuqi Feng,
Yanan Sun
Abstract:
To defend deep neural networks from adversarial attacks, adversarial training has been drawing increasing attention for its effectiveness. However, the accuracy and robustness resulting from the adversarial training are limited by the architecture, because adversarial training improves accuracy and robustness by adjusting the weight connection affiliated to the architecture. In this work, we propose ARNAS to search for accurate and robust architectures for adversarial training. First we design an accurate and robust search space, in which the placement of the cells and the proportional relationship of the filter numbers are carefully determined. With the design, the architectures can obtain both accuracy and robustness by deploying accurate and robust structures to their sensitive positions, respectively. Then we propose a differentiable multi-objective search strategy, performing gradient descent towards directions that are beneficial for both natural loss and adversarial loss, thus the accuracy and robustness can be guaranteed at the same time. We conduct comprehensive experiments in terms of white-box attacks, black-box attacks, and transferability. Experimental results show that the searched architecture has the strongest robustness with the competitive accuracy, and breaks the traditional idea that NAS-based architectures cannot transfer well to complex tasks in robustness scenarios. By analyzing outstanding architectures searched, we also conclude that accurate and robust neural architectures tend to deploy different structures near the input and output, which has great practical significance on both hand-crafting and automatically designing of accurate and robust architectures.
Submitted 8 May, 2024;
originally announced May 2024.
-
Stellar Population near NGC 2021: Procession of Star Formation in the South Rim of Supergiant Shell LMC 4
Authors:
Po-Sheng Ou,
Rui-Ching Chao,
You-Hua Chu,
Chin-Yi Hsu,
Chuan-Jui Li
Abstract:
Supergiant shells (SGSs) are the largest interstellar structures where heated and enriched gas flows into the host galaxy's halo. The SGSs in the Large Magellanic Cloud (LMC) are so close that their stars can be resolved with ground-based telescopes to allow studies of star formation history. Aiming to study the star formation history and energy budget of LMC 4, we have conducted a pilot study of the cluster NGC 2021 and the OB associations in its vicinity near the south rim of LMC 4. We use the Magellanic Cloud Photometric Survey data of the LMC to establish a methodology to examine the stellar population and assess the massive star formation history. We find a radial procession of massive star formation from the northwest part of the OB association LH79 through NGC 2021 to the OB association LH78 in the south. Using the stellar content of NGC 2021 and the assumption of Salpeter's initial mass function, we estimate that $\sim$4 supernovae have occurred in NGC 2021, injecting at least $4\times10^{51}$ ergs of kinetic energy into the interior of LMC 4.
Submitted 7 May, 2024;
originally announced May 2024.
-
HAFFormer: A Hierarchical Attention-Free Framework for Alzheimer's Disease Detection From Spontaneous Speech
Authors:
Zhongren Dong,
Zixing Zhang,
Weixiang Xu,
Jing Han,
Jianjun Ou,
Björn W. Schuller
Abstract:
Automatically detecting Alzheimer's Disease (AD) from spontaneous speech plays an important role in its early diagnosis. Recent approaches highly rely on the Transformer architectures due to its efficiency in modelling long-range context dependencies. However, the quadratic increase in computational complexity associated with self-attention and the length of audio poses a challenge when deploying such models on edge devices. In this context, we construct a novel framework, namely Hierarchical Attention-Free Transformer (HAFFormer), to better deal with long speech for AD detection. Specifically, we employ an attention-free module of Multi-Scale Depthwise Convolution to replace the self-attention and thus avoid the expensive computation, and a GELU-based Gated Linear Unit to replace the feedforward layer, aiming to automatically filter out the redundant information. Moreover, we design a hierarchical structure to force it to learn a variety of information grains, from the frame level to the dialogue level. By conducting extensive experiments on the ADReSS-M dataset, the introduced HAFFormer can achieve competitive results (82.6% accuracy) with other recent work, but with significant computational complexity and model size reduction compared to the standard Transformer. This shows the efficiency of HAFFormer in dealing with long audio for AD detection.
Submitted 6 May, 2024;
originally announced May 2024.
-
Cosine Annealing Optimized Denoising Diffusion Error Correction Codes
Authors:
Congyang Ou,
Xiaojing Chen,
Wan Jiang
Abstract:
To address the issue of increased bit error rates during the later stages of linear search in denoising diffusion error correction codes, we propose a novel method that optimizes denoising diffusion error correction codes (ECC) using cosine annealing. In response to the challenge of decoding long codewords, the proposed method employs a variance adjustment strategy during the reverse diffusion process, rather than maintaining a constant variance. By leveraging cosine annealing, this method effectively lowers the bit error rate and enhances decoding efficiency. This letter extensively validates the approach through experiments and demonstrates significant improvements in bit error rate reduction and iteration efficiency compared to existing methods. This advancement offers a promising solution for improving ECC decoding performance, potentially impacting secure digital communication practices.
Submitted 6 May, 2024;
originally announced May 2024.
-
Physical properties and electronic structure of the two-gap superconductor V$_{2}$Ga$_{5}$
Authors:
P. -Y. Cheng,
Mohamed Oudah,
T. -L. Hung,
C. -E. Hsu,
C. -C. Chang,
J. -Y. Haung,
T. -C. Liu,
C. -M. Cheng,
M. -N. Ou,
W. -T. Chen,
L. Z. Deng,
C. -C. Lee,
Y. -Y. Chen,
C. -N. Kuo,
C. -S. Lue,
Janna Machts,
Kenji M. Kojima,
Alannah M. Hallas,
C. -L. Huang
Abstract:
We present a thorough investigation of the physical properties and superconductivity of the binary intermetallic V2Ga5. Electrical resistivity and specific heat measurements show that V2Ga5 enters its superconducting state below Tsc = 3.5 K, with a critical field of Hc2,perp c(Hc2,para c) = 6.5(4.1) kOe. With H perp c, the peak effect was observed in resistivity measurements, indicating the ultrahigh quality of the single crystal studied. The resistivity measurements under high pressure reveal that the Tsc is suppressed linearly with pressure and reaches absolute zero around 20 GPa. Specific heat and muon spin relaxation measurements both indicate that the two-gap s-wave model best describes the superconductivity of V2Ga5. The spectra obtained from angle-resolved photoemission spectroscopy measurements suggest that two superconducting gaps open at the Fermi surface around the Z and Γ points. These results are verified by first-principles band structure calculations. We therefore conclude that V2Ga5 is a phonon-mediated two-gap s-wave superconductor.
Submitted 6 May, 2024;
originally announced May 2024.
-
Three Quantization Regimes for ReLU Networks
Authors:
Weigutian Ou,
Philipp Schenkel,
Helmut Bölcskei
Abstract:
We establish the fundamental limits in the approximation of Lipschitz functions by deep ReLU neural networks with finite-precision weights. Specifically, three regimes, namely under-, over-, and proper quantization, in terms of minimax approximation error behavior as a function of network weight precision, are identified. This is accomplished by deriving nonasymptotic tight lower and upper bounds on the minimax approximation error. Notably, in the proper-quantization regime, neural networks exhibit memory-optimality in the approximation of Lipschitz functions. Deep networks have an inherent advantage over shallow networks in achieving memory-optimality. We also develop the notion of depth-precision tradeoff, showing that networks with high-precision weights can be converted into functionally equivalent deeper networks with low-precision weights, while preserving memory-optimality. This idea is reminiscent of sigma-delta analog-to-digital conversion, where oversampling rate is traded for resolution in the quantization of signal samples. We improve upon the best-known ReLU network approximation results for Lipschitz functions and describe a refinement of the bit extraction technique which could be of independent general interest.
Submitted 3 May, 2024;
originally announced May 2024.
-
Bootstrap 3D Reconstructed Scenes from 3D Gaussian Splatting
Authors:
Yifei Gao,
Jie Ou,
Lei Wang,
Jun Cheng
Abstract:
Recent developments in neural rendering techniques have greatly enhanced the rendering of photo-realistic 3D scenes across both academic and commercial fields. The latest method, known as 3D Gaussian Splatting (3D-GS), has set new benchmarks for rendering quality and speed. Nevertheless, the limitations of 3D-GS become pronounced in synthesizing new viewpoints, especially for views that greatly deviate from those seen during training. Additionally, issues such as dilation and aliasing arise when zooming in or out. These challenges can all be traced back to a single underlying issue: insufficient sampling. In our paper, we present a bootstrapping method that significantly addresses this problem. This approach employs a diffusion model to enhance the rendering of novel views using trained 3D-GS, thereby streamlining the training process. Our results indicate that bootstrapping effectively reduces artifacts and yields clear improvements on the evaluation metrics. Furthermore, we show that our method is versatile and can be easily integrated, allowing various 3D reconstruction projects to benefit from our approach.
Submitted 12 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
IncidentResponseGPT: Generating Traffic Incident Response Plans with Generative Artificial Intelligence
Authors:
Artur Grigorev,
Adriana-Simona Mihaita,
Khaled Saleh,
Yuming Ou
Abstract:
Traffic congestion due to road incidents poses a significant challenge in urban environments, leading to increased pollution, economic losses, and traffic congestion. Efficiently managing these incidents is imperative for mitigating their adverse effects; however, the complexity of urban traffic systems and the variety of potential incidents represent a considerable obstacle. This paper introduces IncidentResponseGPT, an innovative solution designed to assist traffic management authorities by providing rapid, informed, and adaptable traffic incident response plans. By integrating a Generative AI platform with real-time traffic incident reports and operational guidelines, our system aims to streamline the decision-making process in responding to traffic incidents. The research addresses the critical challenges involved in deploying AI in traffic management, including overcoming the complexity of urban traffic networks, ensuring real-time decision-making capabilities, aligning with local laws and regulations, and securing public acceptance for AI-driven systems. Through a combination of text analysis of accident reports, validation of AI recommendations through traffic simulation, and implementation of transparent and validated AI systems, IncidentResponseGPT offers a promising approach to optimizing traffic flow and reducing congestion in the face of traffic incidents. The relevance of this work extends to traffic management authorities, emergency response teams, and municipal bodies, all integral stakeholders in urban traffic control and incident management. By proposing a novel solution to the identified challenges, this research aims to develop a framework that not only facilitates faster resolution of traffic incidents but also minimizes their overall impact on urban traffic systems.
Submitted 29 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.