Search | arXiv e-print repository

DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation

Authors: Shuting Wang, Jiongnan Liu, Shiren Song, Jiehan Cheng, Yuqi Fu, Peidong Guo, Kun Fang, Yutao Zhu, Zhicheng Dou

Abstract: Retrieval-Augmented Generation (RAG) offers a promising solution to address various limitations of Large Language Models (LLMs), such as hallucination and difficulties in kee** up with real-time updates. This approach is particularly critical in expert and domain-specific applications where LLMs struggle to cover expert knowledge. Therefore, evaluating RAG models in such scenarios is crucial, ye… ▽ More Retrieval-Augmented Generation (RAG) offers a promising solution to address various limitations of Large Language Models (LLMs), such as hallucination and difficulties in kee** up with real-time updates. This approach is particularly critical in expert and domain-specific applications where LLMs struggle to cover expert knowledge. Therefore, evaluating RAG models in such scenarios is crucial, yet current studies often rely on general knowledge sources like Wikipedia to assess the models' abilities in solving common-sense problems. In this paper, we evaluated LLMs by RAG settings in a domain-specific context, college enrollment. We identified six required abilities for RAG models, including the ability in conversational RAG, analyzing structural information, faithfulness to external knowledge, denoising, solving time-sensitive problems, and understanding multi-document interactions. Each ability has an associated dataset with shared corpora to evaluate the RAG models' performance. We evaluated popular LLMs such as Llama, Baichuan, ChatGLM, and GPT models. Experimental results indicate that existing closed-book LLMs struggle with domain-specific questions, highlighting the need for RAG models to solve expert problems. Moreover, there is room for RAG models to improve their abilities in comprehending conversational history, analyzing structural information, denoising, processing multi-document interactions, and faithfulness in expert knowledge. We expect future studies could solve these problems better. △ Less

Submitted 16 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.05551 [pdf, other]

Autoregressive Diffusion Transformer for Text-to-Speech Synthesis

Authors: Zhijun Liu, Shuai Wang, Sho Inoue, Qibing Bai, Haizhou Li

Abstract: Audio language models have recently emerged as a promising approach for various audio generation tasks, relying on audio tokenizers to encode waveforms into sequences of discrete symbols. Audio tokenization often poses a necessary compromise between code bitrate and reconstruction accuracy. When dealing with low-bitrate audio codes, language models are constrained to process only a subset of the i… ▽ More Audio language models have recently emerged as a promising approach for various audio generation tasks, relying on audio tokenizers to encode waveforms into sequences of discrete symbols. Audio tokenization often poses a necessary compromise between code bitrate and reconstruction accuracy. When dealing with low-bitrate audio codes, language models are constrained to process only a subset of the information embedded in the audio, which in turn restricts their generative capabilities. To circumvent these issues, we propose encoding audio as vector sequences in continuous space $\mathbb R^d$ and autoregressively generating these sequences using a decoder-only diffusion transformer (ARDiT). Our findings indicate that ARDiT excels in zero-shot text-to-speech and exhibits performance that compares to or even surpasses that of state-of-the-art models. High-bitrate continuous speech representation enables almost flawless reconstruction, allowing our model to achieve nearly perfect speech editing. Our experiments reveal that employing Integral Kullback-Leibler (IKL) divergence for distillation at each autoregressive step significantly boosts the perceived quality of the samples. Simultaneously, it condenses the iterative sampling process of the diffusion model into a single step. Furthermore, ARDiT can be trained to predict several continuous vectors in one step, significantly reducing latency during sampling. Impressively, one of our models can generate $170$ ms of $24$ kHz speech per evaluation step with minimal degradation in performance. Audio samples are available at http://ardit-tts.github.io/ . △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.05516 [pdf, other]

Verbalized Probabilistic Graphical Modeling with Large Language Models

Authors: Hengguan Huang, Xing Shen, Songtao Wang, Dianbo Liu, Hao Wang

Abstract: Faced with complex problems, the human brain demonstrates a remarkable capacity to transcend sensory input and form latent understandings of perceived world patterns. However, this cognitive capacity is not explicitly considered or encoded in current large language models (LLMs). As a result, LLMs often struggle to capture latent structures and model uncertainty in complex compositional reasoning… ▽ More Faced with complex problems, the human brain demonstrates a remarkable capacity to transcend sensory input and form latent understandings of perceived world patterns. However, this cognitive capacity is not explicitly considered or encoded in current large language models (LLMs). As a result, LLMs often struggle to capture latent structures and model uncertainty in complex compositional reasoning tasks. This work introduces a novel Bayesian prompting approach that facilitates training-free Bayesian inference with LLMs by using a verbalized Probabilistic Graphical Model (PGM). While traditional Bayesian approaches typically depend on extensive data and predetermined mathematical structures for learning latent factors and dependencies, our approach efficiently reasons latent variables and their probabilistic dependencies by prompting LLMs to adhere to Bayesian principles. We evaluated our model on several compositional reasoning tasks, both close-ended and open-ended. Our results indicate that the model effectively enhances confidence elicitation and text generation quality, demonstrating its potential to improve AI language understanding systems, especially in modeling uncertainty. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.05498 [pdf, other]

SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner

Authors: Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, **chuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel

Abstract: Jailbreaking is an emerging adversarial attack that bypasses the safety alignment deployed in off-the-shelf large language models (LLMs) and has evolved into four major categories: optimization-based attacks such as Greedy Coordinate Gradient (GCG), jailbreak template-based attacks such as "Do-Anything-Now", advanced indirect attacks like DrAttack, and multilingual jailbreaks. However, delivering… ▽ More Jailbreaking is an emerging adversarial attack that bypasses the safety alignment deployed in off-the-shelf large language models (LLMs) and has evolved into four major categories: optimization-based attacks such as Greedy Coordinate Gradient (GCG), jailbreak template-based attacks such as "Do-Anything-Now", advanced indirect attacks like DrAttack, and multilingual jailbreaks. However, delivering a practical jailbreak defense is challenging because it needs to not only handle all the above jailbreak attacks but also incur negligible delay to user prompts, as well as be compatible with both open-source and closed-source LLMs. Inspired by how the traditional security concept of shadow stacks defends against memory overflow attacks, this paper introduces a generic LLM jailbreak defense framework called SelfDefend, which establishes a shadow LLM defense instance to concurrently protect the target LLM instance in the normal stack and collaborate with it for checkpoint-based access control. The effectiveness of SelfDefend builds upon our observation that existing LLMs (both target and defense LLMs) have the capability to identify harmful prompts or intentions in user queries, which we empirically validate using the commonly used GPT-3.5/4 models across all major jailbreak attacks. Our measurements show that SelfDefend enables GPT-3.5 to suppress the attack success rate (ASR) by 8.97-95.74% (average: 60%) and GPT-4 by even 36.36-100% (average: 83%), while incurring negligible effects on normal queries. To further improve the defense's robustness and minimize costs, we employ a data distillation approach to tune dedicated open-source defense models. These models outperform four SOTA defenses and match the performance of GPT-4-based SelfDefend, with significantly lower extra delays. We also empirically show that the tuned models are robust to targeted GCG and prompt injection attacks. △ Less

Submitted 8 June, 2024; originally announced June 2024.

Comments: This paper completes its earlier vision paper, available at arXiv:2402.15727

arXiv:2406.05462 [pdf, other]

MatrixGate: A High-performance Data Ingestion Tool for Time-series Databases

Authors: Shuhui Wang, Zihan Sun, Chaochen Hu, Chao Li, Yong Zhang, Yandong Yao, Hao Wang, Chunxiao Xing

Abstract: Recent years have seen massive time-series data generated in many areas. This different scenario brings new challenges, particularly in terms of data ingestion, where existing technologies struggle to handle such massive time-series data, leading to low loading speed and poor timeliness. To address these challenges, this paper presents MatrixGate, a new and efficient data ingestion approach for ma… ▽ More Recent years have seen massive time-series data generated in many areas. This different scenario brings new challenges, particularly in terms of data ingestion, where existing technologies struggle to handle such massive time-series data, leading to low loading speed and poor timeliness. To address these challenges, this paper presents MatrixGate, a new and efficient data ingestion approach for massive time-series data. MatrixGate implements both single-instance and multi-instance parallel procedures, which is based on its unique ingestion strategies. First, MatrixGate uses policies to tune the slots that are synchronized with segments to ingest data, which eliminates the cost of starting transactions and enhance the efficiency. Second, multi-coroutines are responsible for transfer data, which can increase the degree of parallelism significantly. Third, lock-free queues are used to enable direct data transfer without the need for disk storage or lodging in the master instance. Experiment results on multiple datasets show that MatrixGate outperforms state-of-the-art methods by 3 to 100 times in loading speed, and cuts down about 80% query latency. Furthermore, MatrixGate scales out efficiently under distributed architecture, achieving scalability of 86%. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.05034 [pdf, other]

Multiple peaks in gravitational waves induced from primordial curvature perturbations with non-Gaussianity

Authors: Xiang-Xi Zeng, Rong-Gen Cai, Shao-Jiang Wang

Abstract: First-order primordial curvature perturbations are known to induce gravitational waves at the second-order, which can in turn probe the small-scale curvature perturbations near the end of the inflation. In this work, we extend the previous analysis in the Gaussian case into the non-Gaussian case, with particular efforts to obtain some thumb rules of sandwiching the associated peaks in gravitationa… ▽ More First-order primordial curvature perturbations are known to induce gravitational waves at the second-order, which can in turn probe the small-scale curvature perturbations near the end of the inflation. In this work, we extend the previous analysis in the Gaussian case into the non-Gaussian case, with particular efforts to obtain some thumb rules of sandwiching the associated peaks in gravitational waves induced from multiple peaks of non-Gaussian curvature perturbations. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: 22 pages, 6 figures

arXiv:2406.04906 [pdf, other]

RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection

Authors: Liting Huang, Zhihao Zhang, Yiran Zhang, Xiyue Zhou, Shou** Wang

Abstract: The recent advancements in generative AI models, which can create realistic and human-like content, are significantly transforming how people communicate, create, and work. While the appropriate use of generative AI models can benefit the society, their misuse poses significant threats to data reliability and authentication. However, due to a lack of aligned multimodal datasets, effective and robu… ▽ More The recent advancements in generative AI models, which can create realistic and human-like content, are significantly transforming how people communicate, create, and work. While the appropriate use of generative AI models can benefit the society, their misuse poses significant threats to data reliability and authentication. However, due to a lack of aligned multimodal datasets, effective and robust methods for detecting machine-generated content are still in the early stages of development. In this paper, we introduce RU-AI, a new large-scale multimodal dataset designed for the robust and efficient detection of machine-generated content in text, image, and voice. Our dataset is constructed from three large publicly available datasets: Flickr8K, COCO, and Places205, by combining the original datasets and their corresponding machine-generated pairs. Additionally, experimental results show that our proposed unified model, which incorporates a multimodal embedding module with a multilayer perceptron network, can effectively determine the origin of the data (i.e., original data samples or machine-generated ones) from RU-AI. However, future work is still required to address the remaining challenges posed by RU-AI. The source code and dataset are available at https://github.com/ZhihaoZhang97/RU-AI. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.04875 [pdf, other]

3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views

Authors: Xiaobiao Du, Haiyang Sun, Shuyun Wang, Zhuojie Wu, Hongwei Sheng, Jiaying Ying, Ming Lu, Tianqing Zhu, Kun Zhan, Xin Yu

Abstract: 3D cars are commonly used in self-driving systems, virtual/augmented reality, and games. However, existing 3D car datasets are either synthetic or low-quality, presenting a significant gap toward the high-quality real-world 3D car datasets and limiting their applications in practical scenarios. In this paper, we propose the first large-scale 3D real car dataset, termed 3DRealCar, offering three di… ▽ More 3D cars are commonly used in self-driving systems, virtual/augmented reality, and games. However, existing 3D car datasets are either synthetic or low-quality, presenting a significant gap toward the high-quality real-world 3D car datasets and limiting their applications in practical scenarios. In this paper, we propose the first large-scale 3D real car dataset, termed 3DRealCar, offering three distinctive features. (1) \textbf{High-Volume}: 2,500 cars are meticulously scanned by 3D scanners, obtaining car images and point clouds with real-world dimensions; (2) \textbf{High-Quality}: Each car is captured in an average of 200 dense, high-resolution 360-degree RGB-D views, enabling high-fidelity 3D reconstruction; (3) \textbf{High-Diversity}: The dataset contains various cars from over 100 brands, collected under three distinct lighting conditions, including reflective, standard, and dark. Additionally, we offer detailed car parsing maps for each instance to promote research in car parsing tasks. Moreover, we remove background point clouds and standardize the car orientation to a unified axis for the reconstruction only on cars without background and controllable rendering. We benchmark 3D reconstruction results with state-of-the-art methods across each lighting condition in 3DRealCar. Extensive experiments demonstrate that the standard lighting condition part of 3DRealCar can be used to produce a large number of high-quality 3D cars, improving various 2D and 3D tasks related to cars. Notably, our dataset brings insight into the fact that recent 3D reconstruction methods face challenges in reconstructing high-quality 3D cars under reflective and dark lighting conditions. \textcolor{red}{\href{https://xiaobiaodu.github.io/3drealcar/}{Our dataset is available here.}} △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: Project Page: https://xiaobiaodu.github.io/3drealcar

arXiv:2406.04451 [pdf, other]

RiskMap: A Unified Driving Context Representation for Autonomous Motion Planning in Urban Driving Environment

Authors: Ren Xin, Sheng Wang, Yingbing Chen, Jie Cheng, Ming Liu

Abstract: Planning is complicated by the combination of perception and map information, particularly when driving in heavy traffic. Develo** an extendable and efficient representation that visualizes sensor noise and provides constraints to real-time planning tasks is desirable. We aim to develop an extendable map representation offering prior to cost in planning tasks to simplify the planning process of… ▽ More Planning is complicated by the combination of perception and map information, particularly when driving in heavy traffic. Develo** an extendable and efficient representation that visualizes sensor noise and provides constraints to real-time planning tasks is desirable. We aim to develop an extendable map representation offering prior to cost in planning tasks to simplify the planning process of dealing with complex driving scenarios and visualize sensor noise. In this paper, we illustrate a unified context representation empowered by a modern deep learning motion prediction model, representing statistical cognition of motion prediction for human beings. A sampling-based planner is adopted to train and compare the difference in risk map generation methods. The training tools and model structures are investigated illustrating their efficiency in this task. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: Submission to ICRA 2023 was not accepted. This paper is now available just for public reference

arXiv:2406.03867 [pdf, other]

A Comprehensive Study of Quantum Arithmetic Circuits

Authors: Siyi Wang, Xiufan Li, Wei Jie Bryan Lee, Suman Deb, Eugene Lim, Anupam Chattopadhyay

Abstract: In recent decades, the field of quantum computing has experienced remarkable progress. This progress is marked by the superior performance of many quantum algorithms compared to their classical counterparts, with Shor's algorithm serving as a prominent illustration. Quantum arithmetic circuits, which are the fundamental building blocks in numerous quantum algorithms, have attracted much attention.… ▽ More In recent decades, the field of quantum computing has experienced remarkable progress. This progress is marked by the superior performance of many quantum algorithms compared to their classical counterparts, with Shor's algorithm serving as a prominent illustration. Quantum arithmetic circuits, which are the fundamental building blocks in numerous quantum algorithms, have attracted much attention. Despite extensive exploration of various designs in the existing literature, researchers remain keen on develo** novel designs and improving existing ones. In this review article, we aim to provide a systematically organized and easily comprehensible overview of the current state-of-the-art in quantum arithmetic circuits. Specifically, this study covers fundamental operations such as addition, subtraction, multiplication, division and modular exponentiation. We delve into the detailed quantum implementations of these prominent designs and evaluate their efficiency considering various objectives. We also discuss potential applications of presented arithmetic circuits and suggest future research directions. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: Under review at the Royal Society's Philosophical Transactions A

arXiv:2406.03798 [pdf]

Optical biomarker of metabolism for breast tumor diagnosis: Insights from subcellular dynamics

Authors: Zichen Yin, Shuwei Zhang, Bin He, Houpu Yang, Zhengyu Chen, Zhangwei Hu, Yejiong Shi, Ruizhi Xue, Panqi Yang, Yuzhe Ying, Chengming Wang, Shu Wang, ** Xue

Abstract: Label-free metabolic dynamics contrast is highly appealing but difficult to achieve in biomedical imaging. Interference offers a highly sensitive mechanism for capturing the metabolic dynamics of the subcellular scatterers. However, traditional interference detection methods fail to isolate pure metabolic dynamics, as the dynamic signals are coupled with scatterer reflectivity and other uncontroll… ▽ More Label-free metabolic dynamics contrast is highly appealing but difficult to achieve in biomedical imaging. Interference offers a highly sensitive mechanism for capturing the metabolic dynamics of the subcellular scatterers. However, traditional interference detection methods fail to isolate pure metabolic dynamics, as the dynamic signals are coupled with scatterer reflectivity and other uncontrollable imaging factors. Here, we demonstrate active phase modulation-assisted dynamic full-field optical coherence tomography (APMD-FFOCT) that decouples and quantifies the metabolic dynamics by adding a reference movement for all interferential scatterers. This novel technique enables imaging and dynamic analysis of subcellular structures along with their changes during the apoptotic process in tumor tissues. Furthermore, the nucleus-to-cytoplasm dynamic intensity ratio could serve as an optical biomarker for breast tumor grading, enhancing intraoperative diagnosis. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03730 [pdf, other]

FastGAS: Fast Graph-based Annotation Selection for In-Context Learning

Authors: Zihan Chen, Song Wang, Cong Shen, Jundong Li

Abstract: In-context learning (ICL) empowers large language models (LLMs) to tackle new tasks by using a series of training instances as prompts. Since generating the prompts needs to sample from a vast pool of instances and annotate them (e.g., add labels in classification task), existing methods have proposed to select a subset of unlabeled examples for annotation, thus enhancing the quality of prompts an… ▽ More In-context learning (ICL) empowers large language models (LLMs) to tackle new tasks by using a series of training instances as prompts. Since generating the prompts needs to sample from a vast pool of instances and annotate them (e.g., add labels in classification task), existing methods have proposed to select a subset of unlabeled examples for annotation, thus enhancing the quality of prompts and concurrently mitigating annotation costs. However, these methods often require a long time to select instances due to their complexity, hindering their practical viability. To address this limitation, we propose a graph-based selection method, FastGAS, designed to efficiently identify high-quality instances while minimizing computational overhead. Initially, we construct a data similarity graph based on instance similarities. Subsequently, employing a graph partitioning algorithm, we partition the graph into pieces. Within each piece (i.e., subgraph), we adopt a greedy approach to pick the most representative nodes. By aggregating nodes from diverse pieces and annotating the corresponding instances, we identify a set of diverse and representative instances for ICL. Compared to prior approaches, our method not only exhibits superior performance on different tasks but also significantly reduces selection time. In addition, we demonstrate the efficacy of our approach in LLMs of larger sizes. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.03450 [pdf, other]

What is the Best Way for ChatGPT to Translate Poetry?

Authors: Shanshan Wang, Derek F. Wong, **gming Yao, Lidia S. Chao

Abstract: Machine translation (MT) has historically faced significant challenges when applied to literary works, particularly in the domain of poetry translation. The advent of Large Language Models such as ChatGPT holds potential for innovation in this field. This study examines ChatGPT's capabilities in English-Chinese poetry translation tasks, utilizing targeted prompts and small sample scenarios to asce… ▽ More Machine translation (MT) has historically faced significant challenges when applied to literary works, particularly in the domain of poetry translation. The advent of Large Language Models such as ChatGPT holds potential for innovation in this field. This study examines ChatGPT's capabilities in English-Chinese poetry translation tasks, utilizing targeted prompts and small sample scenarios to ascertain optimal performance. Despite promising outcomes, our analysis reveals persistent issues in the translations generated by ChatGPT that warrant attention. To address these shortcomings, we propose an Explanation-Assisted Poetry Machine Translation (EAPMT) method, which leverages monolingual poetry explanation as a guiding information for the translation process. Furthermore, we refine existing evaluation criteria to better suit the nuances of modern poetry translation. We engaged a panel of professional poets for assessments, complemented evaluations by using GPT-4. The results from both human and machine evaluations demonstrate that our EAPMT method outperforms traditional translation methods of ChatGPT and the existing online systems. This paper validates the efficacy of our method and contributes a novel perspective to machine-assisted literary translation. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: 19 pages, 1 figure. The paper has been accepted by ACL 2024(Main Conference)

arXiv:2406.03391 [pdf, other]

Joint Association, Beamforming, and Resource Allocation for Multi-IRS Enabled MU-MISO Systems With RSMA

Authors: Chunjie Wang, Xuhui Zhang, Huijun Xing, Liang Xue, Shuqiang Wang, Yanyan Shen, Bo Yang, ** Guan

Abstract: Intelligent reflecting surface (IRS) and rate-splitting multiple access (RSMA) technologies are at the forefront of enhancing spectrum and energy efficiency in the next generation multi-antenna communication systems. This paper explores a RSMA system with multiple IRSs, and proposes two purpose-driven scheduling schemes, i.e., the exhaustive IRS-aided (EIA) and opportunistic IRS-aided (OIA) scheme… ▽ More Intelligent reflecting surface (IRS) and rate-splitting multiple access (RSMA) technologies are at the forefront of enhancing spectrum and energy efficiency in the next generation multi-antenna communication systems. This paper explores a RSMA system with multiple IRSs, and proposes two purpose-driven scheduling schemes, i.e., the exhaustive IRS-aided (EIA) and opportunistic IRS-aided (OIA) schemes. The aim is to optimize the system weighted energy efficiency (EE) under the above two schemes, respectively. Specifically, the Dinkelbach, branch and bound, successive convex approximation, and the semidefinite relaxation methods are exploited within the alternating optimization framework to obtain effective solutions to the considered problems. The numerical findings indicate that the EIA scheme exhibits better performance compared to the OIA scheme in diverse scenarios when considering the weighted EE, and the proposed algorithm demonstrates superior performance in comparison to the baseline algorithms. △ Less

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.03326 [pdf]

Calibrated absolute optical contrast for high-throughput characterization of horizontally aligned carbon nanotube arrays

Authors: Yue Li, Ying Xie, Jian** Wang, Yang Xu, Shurui Wang, Yunbiao Zhao, Liu Qian, Ziqiang Zhao, ** Zhang

Abstract: Horizontally aligned carbon nanotube (HACNT) arrays hold significant potential for various applications in nanoelectronics and material science. However, their high-throughput characterization remains challenging due to the lack of methods with both high efficiency and high accuracy. Here, we present a novel technique, Calibrated Absolute Optical Contrast (CAOC), achieved through the implementatio… ▽ More Horizontally aligned carbon nanotube (HACNT) arrays hold significant potential for various applications in nanoelectronics and material science. However, their high-throughput characterization remains challenging due to the lack of methods with both high efficiency and high accuracy. Here, we present a novel technique, Calibrated Absolute Optical Contrast (CAOC), achieved through the implementation of differential principles to filter out stray signals and high-resolution calibration to endow optical contrast with physical significance. CAOC offers major advantages over previous characterization techniques, providing consistent and reliable measurements of HACNT array density with high throughput and non-destructive assessment. To validate its utility, we demonstrate wafer-scale uniformity assessment by rapid density map**. This technique not only facilitates the practical evaluation of HACNT arrays but also provides insights into balancing high throughput and high resolution in nanomaterial characterization. △ Less

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.02931 [pdf, other]

Measurements of the branching fractions of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^-π^0/η$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (643 additional authors not shown)

Abstract: Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of 9.6$σ$ after taking into account systematic uncertainties. Evidences for… ▽ More Based on $(2712.4\pm 14.3)\times10^{6}$ $ψ(3686)$ events, we investigate four hadronic decay modes of the $P$-wave charmonium spin-singlet state $h_c(^1P_1) \to h^+ h^- π^0/η$ ($h=π$ or $K$) via the process $ψ(3686) \to π^{0}h_c$ at BESIII. The $h_c \to π^+ π^- π^0$ decay is observed with a significance of 9.6$σ$ after taking into account systematic uncertainties. Evidences for $h_c \to K^+ K^- π^0$ and $h_c \to K^+ K^- η$ are found with significances of $3.5σ$ and $3.3σ$, respectively, after considering the systematic uncertainties. The branching fractions of these decays are measured to be $\mathcal{B}(h_c \to π^+ π^- π^0)=(1.36\pm0.16\pm0.14)\times10^{-3}$, $\mathcal{B}(h_c \to K^+ K^- π^0)=(3.26\pm0.84\pm0.36)\times10^{-4}$, and $\mathcal{B}(h_c \to K^+ K^- η)=(3.13\pm1.08\pm0.38)\times10^{-4}$, where the first uncertainties are statistical and the second are systematic. No significant signal of $h_c\toπ^+π^-η$ is found, and the upper limit of its decay branching fraction is determined to be $\mathcal{B}(h_c\toπ^+π^-η) < 4.0 \times 10^{-4}$ at 90% confidence level. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: 9 pages, 7 figures

arXiv:2406.02880 [pdf, other]

Controllable Talking Face Generation by Implicit Facial Keypoints Editing

Authors: Dong Zhao, Jiaying Shi, Wenjun Li, Shudong Wang, Shenghui Xu, Zhaoming Pan

Abstract: Audio-driven talking face generation has garnered significant interest within the domain of digital human research. Existing methods are encumbered by intricate model architectures that are intricately dependent on each other, complicating the process of re-editing image or video inputs. In this work, we present ControlTalk, a talking face generation method to control face expression deformation b… ▽ More Audio-driven talking face generation has garnered significant interest within the domain of digital human research. Existing methods are encumbered by intricate model architectures that are intricately dependent on each other, complicating the process of re-editing image or video inputs. In this work, we present ControlTalk, a talking face generation method to control face expression deformation based on driven audio, which can construct the head pose and facial expression including lip motion for both single image or sequential video inputs in a unified manner. By utilizing a pre-trained video synthesis renderer and proposing the lightweight adaptation, ControlTalk achieves precise and naturalistic lip synchronization while enabling quantitative control over mouth opening shape. Our experiments show that our method is superior to state-of-the-art performance on widely used benchmarks, including HDTF and MEAD. The parameterized adaptation demonstrates remarkable generalization capabilities, effectively handling expression deformation across same-ID and cross-ID scenarios, and extending its utility to out-of-domain portraits, regardless of languages. △ Less