Search | arXiv e-print repository

A survey on fairness of large language models in e-commerce: progress, application, and challenge

Authors: Qingyang Ren, Zilin Jiang, Jinghan Cao, Sijia Li, Chiqu Li, Yiyang Liu, Shuning Huo, Tiange He, Yuan Chen

Abstract: This survey explores the fairness of large language models (LLMs) in e-commerce, examining their progress, applications, and the challenges they face. LLMs have become pivotal in the e-commerce domain, offering innovative solutions and enhancing customer experiences. This work presents a comprehensive survey on the applications and challenges of LLMs in e-commerce. The paper begins by introducing the key principles underlying the use of LLMs in e-commerce, detailing the processes of pretraining, fine-tuning, and prompting that tailor these models to specific needs. It then explores the varied applications of LLMs in e-commerce, including product reviews, where they synthesize and analyze customer feedback; product recommendations, where they leverage consumer data to suggest relevant items; product information translation, enhancing global accessibility; and product question and answer sections, where they automate customer support. The paper critically addresses the fairness challenges in e-commerce, highlighting how biases in training data and algorithms can lead to unfair outcomes, such as reinforcing stereotypes or discriminating against certain groups. These issues not only undermine consumer trust, but also raise ethical and legal concerns. Finally, the work outlines future research directions, emphasizing the need for more equitable and transparent LLMs in e-commerce. It advocates for ongoing efforts to mitigate biases and improve the fairness of these systems, ensuring they serve diverse global markets effectively and ethically.
Through this comprehensive analysis, the survey provides a holistic view of the current landscape of LLMs in e-commerce, offering insights into their potential and limitations, and guiding future endeavors in creating fairer and more inclusive e-commerce environments.

Submitted 21 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

Comments: 21 pages, 9 figures

arXiv:2405.12687 [pdf, other]

Large band-splitting in $g$-wave type altermagnet CrSb

Authors: Jianyang Ding, Zhicheng Jiang, Xiuhua Chen, Zicheng Tao, Zhengtai Liu, Jishan Liu, Tongrui Li, Jiayu Liu, Yichen Yang, Runfeng Zhang, Liwei Deng, Wenchuan Jing, Yu Huang, Yuming Shi, Shan Qiao, Yilin Wang, Yanfeng Guo, Donglai Feng, Dawei Shen

Abstract: Altermagnetism (AM), a newly discovered magnetic state, ingeniously integrates the properties of ferromagnetism and antiferromagnetism, representing a significant breakthrough in the field of magnetic materials. Despite experimental verification of some typical AM materials, such as MnTe and MnTe$_2$, the pursuit of AM materials that feature larger spin splitting and higher transition temperature is still essential. Here, our research focuses on CrSb, which possesses Néel temperature of up to 700 K and giant spin splitting near the Fermi level ($E_F$). Utilizing high-resolution angle-resolved photoemission spectroscopy and density functional theory calculations, we meticulously map the three-dimensional electronic structure of CrSb. Our photoemission spectroscopic results on both (0001) and (10$\overline{1}$0) cleavages of CrSb collaboratively reveal unprecedented details on AM-induced band splitting, and subsequently pin down its unique bulk $g$-wave symmetry through quantitative analysis of the angular and photon-energy dependence of spin splitting. Moreover, the observed spin splitting reaches the magnitude of 0.93 eV near $E_F$, the most substantial among all confirmed AM materials. This study not only validates the nature of CrSb as a prototype $g$-wave like AM material but also underscores its pivotal role in pioneering applications in spintronics.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.12679 [pdf]

Observation of Spin Splitting in Room-Temperature Metallic Antiferromagnet CrSb

Authors: Meng Zeng, Ming-Yuan Zhu, Yu-Peng Zhu, Xiang-Rui Liu, Xiao-Ming Ma, Yu-Jie Hao, Pengfei Liu, Gexing Qu, Yichen Yang, Zhicheng Jiang, Kohei Yamagami, Masashi Arita, Xiaoqian Zhang, Tian-Hao Shao, Yue Dai, Kenya Shimada, Zhengtai Liu, Mao Ye, Yaobo Huang, Qihang Liu, Chang Liu

Abstract: Recently, unconventional antiferromagnets that enable the splitting of electronic spins have been theoretically proposed and experimentally realized, where the magnetic sublattices containing moments pointing at different directions are connected by a novel set of symmetries. Such spin splitting (SS) is substantial, $k$-dependent, and independent of the spin-orbit coupling strength, making these magnets promising materials for antiferromagnetic spintronics. Here, combined with angle-resolved photoemission spectroscopy (ARPES) and density functional theory (DFT) calculations, we perform a systematic study on CrSb, a metallic spin-split antiferromagnet candidate with $T_N$ = 703 K. Our data reveals the electronic structure of CrSb along both out-of-plane and in-plane momentum directions, which renders anisotropic $k$-dependent SS and agrees well with the calculational results. The magnitude of such SS reaches up to at least 0.8 eV at non-high-symmetry momentum points, which is significantly higher than the largest known SOC-induced SS. This compound expands the choice of materials in the field of antiferromagnetic spintronics and is likely to stimulate subsequent investigations of high-efficiency spintronic devices that are functional at room temperature.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 14 pages, 4 figures

arXiv:2405.11966 [pdf, other]

Multiple-Choice Questions are Efficient and Robust LLM Evaluators

Authors: Ziyin Zhang, Zhaokun Jiang, Lizhen Xu, Hongkun Hao, Rui Wang

Abstract: We present GSM-MC, a multiple-choice (MC) dataset constructed by collecting answers and incorrect predictions on GSM8K from 60 open-source models. Through extensive experiments, we show that LLMs' performance on the MC version of this popular benchmark is strongly correlated with their performance on the original version and is quite robust to distractor choices and option orders, while the evaluation time is reduced by a factor of up to 30. Following similar procedures, we introduce MATH-MC, constructed from MATH, and PythonIO, a new program reasoning MC dataset constructed from HumanEval and MBPP. Experimental results indicate that LLMs' performance on these MC benchmarks leaves much room for improvement. Our data and code are available at https://github.com/Geralt-Targaryen/MC-Evaluation.

Submitted 26 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: data at https://github.com/Geralt-Targaryen/MC-Evaluation

arXiv:2405.11826 [pdf, other]

Data quality control system and long-term performance monitor of the LHAASO-KM2A

Authors: Zhen Cao, F. Aharonian, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, W. Bian, A. V. Bukevich, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, H. X. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. Chen, et al. (263 additional authors not shown)

Abstract: The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.

Submitted 13 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 15 pages, 9 figures

arXiv:2405.10739 [pdf, other]

Efficient Multimodal Large Language Models: A Survey

Authors: Yizhang Jin, Jian Li, Yexin Liu, Tianjun Gu, Kai Wu, Zhengkai Jiang, Muyang He, Bo Zhao, Xin Tan, Zhenye Gan, Yabiao Wang, Chengjie Wang, Lizhuang Ma

Abstract: In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning. However, the extensive model size and high training and inference costs have hindered the widespread application of MLLMs in academia and industry. Thus, studying efficient and lightweight MLLMs has enormous potential, especially in edge computing scenarios. In this survey, we provide a comprehensive and systematic review of the current state of efficient MLLMs. Specifically, we summarize the timeline of representative efficient MLLMs, research state of efficient structures and strategies, and the applications. Finally, we discuss the limitations of current efficient MLLM research and promising future directions. Please refer to our GitHub repository for more details: https://github.com/lijiannuist/Efficient-Multimodal-LLMs-Survey.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.09787 [pdf, other]

Analysis of the BraTS 2023 Intracranial Meningioma Segmentation Challenge

Authors: Dominic LaBella, Ujjwal Baid, Omaditya Khanna, Shan McBurney-Lin, Ryan McLean, Pierre Nedelec, Arif Rashid, Nourel Hoda Tahon, Talissa Altes, Radhika Bhalerao, Yaseen Dhemesh, Devon Godfrey, Fathi Hilal, Scott Floyd, Anastasia Janas, Anahita Fathi Kazerooni, John Kirkpatrick, Collin Kent, Florian Kofler, Kevin Leu, Nazanin Maleki, Bjoern Menze, Maxence Pajot, Zachary J. Reitman, Jeffrey D. Rudie, et al. (96 additional authors not shown)

Abstract: We describe the design and results from the BraTS 2023 Intracranial Meningioma Segmentation Challenge. The BraTS Meningioma Challenge differed from prior BraTS Glioma challenges in that it focused on meningiomas, which are typically benign extra-axial tumors with diverse radiologic and anatomical presentation and a propensity for multiplicity. Nine participating teams each developed deep-learning automated segmentation models using image data from the largest multi-institutional systematically expert annotated multilabel multi-sequence meningioma MRI dataset to date, which included 1000 training set cases, 141 validation set cases, and 283 hidden test set cases. Each case included T2, T2/FLAIR, T1, and T1Gd brain MRI sequences with associated tumor compartment labels delineating enhancing tumor, non-enhancing tumor, and surrounding non-enhancing T2/FLAIR hyperintensity. Participant automated segmentation models were evaluated and ranked based on a scoring system evaluating lesion-wise metrics including dice similarity coefficient (DSC) and 95% Hausdorff Distance. The top ranked team had a lesion-wise median DSC of 0.976, 0.976, and 0.964 for enhancing tumor, tumor core, and whole tumor, respectively, and a corresponding average DSC of 0.899, 0.904, and 0.871, respectively. These results serve as state-of-the-art benchmarks for future pre-operative meningioma automated segmentation algorithms.
Additionally, we found that 1286 of 1424 cases (90.3%) had at least 1 compartment voxel abutting the edge of the skull-stripped image, which requires further investigation into optimal pre-processing face anonymization steps.

Submitted 15 May, 2024; originally announced May 2024.

Comments: 16 pages, 11 tables, 10 figures, MICCAI
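The BraTS meningioma abstract above ranks models by the dice similarity coefficient (DSC). As a rough illustration only, the sketch below computes the plain voxel-wise DSC, 2|A ∩ B| / (|A| + |B|), for two binary masks given as flat 0/1 sequences. It is not the challenge's lesion-wise scoring code: the function name `dice_coefficient`, the flat-list mask representation, and the convention that two empty masks score 1.0 are all assumptions made for this example. The last lines re-derive the case counts quoted in the abstract.

```python
def dice_coefficient(pred, truth):
    """Voxel-wise Dice similarity coefficient for two binary masks.

    Both masks are flat sequences of 0/1 values of equal length
    (a hypothetical representation chosen for this sketch).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


# Toy masks: 2 overlapping voxels, 3 foreground voxels in each mask.
example_dsc = dice_coefficient([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])

# Case counts quoted in the abstract: training + validation + hidden test.
total_cases = 1000 + 141 + 283
edge_case_percent = round(100 * 1286 / total_cases, 1)
```

With the toy masks, the overlap is 2 voxels out of 3 + 3 foreground voxels, so the DSC is 4/6. The case counts sum to 1424, and 1286 of 1424 rounds to 90.3%, matching the figure in the abstract.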

arXiv:2405.09569 [pdf, other]

GaitMotion: A Multitask Dataset for Pathological Gait Forecasting

Authors: Wenwen Zhang, Hao Zhang, Zenan Jiang, Jing Wang, Amir Servati, Peyman Servati

Abstract: Gait benchmarks empower numerous promising research fields such as gait recognition and humanoid locomotion. Despite the growing focus on gait analysis, the research community is hindered by the limitations of the currently available databases, which mostly consist of videos or images with limited labeling. In this paper, we introduce GaitMotion, a multitask dataset leveraging wearable sensors to capture the real-time movement of patients with pathological gait. This dataset offers extensive ground-truth labeling for multiple tasks, including step/stride segmentation and step/stride length prediction, empowering researchers with a more holistic understanding of gait disturbances linked to neurological impairments. The wearable gait analysis suit captures the gait cycle, pattern, and parameters for both normal and pathological subjects. This data may prove beneficial for healthcare products focused on patient progress monitoring and post-disease recovery, as well as for forensics technologies aimed at person reidentification, and biomechanics research to aid in the development of humanoid robotics. Moreover, the analysis has considered the drift in data distribution across individual subjects. This drift can be attributed to each participant's unique behavioral habits or potential displacement of the sensor. Stride length variance for normal, Parkinson's, and stroke patients is compared to recognize the pathological walking pattern.
As the baseline and benchmark, we provide an error of 14.1, 13.3, and 12.2 centimeters of stride length prediction for normal, Parkinson's, and stroke gaits, respectively. We also analyzed the gait characteristics for normal and pathological gaits in terms of the gait cycle and gait parameters.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.07691 [pdf, other]

Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (255 additional authors not shown)

Abstract: The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of the variability at a few months level in the TeV band, which is consistent with low frequency observations. Based on these observations, we report the detection of TeV $γ$-ray emissions from this low-luminosity AGN NGC 4278. The observations by LHAASO-WCDA during the active period have a significance level of 8.8\,$σ$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 11 pages, 5 figures

arXiv:2405.07411 [pdf, other]

MoVL:Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks

Authors: Haijiang Tian, Jingkun Yue, Xiaohong Liu, Guoxing Yang, Zeyu Jiang, Guangyu Wang

Abstract: Medical images are often more difficult to acquire than natural images due to the specialism of the equipment and technology, which leads to fewer medical image datasets. It is therefore hard to train a strong pretrained medical vision model. How to make the best of a natural pretrained vision model and adapt it to the medical domain remains an open question. For image classification, a popular method is linear probe (LP). However, LP only considers the output after feature extraction, and there exists a gap between input medical images and the natural pretrained vision model. We introduce visual prompting (VP) to fill in the gap, and analyze the strategies of coupling between LP and VP. We design a joint learning loss function containing categorisation loss and discrepancy loss, which describes the variance of prompted and plain images, naming this joint training strategy MoVL (Mixture of Visual Prompting and Linear Probe). We experiment on 4 medical image classification datasets, with two mainstream architectures, ResNet and CLIP. Results show that without changing the parameters and architecture of the backbone model and with fewer parameters, there is potential for MoVL to achieve full finetune (FF) accuracy (on four medical datasets, average 90.91% for MoVL and 91.13% for FF). On an out-of-distribution medical dataset, our method (90.33%) can outperform FF (85.15%) with an absolute 5.18% lead.

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2405.07145 [pdf, other]

Stable Signature is Unstable: Removing Image Watermark from Diffusion Models

Authors: Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong

Abstract: Watermark has been widely deployed by industry to detect AI-generated images. A recent watermarking framework called Stable Signature (proposed by Meta) roots watermark into the parameters of a diffusion model's decoder such that its generated images are inherently watermarked. Stable Signature makes it possible to watermark images generated by open-source diffusion models and was claimed to be robust against removal attacks. In this work, we propose a new attack to remove the watermark from a diffusion model by fine-tuning it. Our results show that our attack can effectively remove the watermark from a diffusion model such that its generated images are non-watermarked, while maintaining the visual quality of the generated images. Our results highlight that Stable Signature is not as stable as previously thought.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.06840 [pdf, other]

MEIC: Re-thinking RTL Debug Automation using LLMs

Authors: Ke Xu, Jialin Sun, Yuchen Hu, Xinwei Fang, Weiwei Shan, Xi Wang, Zhe Jiang

Abstract: The deployment of Large Language Models (LLMs) for code debugging (e.g., C and Python) is widespread, benefiting from their ability to understand and interpret intricate concepts. However, in the semiconductor industry, utilising LLMs to debug Register Transfer Level (RTL) code is still insufficient, largely due to the underrepresentation of RTL-specific data in training sets. This work introduces a novel framework, Make Each Iteration Count (MEIC), which contrasts with traditional one-shot LLM-based debugging methods that heavily rely on prompt engineering, model tuning, and model training. MEIC utilises LLMs in an iterative process to overcome the limitation of LLMs in RTL code debugging, which is suitable for identifying and correcting both syntax and function errors, while effectively managing the uncertainties inherent in LLM operations. To evaluate our framework, we provide an open-source dataset comprising 178 common RTL programming errors. The experimental results demonstrate that the proposed debugging framework achieves a fix rate of 93% for syntax errors and 78% for function errors, with up to 48x speedup in debugging processes when compared with experienced engineers. The repository of dataset and code: https://anonymous.4open.science/r/Verilog-Auto-Debug-6E7F/.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.06713

Unveiling the Competitive Dynamics: A Comparative Evaluation of American and Chinese LLMs

Authors: Zhenhui Jiang, Jiaxin Li, Yang Liu

Abstract: The strategic significance of Large Language Models (LLMs) in economic expansion, innovation, societal development, and national security has been increasingly recognized since the advent of ChatGPT. This study provides a comprehensive comparative evaluation of American and Chinese LLMs in both English and Chinese contexts. We proposed a comprehensive evaluation framework that encompasses natural language proficiency, disciplinary expertise, and safety and responsibility, and systematically assessed 16 prominent models from the US and China under various operational tasks and scenarios. Our key findings show that GPT 4-Turbo is at the forefront in English contexts, whereas Ernie-Bot 4 stands out in Chinese contexts. The study also highlights disparities in LLM performance across languages and tasks, stressing the necessity for linguistically and culturally nuanced model development. The complementary strengths of American and Chinese LLMs point to the value of Sino-US collaboration in advancing LLM technology. The research presents the current LLM competition landscape and offers valuable insights for policymakers and businesses regarding strategic LLM investments and development. Future work will expand on this framework to include emerging LLM multimodal capabilities and business application assessments.

Submitted 21 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: There was a miscommunication among the co-authors, resulting in the accidental submission of this paper to arXiv. We are in need of withdrawing the paper from your platform

arXiv:2405.06705 [pdf, other]

LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought

Authors: Zhuoxuan Jiang, Haoyuan Peng, Shanshan Feng, Fan Li, Dongsheng Li

Abstract: Self-correction is emerging as a promising approach to mitigate the issue of hallucination in Large Language Models (LLMs). To facilitate effective self-correction, recent research has proposed mistake detection as its initial step. However, current literature suggests that LLMs often struggle with reliably identifying reasoning mistakes when using simplistic prompting strategies. To address this challenge, we introduce a unique prompting strategy, termed the Pedagogical Chain-of-Thought (PedCoT), which is specifically designed to guide the identification of reasoning mistakes, particularly mathematical reasoning mistakes. PedCoT consists of pedagogical principles for prompts (PPP) design, two-stage interaction process (TIP) and grounded PedCoT prompts, all inspired by the educational theory of the Bloom Cognitive Model (BCM). We evaluate our approach on two public datasets featuring math problems of varying difficulty levels. The experiments demonstrate that our zero-shot prompting strategy significantly outperforms strong baselines. The proposed method can achieve the goal of reliable mathematical mistake identification and provide a foundation for automatic math answer grading. The results underscore the significance of educational theory, serving as domain knowledge, in guiding prompting strategy design for addressing challenging tasks with LLMs effectively.

Submitted 9 May, 2024; originally announced May 2024.

Comments: To appear at IJCAI 2024

arXiv:2405.05945 [pdf, other]

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

Authors: Peng Gao, Le Zhuo, Dongyang Liu, Ruoyi Du, Xu Luo, Longtian Qiu, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, He Tong, Jingwen He, Yu Qiao, Hongsheng Li

Abstract: Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we introduce the Lumina-T2X family - a series of Flow-based Large Diffusion Transformers (Flag-DiT) equipped with zero-initialized attention, as a unified framework designed to transform noise into images, videos, multi-view 3D objects, and audio clips conditioned on text instructions. By tokenizing the latent spatial-temporal space and incorporating learnable placeholders such as [nextline] and [nextframe] tokens, Lumina-T2X seamlessly unifies the representations of different modalities across various spatial-temporal resolutions. This unified approach enables training within a single framework for different modalities and allows for flexible generation of multimodal data at any resolution, aspect ratio, and length during inference. Advanced techniques like RoPE, RMSNorm, and flow matching enhance the stability, flexibility, and scalability of Flag-DiT, enabling models of Lumina-T2X to scale up to 7 billion parameters and extend the context window to 128K tokens. This is particularly beneficial for creating ultra-high-definition images with our Lumina-T2I model and long 720p videos with our Lumina-T2V model. Remarkably, Lumina-T2I, powered by a 5-billion-parameter Flag-DiT, requires only 35% of the training computational costs of a 600-million-parameter naive DiT.
Our further comprehensive analysis underscores Lumina-T2X's preliminary capability in resolution extrapolation, high-resolution editing, generating consistent 3D views, and synthesizing videos with seamless transitions. We expect that the open-sourcing of Lumina-T2X will further foster creativity, transparency, and diversity in the generative AI community.

Submitted 13 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: Technical Report; Code at: https://github.com/Alpha-VLLM/Lumina-T2X

arXiv:2405.04490 [pdf, other]

Resource-Efficient and Self-Adaptive Quantum Search in a Quantum-Classical Hybrid System

Authors: Zihao Jiang, Zefan Du, Shaolun Ruan, Juntao Chen, Yong Wang, Long Cheng, Rajkumar Buyya, Ying Mao

Abstract: Over the past decade, the rapid advancement of deep learning and big data applications has been driven by vast datasets and high-performance computing systems. However, as we approach the physical limits of semiconductor fabrication in the post-Moore's Law era, questions arise about the future of these applications. In parallel, quantum computing has made significant progress with the potential to break limits. Major companies like IBM, Google, and Microsoft provide access to noisy intermediate-scale quantum (NISQ) computers. Despite the theoretical promise of Shor's and Grover's algorithms, practical implementation on current quantum devices faces challenges, such as demanding additional resources and a high number of controlled operations. To tackle these challenges and optimize the utilization of limited onboard qubits, we introduce ReSaQuS, a resource-efficient index-value searching system within a quantum-classical hybrid framework. Building on Grover's algorithm, ReSaQuS employs an automatically managed iterative search approach. This method analyzes problem size, filters out less probable data points, and progressively reduces the dataset with decreasing qubit requirements. Implemented using Qiskit and evaluated through extensive experiments, ReSaQuS has demonstrated a substantial reduction of up to 86.36% in cumulative qubit consumption and 72.72% in active periods, reinforcing its potential in optimizing quantum computing application deployment.

Submitted 7 May, 2024; originally announced May 2024.
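
As a rough illustration of the resource arithmetic behind a Grover-style search like the one described in the abstract above, the sketch below computes the number of index qubits needed for N items and the textbook near-optimal Grover iteration count. The function names and the fixed shrinking schedule are hypothetical and are not taken from ReSaQuS; they only show how qubit requirements fall as a dataset is progressively reduced.

```python
import math


def index_qubits(n_items: int) -> int:
    """Qubits needed to index n_items basis states (at least one)."""
    return max(1, math.ceil(math.log2(n_items)))


def grover_iterations(n_items: int, n_marked: int = 1) -> int:
    """Textbook near-optimal iteration count: floor(pi/4 * sqrt(N/M))."""
    return max(1, math.floor(math.pi / 4 * math.sqrt(n_items / n_marked)))


def shrinking_schedule(n_items: int, keep_fraction: float = 0.5):
    """Hypothetical schedule: keep a fixed fraction of candidates per round.

    Returns a list of (dataset_size, qubits_needed) pairs, one per round,
    stopping once a single candidate remains.
    """
    if not 0.0 < keep_fraction < 1.0:
        raise ValueError("keep_fraction must lie strictly between 0 and 1")
    schedule = []
    size = n_items
    while size > 1:
        schedule.append((size, index_qubits(size)))
        size = max(1, int(size * keep_fraction))
    return schedule


if __name__ == "__main__":
    for size, qubits in shrinking_schedule(16):
        print(size, qubits, grover_iterations(size))
```

The halving schedule is only a stand-in for whatever filtering rule a real system applies; the point is that each reduction of the candidate set lowers the qubit count logarithmically.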

arXiv:2405.03654 [pdf, other]

Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent

Authors: Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang

Abstract: To demonstrate and address the underlying maliciousness, we propose a theoretical hypothesis and analytical approach, and introduce a new black-box jailbreak attack methodology named IntentObfuscator, exploiting this identified flaw by obfuscating the true intentions behind user prompts. This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content security measures. We detail two implementations under this framework: "Obscure Intention" and "Create Ambiguity", which manipulate query complexity and ambiguity to evade malicious intent detection effectively. We empirically validate the effectiveness of the IntentObfuscator method across several models, including ChatGPT-3.5, ChatGPT-4, Qwen and Baichuan, achieving an average jailbreak success rate of 69.21%. Notably, our tests on ChatGPT-3.5, which claims 100 million weekly active users, achieved a remarkable success rate of 83.65%. We also extend our validation to diverse types of sensitive content like graphic violence, racism, sexism, political sensitivity, cybersecurity threats, and criminal skills, further proving the substantial impact of our findings on enhancing 'Red Team' strategies against LLM content security frameworks.

Submitted 7 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.03523 [pdf, other]

Basilisk: Achieving Competitive Performance with Open EDA Tools on an Open-Source Linux-Capable RISC-V SoC

Authors: Phillippe Sauter, Thomas Benz, Paul Scheffler, Zerun Jiang, Beat Muheim, Frank K. Gürkaynak, Luca Benini

Abstract: We introduce Basilisk, an optimized application-specific integrated circuit (ASIC) implementation and design flow building on the end-to-end open-source Iguana system-on-chip (SoC). We present enhancements to synthesis tools and logic optimization scripts improving quality of results (QoR), as well as an optimized physical design with an improved power grid and cell placement integration enabling a higher core utilization. The tapeout-ready version of Basilisk implemented in IHP's open 130 nm technology achieves an operation frequency of 77 MHz (51 logic levels) under typical conditions, a 2.3x improvement compared to the baseline open-source EDA design flow presented in Iguana, and a higher 55 % core utilization compared to 50 % in the baseline design. Through collaboration with EDA tool developers and domain experts, Basilisk exemplifies a synergistic effort towards competitive open-source electronic design automation (EDA) tools for research and industry applications.

Submitted 6 May, 2024; originally announced May 2024.

Comments: 2 pages, 1 figure, accepted as a poster at the RISC-V Summit Europe 2024

arXiv:2405.02713 [pdf, other]

Set Transformation: Trade-off Between Repair Bandwidth and Sub-packetization

Authors: Hao Shi, Zhengyi Jiang, Zhongyi Huang, Bo Bai, Gong Zhang, Hanxu Hou

Abstract: Maximum distance separable (MDS) codes facilitate the achievement of elevated levels of fault tolerance in storage systems while incurring minimal redundancy overhead. Reed-Solomon (RS) codes are typical MDS codes with the sub-packetization level being one; however, they require large repair bandwidth, defined as the total amount of symbols downloaded from other surviving nodes during single-node failure/repair. In this paper, we present the set transformation, which can transform any MDS code into a set transformed code such that (i) the sub-packetization level is flexible and ranges from 2 to $(n-k)^{\lfloor\frac{n}{n-k}\rfloor}$, in which $n$ is the number of nodes and $k$ is the number of data nodes, (ii) the new code is an MDS code, and (iii) the new code has lower repair bandwidth for any single-node failure. We show that our set transformed codes have both lower repair bandwidth and lower field size than the existing related MDS array codes, such as elastic transformed codes \cite{10228984}. Specifically, our set transformed codes have $2\%-6.6\%$ repair bandwidth reduction compared with elastic transformed codes \cite{10228984} for the evaluated typical parameters.

Submitted 4 May, 2024; originally announced May 2024.
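
To make the sub-packetization range quoted in the abstract above concrete, the short sketch below evaluates the upper bound $(n-k)^{\lfloor n/(n-k) \rfloor}$ for given code parameters. The function name and the sample parameters are illustrative assumptions, not values taken from the paper.

```python
def max_subpacketization(n: int, k: int) -> int:
    """Upper end of the range stated in the abstract: (n-k)^floor(n/(n-k)).

    n is the total number of nodes and k the number of data nodes.
    """
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    r = n - k  # number of parity nodes
    return r ** (n // r)


if __name__ == "__main__":
    # Illustrative parameters only (not from the paper's evaluation).
    for n, k in [(6, 4), (14, 10)]:
        print(f"(n, k) = ({n}, {k}): levels range from 2 to "
              f"{max_subpacketization(n, k)}")
```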

arXiv:2405.02610 [pdf, other]

External magnetic field induced paramagnetic squeezing effect in heavy-ion collisions at the LHC

Authors: Ze-Fang Jiang, Zi-Han Zhang, Xue-Fei Yuan, Ben-Wei Zhang

Abstract: In non-central heavy-ion collisions, the quark-gluon plasma (QGP) encounters the most intense magnetic field ever produced in nature, with a strength of approximately 10$^{19\sim 20}$ Gauss. Recent lattice-QCD calculations reveal that the QGP exhibits paramagnetic properties at high temperatures. When an external strong magnetic field is applied, it generates an anisotropic squeezing force density that competes with pressure gradients resulting from the purely QGP geometric expansion. In this study, we employ (3+1)-dimensional ideal hydrodynamics simulations to estimate the paramagnetic squeezing effect of this force density on the anisotropic expansion of QGP in non-central Pb+Pb collisions at the Large Hadron Collider (LHC). We consider both up-to-date magnetic susceptibility and various magnetic field profiles in this work. We find that the impact of rapidly decaying magnetic fields is insignificant, while enduring magnetic fields produce a strong force density that diminishes the momentum anisotropy of the QGP by up to 10% at the initial stage, leaving a visible imprint on the elliptic flow $v_{2}$ of final charged particles. Our results provide insights into the interplay between magnetic fields and the dynamics of QGP expansion in non-central heavy-ion collisions.

Submitted 4 May, 2024; originally announced May 2024.

Comments: 11 pages, 8 figures

arXiv:2405.01043 [pdf, ps, other]

Reed-Solomon Codes over Cyclic Polynomial Ring with Lower Encoding/Decoding Complexity

Authors: Wenhao Liu, Zhengyi Jiang, Zhongyi Huang, Linqi Song, Hanxu Hou

Abstract: Reed-Solomon (RS) codes, constructed over a finite field, have been widely employed in storage and communication systems. Many fast encoding/decoding algorithms, such as the fast Fourier transform (FFT) and the modular approach, are designed for RS codes to reduce the encoding/decoding complexity, defined as the number of XORs involved in the encoding/decoding procedure. In this paper, we present the construction of RS codes over the cyclic polynomial ring $ \mathbb{F}_2[x]/(1+x+\ldots+x^{p-1})$ and show that our codes are maximum distance separable (MDS) codes. Moreover, we propose the FFT and modular approach over the ring that can be employed in our codes for encoding/decoding complexity reduction. We show that our codes have 17.9% encoding complexity reduction and 7.5% decoding complexity reduction compared with RS codes over finite field, for $(n,k)=(2048,1984)$.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.00992 [pdf, ps, other]

Assignment of charmed-strange $D_{s0}(2590)^+$ and $D_{sJ}(3040)^+$

Authors: Zi-Han Jiang, Ailin Zhang

Abstract: Based on analyses of the mass and the strong decay features, $D_{s0}(2590)^+$ observed by LHCb collaboration is identified as a radial excitation of the pseudoscalar $D_s$, and $D_{sJ}(3040)^+$ observed by BaBar collaboration is identified as a radial excitation of $D_{s1}(2536)^\pm$. $D_{s0}(2590)^+$ is possibly a pure $D_{s}(2~^1S_0)$ meson, both basic $D_{s1}(2536)^\pm$ and radially excited $D_{sJ}(3040)^+$ are possibly the mixtures $D_s(nP_1)$ between spin triplet $D_s(n~^3P_1)$ and spin singlet $D_s(n~^1P_1)$. In this arrangement, their masses meet the linear behavior of the radial Regge trajectory very well. In the $^3P_0$ strong decay model, the decay channels of $D_{s0}(2590)^+$ are $D^{*0}K^+$ and $D^{*+}K^0$, the total decay width is predicted with $Γ=76.12$ MeV. The main decay channels of $D_{sJ}(3040)^+$ are $D^{*0}K^+$/$D^{*+}K^0$ and $D^{*0}K^{*+}$/$D^{*+}K^{*0}$, the total decay width is predicted with $Γ=283.46$ MeV. These numerical strong decay results are consistent with the experiment data and support our arrangement. The dimensionless strength creation parameter $γ$ plays an important role in the calculation, and $γ=9.57$ is fixed through a comparison of the predicted strong decay widths of $D^*_{s2}(2573)$ and $D^*_{s3}(2860)^{\pm}$ with experimental data.

Submitted 2 May, 2024; originally announced May 2024.

Comments: 8 pages, 2 figures

arXiv:2405.00900 [pdf, other]

LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes

Authors: Shanlin Sun, Bingbing Zhuang, Ziyu Jiang, Buyu Liu, Xiaohui Xie, Manmohan Chandraker

Abstract: Photorealistic simulation plays a crucial role in applications such as autonomous driving, where advances in neural radiance fields (NeRFs) may allow better scalability through the automatic creation of digital 3D assets. However, reconstruction quality suffers on street scenes due to largely collinear camera motions and sparser samplings at higher speeds. On the other hand, the application often demands rendering from camera views that deviate from the inputs to accurately simulate behaviors like lane changes. In this paper, we propose several insights that allow a better utilization of Lidar data to improve NeRF quality on street scenes. First, our framework learns a geometric scene representation from Lidar, which is fused with the implicit grid-based representation for radiance decoding, thereby supplying stronger geometric information offered by explicit point cloud. Second, we put forth a robust occlusion-aware depth supervision scheme, which allows utilizing densified Lidar points by accumulation. Third, we generate augmented training views from Lidar points for further improvement. Our insights translate to largely improved novel view synthesis under real driving scenes.

Submitted 4 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: CVPR2024 Highlights

arXiv:2405.00802 [pdf]

doi 10.1126/sciadv.adk8495

Sensing Spin Wave Excitations by Spin Defects in Few-Layer Thick Hexagonal Boron Nitride

Authors: Jingcheng Zhou, Hanyi Lu, Di Chen, Mengqi Huang, Gerald Q. Yan, Faris Al-matouq, Jiu Chang, Dziga Djugba, Zhigang Jiang, Hailong Wang, Chunhui Rita Du

Abstract: Optically active spin defects in wide band-gap semiconductors serve as a local sensor of multiple degrees of freedom in a variety of "hard" and "soft" condensed matter systems. Taking advantage of the recent progress on quantum sensing using van der Waals (vdW) quantum materials, here we report direct measurements of spin waves excited in magnetic insulator Y3Fe5O12 (YIG) by boron vacancy $V_B^-$ spin defects contained in few-layer thick hexagonal boron nitride nanoflakes. We show that the ferromagnetic resonance and parametric spin excitations can be effectively detected by $V_B^-$ spin defects under various experimental conditions through optically detected magnetic resonance measurements. The off-resonant dipole interaction between YIG magnons and $V_B^-$ spin defects is mediated by multi-magnon scattering processes, which may find relevant applications in a range of emerging quantum sensing, computing, and metrology technologies. Our results also highlight the opportunities offered by quantum spin defects in layered two-dimensional vdW materials for investigating local spin dynamic behaviors in magnetic solid-state matters.

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2405.00676 [pdf, other]

Spectrally Pruned Gaussian Fields with Neural Compensation

Authors: Runyi Yang, Zhenxin Zhu, Zhou Jiang, Baijun Ye, Xiaoxue Chen, Yifei Zhang, Yuantao Chen, Jian Zhao, Hao Zhao

Abstract: Recently, 3D Gaussian Splatting, as a novel 3D representation, has garnered attention for its fast rendering speed and high rendering quality. However, this comes with high memory consumption, e.g., a well-trained Gaussian field may utilize three million Gaussian primitives and over 700 MB of memory. We credit this high memory footprint to the lack of consideration for the relationship between primitives. In this paper, we propose a memory-efficient Gaussian field named SUNDAE with spectral pruning and neural compensation. On one hand, we construct a graph on the set of Gaussian primitives to model their relationship and design a spectral down-sampling module to prune out primitives while preserving desired signals. On the other hand, to compensate for the quality loss of pruning Gaussians, we exploit a lightweight neural network head to mix splatted features, which effectively compensates for quality losses while capturing the relationship between primitives in its weights. We demonstrate the performance of SUNDAE with extensive results. For example, SUNDAE can achieve 26.80 PSNR at 145 FPS using 104 MB memory while the vanilla Gaussian splatting algorithm achieves 25.60 PSNR at 160 FPS using 523 MB memory, on the Mip-NeRF360 dataset. Codes are publicly available at https://runyiyang.github.io/projects/SUNDAE/.

Submitted 1 May, 2024; originally announced May 2024.

Comments: Code: https://github.com/RunyiYang/SUNDAE Project page: https://runyiyang.github.io/projects/SUNDAE/

arXiv:2404.18192 [pdf, other]

Block-Map-Based Localization in Large-Scale Environment

Authors: Yixiao Feng, Zhou Jiang, Yongliang Shi, Yunlong Feng, Xiangyu Chen, Hao Zhao, Guyue Zhou

Abstract: Accurate localization is an essential technology for the flexible navigation of robots in large-scale environments. Both SLAM-based and map-based localization will increase the computing load due to the increase in map size, which will affect downstream tasks such as robot navigation and services. To this end, we propose a localization system based on Block Maps (BMs) to reduce the computational load caused by maintaining large-scale maps. Firstly, we introduce a method for generating block maps and the corresponding switching strategies, ensuring that the robot can estimate the state in large-scale environments by loading local map information. Secondly, global localization according to Branch-and-Bound Search (BBS) in the 3D map is introduced to provide the initial pose. Finally, a graph-based optimization method is adopted with a dynamic sliding window that determines what factors are being marginalized whether a robot is exposed to a BM or switching to another one, which maintains the accuracy and efficiency of pose tracking. Comparison experiments are performed on publicly available large-scale datasets. Results show that the proposed method can track the robot pose even though the map scale reaches more than 6 kilometers, while efficient and accurate localization is still guaranteed on NCLT and M2DGR.

Submitted 28 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures, 4 tables, published to ICRA 2024

arXiv:2404.17917 [pdf, other]

EvaNet: Elevation-Guided Flood Extent Mapping on Earth Imagery

Authors: Mirza Tanzim Sami, Da Yan, Saugat Adhikari, Lyuheng Yuan, Jiao Han, Zhe Jiang, Jalal Khalil, Yang Zhou

Abstract: Accurate and timely mapping of flood extent from high-resolution satellite imagery plays a crucial role in disaster management such as damage assessment and relief activities. However, current state-of-the-art solutions are based on U-Net, which cannot segment the flood pixels accurately due to the ambiguous pixels (e.g., tree canopies, clouds) that prevent a direct judgement from only the spectral features. Thanks to the digital elevation model (DEM) data readily available from sources such as United States Geological Survey (USGS), this work explores the use of an elevation map to improve flood extent mapping. We propose EvaNet, an elevation-guided segmentation model based on the encoder-decoder architecture with two novel techniques: (1) a loss function encoding the physical law of gravity that if a location is flooded (resp. dry), then its adjacent locations with a lower (resp. higher) elevation must also be flooded (resp. dry); (2) a new (de)convolution operation that integrates the elevation map by a location sensitive gating mechanism to regulate how much spectral features flow through adjacent layers. Extensive experiments show that EvaNet significantly outperforms the U-Net baselines, and works as a perfect drop-in replacement for U-Net in existing solutions to flood extent mapping.

Submitted 12 May, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

Comments: Accepted at the International Joint Conference on Artificial Intelligence (IJCAI, 2024)
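
The gravity rule stated in the abstract above (a flooded location implies that lower adjacent locations are flooded too) can be illustrated with a simple penalty term. The sketch below is one hypothetical way to score violations on a small grid; the function name, the 4-neighbour adjacency, and the hinge-style penalty are assumptions made for illustration and are not EvaNet's actual loss.

```python
def gravity_violation(prob, elev):
    """Sum of hinge penalties over adjacent cell pairs that break the rule.

    prob[r][c] is a predicted flood probability and elev[r][c] an elevation.
    For each cell and each 4-neighbour with strictly lower elevation, the
    neighbour should be at least as likely to be flooded as the cell itself;
    any shortfall is added to the penalty.
    """
    rows, cols = len(prob), len(prob[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if inside and elev[rr][cc] < elev[r][c]:
                    total += max(0.0, prob[r][c] - prob[rr][cc])
    return total


if __name__ == "__main__":
    elevation = [[3.0, 2.0, 1.0]]          # terrain slopes down to the right
    consistent = [[0.1, 0.5, 0.9]]         # lower cells are more flooded
    inconsistent = [[0.9, 0.5, 0.1]]       # high cell flooded, low cells dry
    print(gravity_violation(consistent, elevation))
    print(gravity_violation(inconsistent, elevation))
```

The same inequality also covers the "dry" half of the rule: if a lower cell is predicted dry, a higher neighbour with a larger flood probability is penalised.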

arXiv:2404.15009 [pdf, other]

The Brain Tumor Segmentation in Pediatrics (BraTS-PEDs) Challenge: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs)

Authors: Anahita Fathi Kazerooni, Nastaran Khalili, Deep Gandhi, Xinyang Liu, Zhifan Jiang, Syed Muhammed Anwar, Jake Albrecht, Maruf Adewole, Udunna Anazodo, Hannah Anderson, Sina Bagheri, Ujjwal Baid, Timothy Bergquist, Austin J. Borja, Evan Calabrese, Verena Chung, Gian-Marco Conte, Farouk Dako, James Eddy, Ivan Ezhov, Ariana Familiar, Keyvan Farahani, Anurag Gottipati, Debanjan Haldar, Shuvanjan Haldar , et al. (51 additional authors not shown)

Abstract: Pediatric tumors of the central nervous system are the most common cause of cancer-related death in children. The five-year survival rate for high-grade gliomas in children is less than 20%. Due to their rarity, the diagnosis of these entities is often delayed, their treatment is mainly based on historic treatment concepts, and clinical trials require multi-institutional collaborations. Here we present the CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs challenge, focused on pediatric brain tumors with data acquired across multiple international consortia dedicated to pediatric neuro-oncology and clinical trials. The CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs challenge brings together clinicians and AI/imaging scientists to lead to faster development of automated segmentation techniques that could benefit clinical trials, and ultimately the care of children with brain tumors.

Submitted 29 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2305.17033

arXiv:2404.14037 [pdf, other]

GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting

Authors: Hongyun Yu, Zhan Qu, Qihang Yu, Jianchuan Chen, Zhonghua Jiang, Zhiwen Chen, Shengyu Zhang, Jimin Xu, Fei Wu, Chengfei Lv, Gang Yu

Abstract: Recent works on audio-driven talking head synthesis using Neural Radiance Fields (NeRF) have achieved impressive results. However, due to inadequate pose and expression control caused by NeRF implicit representation, these methods still have some limitations, such as unsynchronized or unnatural lip movements, and visual jitter and artifacts. In this paper, we propose GaussianTalker, a novel method for audio-driven talking head synthesis based on 3D Gaussian Splatting. With the explicit representation property of 3D Gaussians, intuitive control of the facial motion is achieved by binding Gaussians to 3D facial models. GaussianTalker consists of two modules, Speaker-specific Motion Translator and Dynamic Gaussian Renderer. Speaker-specific Motion Translator achieves accurate lip movements specific to the target speaker through universalized audio feature extraction and customized lip motion generation. Dynamic Gaussian Renderer introduces Speaker-specific BlendShapes to enhance facial detail representation via a latent pose, delivering stable and realistic rendered videos. Extensive experimental results suggest that GaussianTalker outperforms existing state-of-the-art methods in talking head synthesis, delivering precise lip synchronization and exceptional visual quality. Our method achieves rendering speeds of 130 FPS on NVIDIA RTX4090 GPU, significantly exceeding the threshold for real-time rendering performance, and can potentially be deployed on other hardware platforms.

Submitted 28 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: https://yuhongyun777.github.io/GaussianTalker/

arXiv:2404.13786 [pdf, other]

Soar: Design and Deployment of A Smart Roadside Infrastructure System for Autonomous Driving

Authors: Shuyao Shi, Neiwen Ling, Zhehao Jiang, Xuan Huang, Yuze He, Xiaoguang Zhao, Bufang Yang, Chen Bian, Jingfei Xia, Zhenyu Yan, Raymond Yeung, Guoliang Xing

Abstract: Recently, smart roadside infrastructure (SRI) has demonstrated the potential of achieving fully autonomous driving systems. To explore the potential of infrastructure-assisted autonomous driving, this paper presents the design and deployment of Soar, the first end-to-end SRI system specifically designed to support autonomous driving systems. Soar consists of both software and hardware components carefully designed to overcome various system and physical challenges. Soar can leverage the existing operational infrastructure like street lampposts for a lower barrier of adoption. Soar adopts a new communication architecture that comprises a bi-directional multi-hop I2I network and a downlink I2V broadcast service, which are designed based on off-the-shelf 802.11ac interfaces in an integrated manner. Soar also features a hierarchical DL task management framework to achieve desirable load balancing among nodes and enable them to collaborate efficiently to run multiple data-intensive autonomous driving applications. We deployed a total of 18 Soar nodes on existing lampposts on campus, which have been operational for over two years. Our real-world evaluation shows that Soar can support a diverse set of autonomous driving applications and achieve desirable real-time performance and high communication reliability. Our findings and experiences in this work offer key insights into the development and deployment of next-generation smart roadside infrastructure and autonomous driving systems.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13564 [pdf, other]

Masked Latent Transformer with the Random Masking Ratio to Advance the Diagnosis of Dental Fluorosis

Authors: Yun Wu, Hao Xu, Maohua Gu, Zhongchuan Jiang, Jun Xu, Youliang Tian

Abstract: Dental fluorosis is a chronic disease caused by long-term overconsumption of fluoride, which leads to changes in the appearance of tooth enamel. It is an important basis for early non-invasive diagnosis of endemic fluorosis. However, even dental professionals may not be able to accurately distinguish dental fluorosis and its severity based on tooth images. Currently, there is still a gap in research on applying deep learning to diagnosing dental fluorosis. Therefore, we construct the first open-source dental fluorosis image dataset (DFID), laying the foundation for deep learning research in this field. To advance the diagnosis of dental fluorosis, we propose a pioneering deep learning model called masked latent transformer with the random masking ratio (MLTrMR). MLTrMR introduces a mask latent modeling scheme based on Vision Transformer to enhance contextual learning of dental fluorosis lesion characteristics. Consisting of a latent embedder, encoder, and decoder, MLTrMR employs the latent embedder to extract latent tokens from the original image, whereas the encoder and decoder comprising the latent transformer (LT) block are used to process unmasked tokens and predict masked tokens, respectively. To mitigate the lack of inductive bias in Vision Transformer, which may result in performance degradation, the LT block introduces latent tokens to enhance the learning capacity of latent lesion features. Furthermore, we design an auxiliary loss function to constrain the parameter update direction of the model. MLTrMR achieves 80.19% accuracy, 75.79% F1, and 81.28% quadratic weighted kappa on DFID, making it state-of-the-art (SOTA).

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13136 [pdf, ps, other]

Beyond the classification theorem of Cameron, Goethals, Seidel, and Shult

Authors: Hricha Acharya, Zilin Jiang

Abstract: In 1976, Cameron, Goethals, Seidel, and Shult classified all the graphs with smallest eigenvalue at least $-2$ by relating such graphs to root systems that occur in the classification of semisimple Lie algebras. In this paper, extending their beautiful theorem, we give a complete classification of all connected graphs with smallest eigenvalue in $(-λ^*, -2)$, where $λ^* = ρ^{1/2} + ρ^{-1/2} \approx 2.01980$, and $ρ$ is the unique real root of $x^3 = x + 1$. Our result is the first classification of infinitely many connected graphs with their smallest eigenvalue in $(-λ, -2)$ for any constant $λ> 2$.

Submitted 19 April, 2024; originally announced April 2024.

Comments: 28 pages, 11 figures

MSC Class: 05C50; 05C76; 05C30
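
The constant quoted in the abstract of arXiv:2404.13136 above can be checked numerically. The following is a minimal sketch, not taken from the paper: it locates the real root $ρ$ of $x^3 = x + 1$ by bisection and evaluates $λ^* = ρ^{1/2} + ρ^{-1/2}$. The function names and the bracketing interval [1, 2] are assumptions chosen for illustration.

```python
import math


def real_root_of_cubic(lo=1.0, hi=2.0, iterations=100):
    """Bisection for the real root of x**3 - x - 1 = 0 on [lo, hi].

    The bracket [1, 2] is an illustrative assumption: the cubic is
    negative at 1 and positive at 2, so a root lies in between.
    """
    def f(x):
        return x**3 - x - 1

    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid  # sign change in the left half
        else:
            lo = mid  # sign change in the right half
    return (lo + hi) / 2


def lambda_star():
    """Evaluate rho**(1/2) + rho**(-1/2) for the root rho found above."""
    rho = real_root_of_cubic()
    return math.sqrt(rho) + 1 / math.sqrt(rho)


if __name__ == "__main__":
    print(lambda_star())
```

The computed value should agree with the approximation 2.01980 stated in the abstract to the digits shown there.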

arXiv:2404.12730 [pdf, other]

PATE-TripleGAN: Privacy-Preserving Image Synthesis with Gaussian Differential Privacy

Authors: Zepeng Jiang, Weiwei Ni, Yifan Zhang

Abstract: Conditional Generative Adversarial Networks (CGANs) exhibit significant potential in supervised learning model training by virtue of their ability to generate realistic labeled images. However, numerous studies have indicated the privacy leakage risk in CGANs models. The solution DPCGAN, incorporating the differential privacy framework, faces challenges such as heavy reliance on labeled data for model training and potential disruptions to original gradient information due to excessive gradient clipping, making it difficult to ensure model accuracy. To address these challenges, we present a privacy-preserving training framework called PATE-TripleGAN. This framework incorporates a classifier to pre-classify unlabeled data, establishing a three-party min-max game to reduce dependence on labeled data. Furthermore, we present a hybrid gradient desensitization algorithm based on the Private Aggregation of Teacher Ensembles (PATE) framework and Differential Private Stochastic Gradient Descent (DPSGD) method. This algorithm allows the model to retain gradient information more effectively while ensuring privacy protection, thereby enhancing the model's utility. Privacy analysis and extensive experiments affirm that the PATE-TripleGAN model can generate a higher quality labeled image dataset while ensuring the privacy of the training data.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.12550 [pdf, other]

Characterizing Coherent Errors using Matrix-Element Amplification

Authors: Jonathan A. Gross, Elie Genois, Dripto M. Debroy, Yaxing Zhang, Wojciech Mruczkiewicz, Ze-Pei Cian, Zhang Jiang

Abstract: Repeating a gate sequence multiple times amplifies systematic errors coherently, making it a useful tool for characterizing quantum gates. However, the precision of such an approach is limited by low-frequency noise, while its efficiency is hindered by the time-consuming scans required to match up the phases of the off-diagonal matrix elements being amplified. Here, we overcome both challenges by interleaving the gate of interest with dynamical decoupling sequences in a protocol we call Matrix-Element Amplification using Dynamical Decoupling (MEADD). Using frequency-tunable superconducting qubits from a Google Sycamore quantum processor, we experimentally demonstrate that MEADD surpasses the accuracy and precision of existing characterization protocols for estimating systematic errors in single- and two-qubit gates. In particular, MEADD yields factors of 5 to 10 improvements in estimating coherent parameters of the $\mathrm{CZ}$ gates compared to existing methods, reaching a precision below one milliradian. We also use it to characterize coherent crosstalk in the processor which was previously too small to detect reliably.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.12154 [pdf, other]

StyleBooth: Image Style Editing with Multimodal Instruction

Authors: Zhen Han, Chaojie Mao, Zeyinzi Jiang, Yulin Pan, Jingfeng Zhang

Abstract: Given an original image, image editing aims to generate an image that aligns with the provided instruction. The challenges are to accept multimodal inputs as instructions and a scarcity of high-quality training data, including crucial triplets of source/target image pairs and multimodal (text and image) instructions. In this paper, we focus on image style editing and present StyleBooth, a method that proposes a comprehensive framework for image editing and a feasible strategy for building a high-quality style editing dataset. We integrate encoded textual instruction and image exemplar as a unified condition for the diffusion model, enabling the editing of the original image following multimodal instructions. Furthermore, by iterative style-destyle tuning and editing and usability filtering, the StyleBooth dataset provides content-consistent stylized/plain image pairs in various categories of styles. To show the flexibility of StyleBooth, we conduct experiments on diverse tasks, such as text-based style editing, exemplar-based style editing and compositional style editing. The results demonstrate that the quality and variety of training data significantly enhance the ability to preserve content and improve the overall quality of generated images in editing tasks. Project page can be found at https://ali-vilab.github.io/stylebooth-page/.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.11044 [pdf, other]

Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access

Authors: Luming Wang, Xu Zhang, Songyue Wang, Zhuolun Jiang, Tianyue Lu, Mingyu Chen, Siwei Luo, Keji Huang

Abstract: The growing memory demands of modern applications have driven the adoption of far memory technologies in data centers to provide cost-effective, high-capacity memory solutions. However, far memory presents new performance challenges because its access latencies are significantly longer and more variable than local DRAM. For applications to achieve acceptable performance on far memory, a high degree of memory-level parallelism (MLP) is needed to tolerate the long access latency. While modern out-of-order processors are capable of exploiting a certain degree of MLP, they are constrained by resource limitations and hardware complexity. The key obstacle is the synchronous memory access semantics of traditional load/store instructions, which occupy critical hardware resources for a long time. The longer far memory latencies exacerbate this limitation. This paper proposes a set of Asynchronous Memory Access Instructions (AMI) and its supporting function unit, Asynchronous Memory Access Unit (AMU), inside a contemporary Out-of-Order Core. AMI separates memory request issuing from response handling to reduce resource occupation. Additionally, AMU architecture supports up to several hundreds of asynchronous memory requests through re-purposing a portion of L2 Cache as scratchpad memory (SPM) to provide sufficient temporal storage. Together with a coroutine-based programming framework, this scheme can achieve significantly higher MLP for hiding far memory latencies.
Evaluation with a cycle-accurate simulation shows AMI achieves 2.42x speedup on average for memory-bound benchmarks with 1 us additional far memory latency. Over 130 outstanding requests are supported with 26.86x speedup for GUPS (random access) with 5 us latency. These demonstrate how the techniques tackle far memory performance impacts through explicit MLP expression and latency adaptation.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.09927 [pdf, other]

Autonomous Path Planning for Intercostal Robotic Ultrasound Imaging Using Reinforcement Learning

Authors: Yuan Bi, Cheng Qian, Zhicheng Zhang, Nassir Navab, Zhongliang Jiang

Abstract: Ultrasound (US) has been widely used in daily clinical practice for screening internal organs and guiding interventions. However, due to the acoustic shadow cast by the subcutaneous rib cage, the US examination for thoracic application is still challenging. To fully cover and reconstruct the region of interest in US for diagnosis, an intercostal scanning path is necessary. To tackle this challenge, we present a reinforcement learning (RL) approach for planning scanning paths between ribs to monitor changes in lesions on internal organs, such as the liver and heart, which are covered by rib cages. Structured anatomical information of the human skeleton is crucial for planning these intercostal paths. To obtain such anatomical insight, an RL agent is trained in a virtual environment constructed using computed tomography (CT) templates with randomly initialized tumors of various shapes and locations. In addition, task-specific state representation and reward functions are introduced to ensure the convergence of the training process while minimizing the effects of acoustic attenuation and shadows during scanning. To validate the effectiveness of the proposed approach, experiments have been carried out on unseen CTs with randomly defined single or multiple scanning targets. The results demonstrate the efficiency of the proposed RL framework in planning non-shadowed US scanning trajectories in areas with limited acoustic access.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.08079 [pdf, other]

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models

Authors: Nastaran Saadati, Minh Pham, Nasla Saleem, Joshua R. Waite, Aditya Balu, Zhanhong Jiang, Chinmay Hegde, Soumik Sarkar

Abstract: Recent advances in decentralized deep learning algorithms have demonstrated cutting-edge performance on various tasks with large pre-trained models. However, a pivotal prerequisite for achieving this level of competitiveness is the significant communication and computation overheads when updating these models, which prohibits applying them to real-world scenarios. To address this issue, drawing inspiration from advanced model merging techniques without requiring additional training, we introduce the Decentralized Iterative Merging-And-Training (DIMAT) paradigm--a novel decentralized deep learning framework. Within DIMAT, each agent is trained on their local data and periodically merged with their neighboring agents using advanced model merging techniques like activation matching until convergence is achieved. DIMAT provably converges with the best available rate for nonconvex functions with various first-order methods, while yielding tighter error bounds compared to the popular existing approaches. We conduct a comprehensive empirical analysis to validate DIMAT's superiority over baselines across diverse computer vision tasks sourced from multiple datasets. Empirical results validate our theoretical claims by showing that DIMAT attains faster and higher initial gain in accuracy with independent and identically distributed (IID) and non-IID data, incurring lower communication overhead.
This DIMAT paradigm presents a new opportunity for future decentralized learning, enhancing its adaptability to real-world settings with sparse and light-weight communication and computation.

Submitted 11 April, 2024; originally announced April 2024.

Comments: CVPR 2024 accepted paper, 22 pages, 12 figures

arXiv:2404.04863 [pdf]

Microscopic Insights into Fatigue Mechanism in Wurtzite Ferroelectric Al$_{0.65}$Sc$_{0.35}$N: Oxygen Infiltration Enabled Grain Amorphization Spanning Boundary to Bulk

Authors: Ruiqing Wang, Danyang Yao, Jiuren Zhou, Yang Li, Zhi Jiang, Dongliang Chen, Xu Ran, Yu Gao, Zixuan Cheng, Yong Wang, Yan Liu, Yue Hao, Genquan Han

Abstract: For the first time, the fatigue behavior involving external oxygen in highly Sc-doped AlN ferroelectric film was observed using transmission electron microscope techniques. Although increasing the Sc composition in AlScN film contributes to reducing the device operation voltage, the inherent affinity of Sc for oxygen introduces instability in device performance. In this study, oxygen incorporation at top electrode edges and grain boundaries, accompanied by an increase in current leakage and the disappearance of ferroelectric properties, was observed at the nanoscale after long-term field cycling. This observation indicates the emergence of non-ferroelectric and even amorphous states. The presented work revealed solid experimental evidence of an oxygen-involved fatigue mechanism, providing valuable insights into the physical nature of the ferroelectric properties of AlScN films.

Submitted 7 April, 2024; originally announced April 2024.

Comments: 2 pages, 7 figures

arXiv:2404.04801 [pdf, ps, other]

doi 10.1007/s41605-024-00467-8

LHAASO-KM2A detector simulation using Geant4

Authors: Zhen Cao, F. Aharonian, Q. An, Axikegu, Y. X. Bai, Y. W. Bao, D. Bastieri, X. J. Bi, Y. J. Bi, J. T. Cai, Q. Cao, W. Y. Cao, Zhe Cao, J. Chang, J. F. Chang, A. M. Chen, E. S. Chen, Liang Chen, Lin Chen, Long Chen, M. J. Chen, M. L. Chen, Q. H. Chen, S. H. Chen, S. Z. Chen, et al. (254 additional authors not shown)

Abstract: KM2A is one of the main sub-arrays of LHAASO, working on gamma ray astronomy and cosmic ray physics at energies above 10 TeV. Detector simulation is the important foundation for estimating detector performance and data analysis. It is a big challenge to simulate the KM2A detector in the framework of Geant4 due to the need to track numerous photons from a large number of detector units (>6000) with large altitude difference (30 m) and huge coverage (1.3 km^2). In this paper, the design of the KM2A simulation code G4KM2A based on Geant4 is introduced. The process of G4KM2A is optimized mainly in memory consumption to avoid memory overflow. Some simplifications are used to significantly speed up the execution of G4KM2A. The running time is reduced by at least 30 times compared to full detector simulation. The particle distributions and the core/angle resolution comparison between simulation and experimental data of the full KM2A array are also presented, which show good agreement.

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.04254 [pdf, other]

Watermark-based Detection and Attribution of AI-Generated Content

Authors: Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Neil Zhenqiang Gong

Abstract: Several companies--such as Google, Microsoft, and OpenAI--have deployed techniques to watermark AI-generated content to enable proactive detection. However, existing literature mainly focuses on user-agnostic detection. Attribution aims to further trace back the user of a generative-AI service who generated a given content detected as AI-generated. Despite its growing importance, attribution is largely unexplored. In this work, we aim to bridge this gap by providing the first systematic study on watermark-based, user-aware detection and attribution of AI-generated content. Specifically, we theoretically study the detection and attribution performance via rigorous probabilistic analysis. Moreover, we develop an efficient algorithm to select watermarks for the users to enhance attribution performance. Both our theoretical and empirical results show that watermark-based detection and attribution inherit the accuracy and (non-)robustness properties of the watermarking method.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.01430 [pdf, other]

Position-Aware Parameter Efficient Fine-Tuning Approach for Reducing Positional Bias in LLMs

Authors: Zheng Zhang, Fan Yang, Ziyan Jiang, Zheng Chen, Zhengyang Zhao, Chengyuan Ma, Liang Zhao, Yang Liu

Abstract: Recent advances in large language models (LLMs) have enhanced their ability to process long input contexts. This development is particularly crucial for tasks that involve retrieving knowledge from an external datastore, which can result in long inputs. However, recent studies show a positional bias in LLMs, demonstrating varying performance depending on the location of useful information within the input sequence. In this study, we conduct extensive experiments to investigate the root causes of positional bias. Our findings indicate that the primary contributor to LLM positional bias stems from the inherent positional preferences of different models. We demonstrate that merely employing prompt-based solutions is inadequate for overcoming the positional preferences. To address this positional bias issue of a pre-trained LLM, we developed a Position-Aware Parameter Efficient Fine-Tuning (PAPEFT) approach which is composed of a data augmentation technique and a parameter efficient adapter, enhancing a uniform attention distribution across the input context. Our experiments demonstrate that the proposed approach effectively reduces positional bias, improving LLMs' effectiveness in handling long context sequences for various tasks that require externally retrieved knowledge.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2404.00327 [pdf, other]

YNetr: Dual-Encoder architecture on Plain Scan Liver Tumors (PSLT)

Authors: Wen Sheng, Zhong Zheng, Jiajun Liu, Han Lu, Hanyuan Zhang, Zhengyong Jiang, Zhihong Zhang, Daoping Zhu

Abstract: Background: Liver tumors are abnormal growths in the liver that can be either benign or malignant, with liver cancer being a significant health concern worldwide. However, there is no dataset for plain scan segmentation of liver tumors, nor any related algorithms. To fill this gap, we propose Plain Scan Liver Tumors (PSLT) and YNetr. Methods: A collection of 40 liver tumor plain scan segmentation datasets was assembled and annotated. Concurrently, we utilized the Dice coefficient as the metric for assessing the segmentation outcomes produced by YNetr, which has the advantage of capturing different frequency information. Results: The YNetr model achieved a Dice coefficient of 62.63% on the PSLT dataset, surpassing the other publicly available model by an accuracy margin of 1.22%. Comparative evaluations were conducted against a range of models including UNet 3+, XNet, UNetr, Swin UNetr, Trans-BTS, COTr, nnUNetv2 (2D), nnUNetv2 (3D fullres), MedNext (2D) and MedNext (3D fullres). Conclusions: We not only proposed a dataset named PSLT (Plain Scan Liver Tumors), but also explored a structure called YNetr that utilizes wavelet transform to extract different frequency information, and which achieves SOTA on PSLT in our experiments.

Submitted 30 March, 2024; originally announced April 2024.

Comments: 15 pages
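
The Dice coefficient used as the evaluation metric in arXiv:2404.00327 above can be illustrated with a short sketch. This is a generic definition for flat binary masks, not the authors' evaluation code; the function name, the list-based mask representation, and the handling of two empty masks are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * overlap / (size of pred + size of target).

    Both arguments are flat sequences of 0/1 values of equal length
    (a hypothetical representation chosen for this sketch).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    overlap = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 1, 0]
    print(dice_coefficient(predicted, reference))
```

In this toy case the masks share one foreground element out of two each, so the score is 2 * 1 / (2 + 2) = 0.5; a reported value such as 62.63% corresponds to the same ratio computed over segmentation volumes.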

arXiv:2404.00051 [pdf, other]

Deja vu: Contrastive Historical Modeling with Prefix-tuning for Temporal Knowledge Graph Reasoning

Authors: Miao Peng, Ben Liu, Wenjie Xu, Zihao Jiang, Jiahui Zhu, Min Peng

Abstract: Temporal Knowledge Graph Reasoning (TKGR) is the task of inferring missing facts for incomplete TKGs in complex scenarios (e.g., transductive and inductive settings), which has been gaining increasing attention. Recently, to mitigate dependence on structured connections in TKGs, text-based methods have been developed to utilize rich linguistic information from entity descriptions. However, suffering from the enormous parameters and inflexibility of pre-trained language models, existing text-based methods struggle to balance the textual knowledge and temporal information with computationally expensive purpose-built training strategies. To tap the potential of text-based models for TKGR in various complex scenarios, we propose ChapTER, a Contrastive historical modeling framework with prefix-tuning for TEmporal Reasoning. ChapTER feeds history-contextualized text into the pseudo-Siamese encoders to strike a textual-temporal balance via contrastive estimation between queries and candidates. By introducing virtual time prefix tokens, it applies a prefix-based tuning method to facilitate the frozen PLM capable for TKGR tasks under different settings. We evaluate ChapTER on four transductive and three few-shot inductive TKGR benchmarks, and experimental results demonstrate that ChapTER achieves superior performance compared to competitive baselines with only 0.17% tuned parameters. We conduct thorough analysis to verify the effectiveness, flexibility and efficiency of ChapTER.

Submitted 25 March, 2024; originally announced April 2024.

Comments: Accepted to NAACL 2024 Findings

arXiv:2403.19534 [pdf, other]

Locate, Assign, Refine: Taming Customized Image Inpainting with Text-Subject Guidance

Authors: Yulin Pan, Chaojie Mao, Zeyinzi Jiang, Zhen Han, Jingfeng Zhang

Abstract: Prior studies have made significant progress in image inpainting guided by either text or subject image. However, the research on editing with their combined guidance is still in the early stages. To tackle this challenge, we present LAR-Gen, a novel approach for image inpainting that enables seamless inpainting of masked scene images, incorporating both the textual prompts and specified subjects. Our approach adopts a coarse-to-fine manner to ensure subject identity preservation and local semantic coherence. The process involves (i) Locate: concatenating the noise with masked scene image to achieve precise regional editing, (ii) Assign: employing decoupled cross-attention mechanism to accommodate multi-modal guidance, and (iii) Refine: using a novel RefineNet to supplement subject details. Additionally, to address the issue of scarce training data, we introduce a novel data construction pipeline. This pipeline extracts substantial pairs of data consisting of local text prompts and corresponding visual instances from a vast image dataset, leveraging publicly available large models. Extensive experiments and varied application scenarios demonstrate the superiority of LAR-Gen in terms of both identity preservation and text semantic consistency. Project page can be found at https://ali-vilab.github.io/largen-page/.

Submitted 28 March, 2024; originally announced March 2024.

Comments: 22 pages, 14 figures

arXiv:2403.17987 [pdf]

doi 10.1103/PhysRevB.109.165414

Electro-optic properties from ab initio calculations in two-dimensional materials

Authors: Zhijun Jiang, Hongjun Xiang, Laurent Bellaiche, Charles Paillard

Abstract: Electro-optic (EO) effects describe the change of optical constants induced by low-frequency electric fields. Thanks to the advent of Density Functional Perturbation Theory (DFPT), the EO properties of bulk three-dimensional (3D) materials can now be calculated in an ab initio way. However, the use of periodic boundary conditions in most Density Functional Theory codes requires simulating two-dimensional (2D) materials using slabs surrounded by a large layer of vacuum. The EO coefficients predicted from such calculations, if not rescaled properly, can severely deviate from the real EO properties of 2D materials. The present work discusses the issue and introduces the rescaling relationships that allow the true EO properties to be recovered.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 18 pages, 4 figures

Journal ref: Phys. Rev. B 109, 165414 (2024)

arXiv:2403.16751 [pdf]

Recent advances on CO2-assisted synthesis of metal nanoparticles for the upgrading of biomass-derived compounds

Authors: Zhiwei Jiang, Yongjian Zeng, Ruichao Guo, Lu Lin, Rafael Luque, Kai Yan

Abstract: Nanostructured catalysts have attracted increased attention for biomass conversion into high-valued chemicals due to the rapid depletion of fossil resources and increasingly severe environmental issues. Supercritical carbon dioxide (scCO2) fluid is an attractive medium for synthesizing nanostructured materials due to its favorable properties. In this review, the properties of scCO2 and the roles of scCO2 in the fabrication of metal nanoparticles are assessed in detail. A general overview of the synthesis of different types of metal nanoparticles (including metal oxide nanoparticles) using scCO2 and the relationship between the structure of the obtained metal nanoparticles and the preparation conditions such as reaction temperature and pressure, types of metal precursors, and deposition time are systematically summarized and compared in tables. Besides, compared to the metal catalysts prepared using conventional methods, the catalysts obtained using scCO2 exhibited excellent catalytic performance on biomass conversion reactions, mainly oxidation and hydrogenation reactions. Finally, opportunities and challenges of metal nanoparticle preparation using scCO2 for biomass valorization to chemicals and liquid fuels are highlighted. This review could be helpful for the rational design of more efficient metal catalysts for the selective synthesis of fine chemicals and fuels from biomass-derived chemicals.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 66 pages, 21 figures

arXiv:2403.16733 [pdf]

doi 10.1016/j.apsusc.2022.154997

Direct activation of PMS by highly dispersed amorphous CoOx clusters in anatase TiO2 nanosheets for efficient oxidation of biomass-derived alcohols

Authors: Zhiwei Jiang, Zhiyue Zhao, Xin Li, Huaiguang Li, Hector F. Garces, Mahmoud Amer, Kai Yan

Abstract: Developing a green and cost-effective catalytic system for the selective oxidation of biomass-derived alcohols is vital for the sustainable synthesis of fine chemicals. Herein, highly dispersed subnanometric amorphous CoOx clusters in anatase TiO2 nanosheets (Co-TiO2) fabricated by a green solvent CO2 assisted approach could directly activate peroxymonosulfate (PMS) for the highly selective oxidation of various biomass-derived alcohols. Advanced characterizations (e.g., EXAFS, EPR, AC HAADF-STEM) reveal that a strong interaction of CoOx clusters and the anatase TiO2 support exists in Co-TiO2 and that the Co in Co-TiO2 mainly consists of Co2+ and Co3+. The Co-TiO2 catalyst offers superior catalytic performance in the conversion of six types of alcohols (e.g., benzyl alcohol (BAL), 5-hydroxymethylfurfural (HMF)) with high selectivity to produce corresponding aldehydes. Highly dispersed CoOx clusters and the interaction between CoOx clusters and TiO2 support contribute to the superior performance. Mechanism studies show that SO4 radicals play the dominant role in the selective oxidation of model reactant BAL and 1O2 participates in the non-radical pathway. DFT calculations are well matched with experiment and decipher that the strong interaction between CoOx clusters and TiO2 support promotes the formation of SO4 and SO5.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 29 pages, 9 figures

Journal ref: Applied Surface Science 2023

arXiv:2403.16708 [pdf]

doi 10.1016/S1872-2067(23)64418-3

Facile synthesis of CoSi alloy with rich vacancy for base- and solvent-free aerobic oxidation of aromatic alcohols

Authors: Zhiyue Zhao, Zhiwei Jiang, Yizhe Huang, Mebrouka Boubeche, Valentina G. Matveeva, Hector F. Garces, Huixia Luo, Kai Yan

Abstract: The rational design and green synthesis of low-cost, robust catalysts that are efficient for the selective oxidation of various alcohols remain challenging. Herein, we report a fast and solvent-free arc-melting (AM) method to controllably synthesize a semimetal CoSi alloy (abbreviated as AM-CoSi) that is efficient for the base- and solvent-free oxidation of six types of aromatic alcohols. X-ray absorption fine structure (XAFS), electron paramagnetic resonance (EPR), and aberration-corrected high-angle annular dark-field scanning transmission electron microscopy (AC HAADF-STEM) confirmed the successful synthesis of AM-CoSi with rich Si vacancies (Siv). The as-prepared CoSi alloy catalysts exhibit an order-of-magnitude activity enhancement in the oxidation of the model reactant benzyl alcohol (BAL) to benzyl benzoate (BBE) compared with their mono counterparts, with a 70% yield of BBE, which is the highest yield to date. Experimental results and DFT calculations verify that the CoSi alloy structure improves the BAL conversion and that the Si vacancies mainly contribute to the generation of BBE. In addition, the CoSi alloy maintains high stability, and a potential reaction pathway is proposed. The CoSi alloy also works efficiently for the selective oxidation of various alcohols with different groups. This work demonstrates for the first time that a semimetal CoSi alloy is robust for the green oxidation of various alcohols and provides a vast opportunity for the rational design and application of other semimetal alloy catalysts.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 22 pages, 6 figures

Journal ref: Chinese Journal of Catalysis 2023

arXiv:2403.16433 [pdf]

doi 10.1016/j.cattod.2023.114252

Highly dispersed Ru nanoparticles anchored on NiAl layered double oxides catalyst for selective hydrodeoxygenation of vanillin

Authors: Yongjian Zeng, Lu Lin, Di Hu, Zhiwei Jiang, Shaimaa Saeed, Ruichao Guo, Ibrahim Ashour, Kai Yan

Abstract: The hydrodeoxygenation (HDO) of lignin-derived feedstocks into value-added chemicals with high efficiency and selectivity is desirable for the utilization of biomass resources. The complex oxygen-containing groups of lignin-derived substances make it challenging to achieve high selectivity toward the required product. In this work, a catalyst of highly dispersed Ru nanoparticles anchored on Ni3Al1 layered double oxides (LDOs), derived from NiAl layered double hydroxides (LDHs) with flower-shaped morphology, was constructed by a simple deposition-reduction method. The introduction of the LDHs-derived support significantly affects the catalytic activity for the HDO of lignin-derived vanillin (VL) into 2-methoxy-4-methylphenol (MMP). The Ru/Ni3Al1-400 catalyst achieved complete conversion of VL and a 94.2% yield of MMP at 130 °C in methanol solvent, much better than the catalysts without the LDHs-derived support. The methanol solvent is beneficial for the conversion of the reaction intermediate vanillin alcohol (VA). Detailed characterization reveals that the enhanced metal-support interaction over Ru/Ni3Al1-400 and the easily accessible acid sites facilitate the production of MMP.

Submitted 25 March, 2024; originally announced March 2024.

Showing 51–100 of 1,559 results for author: Jiang, Z