Search | arXiv e-print repository

Enhancing Medical Imaging with GANs Synthesizing Realistic Images from Limited Data

Authors: Yinqiu Feng, Bo Zhang, Lingxi Xiao, Yutian Yang, Tana Gegen, Zexi Chen

Abstract: In this research, we introduce an innovative method for synthesizing medical images using generative adversarial networks (GANs). Our proposed GANs method demonstrates the capability to produce realistic synthetic images even when trained on a limited quantity of real medical image data, showcasing commendable generalization prowess. To achieve this, we devised a generator and discriminator network architecture founded on deep convolutional neural networks (CNNs), leveraging the adversarial training paradigm for model optimization. Through extensive experimentation across diverse medical image datasets, our method exhibits robust performance, consistently generating synthetic images that closely emulate the structural and textural attributes of authentic medical images.

Submitted 22 May, 2024; originally announced June 2024.

arXiv:2406.18546 [pdf]

Application of Multimodal Fusion Deep Learning Model in Disease Recognition

Authors: Xiaoyi Liu, Hongjie Qiu, Muqing Li, Zhou Yu, Yutian Yang, Yafeng Yan

Abstract: This paper introduces an innovative multi-modal fusion deep learning approach to overcome the drawbacks of traditional single-modal recognition techniques. These drawbacks include incomplete information and limited diagnostic accuracy. During the feature extraction stage, cutting-edge deep learning models including convolutional neural networks (CNN), recurrent neural networks (RNN), and transformers are applied to distill advanced features from image-based, temporal, and structured data sources. The fusion strategy component seeks to determine the optimal fusion mode tailored to the specific disease recognition task. In the experimental section, a comparison is made between the performance of the proposed multi-mode fusion model and existing single-mode recognition methods. The findings demonstrate significant advantages of the multimodal fusion model across multiple evaluation metrics.

Submitted 22 May, 2024; originally announced June 2024.

arXiv:2406.16981 [pdf]

Research on Feature Extraction Data Processing System For MRI of Brain Diseases Based on Computer Deep Learning

Authors: Lingxi Xiao, Jinxin Hu, Yutian Yang, Yinqiu Feng, Zichao Li, Zexi Chen

Abstract: Most of the existing wavelet image processing techniques are carried out in the form of single-scale reconstruction and multiple iterations. However, processing high-quality fMRI data presents problems such as mixed noise and excessive computation time. This project proposes the use of matrix operations by combining mixed noise elimination methods with wavelet analysis to replace traditional iterative algorithms. Functional magnetic resonance imaging (fMRI) of the auditory cortex of a single subject is analyzed and compared to the wavelet domain signal processing technology based on repeated times and the world's most influential SPM8. Experiments show that this algorithm is the fastest in computing time, and its detection effect is comparable to the traditional iterative algorithm. However, this has a higher practical value for the processing of fMRI data. In addition, the wavelet analysis method proposed signal processing to speed up the calculation rate.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.13205 [pdf]

Application of Computer Deep Learning Model in Diagnosis of Pulmonary Nodules

Authors: Yutian Yang, Hongjie Qiu, Yulu Gong, Xiaoyi Liu, Yang Lin, Muqing Li

Abstract: The 3D simulation model of the lung was established by using the reconstruction method. A computer aided pulmonary nodule detection model was constructed. The process iterates over the images to refine the lung nodule recognition model based on neural networks. It is integrated with 3D virtual modeling technology to improve the interactivity of the system, so as to achieve intelligent recognition of lung nodules. A 3D RCNN (Region-based Convolutional Neural Network) was utilized for feature extraction and nodule identification. The LUNA16 large sample database was used as the research dataset. FROC (Free-response Receiver Operating Characteristic) analysis was applied to evaluate the model, calculating sensitivity at various false positive rates to derive the average FROC. Compared with conventional diagnostic methods, the recognition rate was significantly improved. This technique facilitates the detection of pulmonary abnormalities at an initial phase, which holds immense value for the prompt diagnosis of lung malignancies.

Submitted 19 June, 2024; originally announced June 2024.

MSC Class: 68T10; 92C50

arXiv:2406.12757 [pdf, other]

MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning

Authors: Shuo Xu, Sai Wang, Xinyue Hu, Yutian Lin, Bo Du, Yu Wu

Abstract: Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and recognize unseen attribute-object compositions. Existing CZSL datasets focus on single attributes, neglecting the fact that objects naturally exhibit multiple interrelated attributes. Real-world objects often possess multiple interrelated attributes, and current datasets' narrow attribute scope and single attribute labeling introduce annotation biases, undermining model performance and evaluation. To address these limitations, we introduce the Multi-Attribute Composition (MAC) dataset, encompassing 18,217 images and 11,067 compositions with comprehensive, representative, and diverse attribute annotations. MAC includes an average of 30.2 attributes per object and 65.4 objects per attribute, facilitating better multi-attribute composition predictions. Our dataset supports deeper semantic understanding and higher-order attribute associations, providing a more realistic and challenging benchmark for the CZSL task. We also develop solutions for multi-attribute compositional learning and propose the MM-encoder to disentangle the attributes and objects.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 13 pages, 5 figures

arXiv:2406.10054 [pdf, other]

SmartOracle: Generating Smart Contract Oracle via Fine-Grained Invariant Detection

Authors: Jianzhong Su, Jiachi Chen, Zhiyuan Fang, Xingwei Lin, Yutian Tang, Zibin Zheng

Abstract: As decentralized applications (DApps) proliferate, the increased complexity and usage of smart contracts have heightened their susceptibility to security incidents and financial losses. Although various vulnerability detection tools have been developed to mitigate these issues, they often suffer poor performance in detecting vulnerabilities, as they either rely on simplistic and general-purpose oracles that may be inadequate for vulnerability detection, or require user-specified oracles, which are labor-intensive to create. In this paper, we introduce SmartOracle, a dynamic invariant detector that automatically generates fine-grained invariants as application-specific oracles for vulnerability detection. From historical transactions, SmartOracle uses pattern-based detection and advanced inference to construct comprehensive properties, and mines multi-layer likely invariants to accommodate the complicated contract functionalities. After that, SmartOracle identifies smart contract vulnerabilities by hunting the violated invariants in new transactions. In the field of invariant detection, SmartOracle detects 50% more ERC20 invariants than existing dynamic invariant detection and achieves a 96% precision rate. Furthermore, we build a dataset that contains vulnerable contracts from real-world security incidents. SmartOracle successfully detects 466 abnormal transactions with an acceptable precision rate of 96%, involving 31 vulnerable contracts. The experimental results demonstrate its effectiveness in detecting smart contract vulnerabilities, especially those related to complicated contract functionalities.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.05962 [pdf, other]

Data Caching for Enterprise-Grade Petabyte-Scale OLAP

Authors: Chunxu Tang, Bin Fan, Jing Zhao, Chen Liang, Yi Wang, Beinan Wang, Ziyue Qiu, Lu Qiu, Bowen Ding, Shouzhuo Sun, Saiguang Che, Jiaming Mai, Shouwei Chen, Yu Zhu, Jianjian Xie, Yutian Sun, Yao Li, Yangjun Zhang, Ke Wang, Mingmin Chen

Abstract: With the exponential growth of data and evolving use cases, petabyte-scale OLAP data platforms are increasingly adopting a model that decouples compute from storage. This shift, evident in organizations like Uber and Meta, introduces operational challenges including massive, read-heavy I/O traffic with potential throttling, as well as skewed and fragmented data access patterns. Addressing these challenges, this paper introduces the Alluxio local (edge) cache, a highly effective architectural optimization tailored for such environments. This embeddable cache, optimized for petabyte-scale data analytics, leverages local SSD resources to alleviate network I/O and API call pressures, significantly improving data transfer efficiency. Integrated with OLAP systems like Presto and storage services like HDFS, the Alluxio local cache has demonstrated its effectiveness in handling large-scale, enterprise-grade workloads over three years of deployment at Uber and Meta. We share insights and operational experiences in implementing these optimizations, providing valuable perspectives on managing modern, massive-scale OLAP workloads.

Submitted 9 June, 2024; originally announced June 2024.

Comments: Accepted to the USENIX Annual Technical Conference (USENIX ATC) 2024

arXiv:2405.16121 [pdf]

Design and Implementation of an Emotion Analysis System Based on EEG Signals

Authors: Zhang Yutian, Huang Shan, Zhang Jianing, Fan Ci'en

Abstract: Traditional brain-computer systems are complex and expensive, and emotion classification algorithms lack representations of the intrinsic relationships between different channels of electroencephalogram (EEG) signals. There is still room for improvement in accuracy. To lower the research barrier for EEG and harness the rich information embedded in multi-channel EEG, we propose and implement a simple and user-friendly brain-computer system for classifying four emotions: happiness, sorrow, sadness, and tranquility. This system utilizes the fusion of convolutional attention mechanisms and fully pre-activated residual blocks, termed Attention-Convolution-based Pre-Activated Residual Network (ACPA-ResNet). In the hardware acquisition and preprocessing phase, we employ the ADS1299 integrated chip as the analog front-end and utilize the ESP32 microcontroller for initial EEG signal processing. Data is wirelessly transmitted to a PC through UDP protocol for further preprocessing. In the emotion analysis phase, ACPA-ResNet is designed to automatically extract and learn features from EEG signals, thereby enabling accurate classification of emotional states by learning time-frequency domain characteristics. ACPA-ResNet introduces an attention mechanism on the foundation of residual networks, adaptively assigning different weights to each channel. This allows it to focus on more meaningful EEG signals in both spatial and channel dimensions while avoiding the problems of gradient dispersion and explosion associated with deep network architectures. Through testing on 16 subjects, our system demonstrates stable EEG signal acquisition and transmission. The novel network significantly enhances emotion recognition accuracy, achieving an average emotion classification accuracy of 95.1%.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.10017 [pdf, other]

Mechanism for the Broadened Linewidth in Antiferromagnetic Resonance

Authors: Yutian Wang, Jiang Xiao

Abstract: The linewidth of antiferromagnetic resonance (AFMR) is found to be significantly broader than that of ferromagnetic resonance (FMR), even when the intrinsic Gilbert damping parameter is the same for both systems. We investigate the origin of this enhanced damping rate in AFMR by studying a bipartite magnet model. Through analytical calculations and numerical simulations, we present three perspectives on understanding this linewidth broadening in AFMR: i) The non-dissipative Heisenberg exchange interaction develops a damping-like component in the presence of Gilbert damping, ii) The transverse component of the exchange coupling reduces the AFMR frequency, thereby increasing the damping rate, and iii) The antiferromagnetic eigenmode exhibits characteristics of a two-mode squeezed state, which is inherently linked to an enhanced damping rate. Our findings provide a comprehensive understanding of the complex dynamics governing magnetic dissipation in antiferromagnets and offer insights into the experimentally observed broadened linewidths in AFMR spectra.

Submitted 16 May, 2024; originally announced May 2024.

Comments: 6 pages, 2 figures

arXiv:2405.03547 [pdf, other]

Position: Leverage Foundational Models for Black-Box Optimization

Authors: Xingyou Song, Yingtao Tian, Robert Tjarko Lange, Chansoo Lee, Yujin Tang, Yutian Chen

Abstract: Undeniably, Large Language Models (LLMs) have stirred an extraordinary wave of innovation in the machine learning research domain, resulting in substantial impact across diverse fields such as reinforcement learning, robotics, and computer vision. Their incorporation has been rapid and transformative, marking a significant paradigm shift in the field of machine learning research. However, the field of experimental design, grounded on black-box optimization, has been much less affected by such a paradigm shift, even though integrating LLMs with optimization presents a unique landscape ripe for exploration. In this position paper, we frame the field of black-box optimization around sequence-based foundation models and organize their relationship with previous literature. We discuss the most promising ways foundational language models can revolutionize optimization, which include harnessing the vast wealth of information encapsulated in free-form text to enrich task comprehension, utilizing highly flexible sequence models such as Transformers to engineer superior optimization strategies, and enhancing performance prediction over previously unseen search spaces.

Submitted 9 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: International Conference on Machine Learning (ICML) 2024

arXiv:2404.18087 [pdf, other]

Anomalous Quantum Propagation of Microcavity Exciton Polaritons

Authors: Tian Lingyu, Yutian Peng, Qihua Xiong, Sanjib Ghosh

Abstract: Here, we explore the quantum propagation of exciton polaritons in semiconductor microcavities, exhibiting intriguing effects such as interactions, decay, and disorder scatterings. Our investigation uncovers anomalies in their quantum propagation, deviating from predictions based on existing theories. By applying scaling theory, we elucidate the true nature of exciton polariton propagation, unveiling a localization phase that characteristically differs from Anderson localization. Our numerical results agree with the self-consistent theory developed for exciton polariton condensates, incorporating non-linearity and finite lifetime.

Submitted 28 April, 2024; originally announced April 2024.

Comments: 14 pages, 4 figures

arXiv:2404.13291 [pdf, other]

Liquidity Pool Design on Automated Market Makers

Authors: Xue Dong He, Chen Yang, Yutian Zhou

Abstract: Automated market makers are a popular type of decentralized exchange in which users trade assets with each other directly and automatically through a liquidity pool and a fixed pricing function. The liquidity provider contributes to the liquidity pool by supplying assets to the pool and in return they earn transaction fees from traders who trade through the pool. We propose a model of optimal liquidity provision in which the risk-averse liquidity provider decides the investment proportion of wealth she would like to supply to the pool, trade in a centralized market, and consume in multiple periods. We derive the liquidity provider's optimal strategy by dynamic programming and numerically find the optimal liquidity pool that maximizes the liquidity provider's utility. Our findings indicate that the exchange rate volatility on the centralized market exerts a positive effect on the optimal transaction fee. Moreover, the optimal constant mean pricing formula is found to be related to the relative performance of the underlying assets on the centralized market.

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.12331 [pdf, other]

On the roles of stellar rotation and binarity in NGC 2423's main-sequence turnoff region

Authors: Yutian Bu, Chenyu He, Li Wang, Jiamao Lin, Chengyuan Li

Abstract: Research has shown that many young and intermediate-age clusters (younger than $\sim$2 Gyr) have extended main sequences and main-sequence turnoffs (eMSTOs), which cannot be adequately described by a single isochrone. The reason for the extended main sequences is now known, with the most probable cause being the fast rotation of stars. However, a significant fraction of slowly rotating stars form a younger stellar population than their fast-rotating counterparts, leading to speculation that they have undergone thorough rotational mixing processes internally. One speculation is that a considerable number of slowly rotating stars reside in close binary systems, where tidal forces from companion stars are the cause of their rotational deceleration. In this work, we report a relatively old open star cluster in the Milky Way, NGC 2423 ($\sim$1 Gyr old), which exhibits an apparent eMSTO. As anticipated, many characteristics of NGC 2423 indicate that its eMSTO is driven by stellar rotations. Our calculations indicate that if slowly rotating stars commonly have a close companion star, they should exhibit significant differences in radial velocities observationally, and binary systems that can be tidally locked within the age of NGC 2423 should have a mass ratio close to 1. However, none of these predictions align with our observations. Interestingly, among the only two equal-mass binary systems in the observed region for which spectroscopic data could be obtained, we discovered that one of them is a tidally locked binary system. This further suggests the validity of our numerical simulation results.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 13 pages, 11 figures, 2 tables. Accepted for publication in ApJ

arXiv:2404.07839 [pdf, other]

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, et al. (37 additional authors not shown)

Abstract: We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned variant. Both models achieve comparable performance to Gemma-2B despite being trained on fewer tokens.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.06695 [pdf, other]

Spiral Scanning and Self-Supervised Image Reconstruction Enable Ultra-Sparse Sampling Multispectral Photoacoustic Tomography

Authors: Yutian Zhong, Xiaoming Zhang, Zongxin Mo, Shuangyang Zhang, Wufan Chen, Li Qi

Abstract: Multispectral photoacoustic tomography (PAT) is an imaging modality that utilizes the photoacoustic effect to achieve non-invasive and high-contrast imaging of internal tissues. However, the hardware cost and computational demand of a multispectral PAT system consisting of up to thousands of detectors are huge. To address this challenge, we propose an ultra-sparse spiral sampling strategy for multispectral PAT, which we named U3S-PAT. Our strategy employs a sparse ring-shaped transducer that, when switching excitation wavelengths, simultaneously rotates and translates. This creates a spiral scanning pattern with multispectral angle-interlaced sampling. To solve the highly ill-conditioned image reconstruction problem, we propose a self-supervised learning method that is able to introduce structural information shared during spiral scanning. We simulate the proposed U3S-PAT method on a commercial PAT system and conduct in vivo animal experiments to verify its performance. The results show that even with a sparse sampling rate as low as 1/30, our U3S-PAT strategy achieves similar reconstruction and spectral unmixing accuracy as non-spiral dense sampling. Given its ability to dramatically reduce the time required for three-dimensional multispectral scanning, our U3S-PAT strategy has the potential to perform volumetric molecular imaging of dynamic biological activities.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.05976 [pdf, other]

A Cyber Manufacturing IoT System for Adaptive Machine Learning Model Deployment by Interactive Causality Enabled Self-Labeling

Authors: Yutian Ren, Yuqi He, Xuyin Zhang, Aaron Yen, G. P. Li

Abstract: Machine Learning (ML) has been demonstrated to improve productivity in many manufacturing applications. To host these ML applications, several software and Industrial Internet of Things (IIoT) systems have been proposed for manufacturing applications to deploy ML applications and provide real-time intelligence. Recently, an interactive causality enabled self-labeling method has been proposed to advance adaptive ML applications in cyber-physical systems, especially manufacturing, by automatically adapting and personalizing ML models after deployment to counter data distribution shifts. The unique features of the self-labeling method require a novel software system to support dynamism at various levels. This paper proposes the AdaptIoT system, comprised of an end-to-end data streaming pipeline, ML service integration, and an automated self-labeling service. The self-labeling service consists of causal knowledge bases and automated full-cycle self-labeling workflows to adapt multiple ML models simultaneously. AdaptIoT employs a containerized microservice architecture to deliver a scalable and portable solution for small and medium-sized manufacturers. A field demonstration of a self-labeling adaptive ML application is conducted with a makerspace and shows reliable performance.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.05809 [pdf, other]

Self-Labeling in Multivariate Causality and Quantification for Adaptive Machine Learning

Authors: Yutian Ren, Aaron Haohua Yen, G. P. Li

Abstract: Adaptive machine learning (ML) aims to allow ML models to adapt to ever-changing environments with potential concept drift after model deployment. Traditionally, adaptive ML requires a new dataset to be manually labeled to tailor deployed models to altered data distributions. Recently, an interactive causality based self-labeling method was proposed to autonomously associate causally related data streams for domain adaptation, showing promising results compared to traditional feature similarity-based semi-supervised learning. Several unanswered research questions remain, including self-labeling's compatibility with multivariate causality and the quantitative analysis of the auxiliary models used in the self-labeling. The auxiliary models, the interaction time model (ITM) and the effect state detector (ESD), are vital to the success of self-labeling. This paper further develops the self-labeling framework and its theoretical foundations to address these research questions. A framework for the application of self-labeling to multivariate causal graphs is proposed using four basic causal relationships, and the impact of non-ideal ITM and ESD performance is analyzed. A simulated experiment is conducted based on a multivariate causal graph, validating the proposed theory.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.04997 [pdf, other]

Adapting LLMs for Efficient Context Processing through Soft Prompt Compression

Authors: Cangqing Wang, Yutian Yang, Ruisi Li, Dan Sun, Ruicong Cai, Yuzhu Zhang, Chengqian Fu, Lillian Floyd

Abstract: The rapid advancement of Large Language Models (LLMs) has inaugurated a transformative epoch in natural language processing, fostering unprecedented proficiency in text generation, comprehension, and contextual scrutiny. Nevertheless, effectively handling extensive contexts, crucial for myriad applications, poses a formidable obstacle owing to the intrinsic constraints of the models' context window sizes and the computational burdens entailed by their operations. This investigation presents an innovative framework that strategically tailors LLMs for streamlined context processing by harnessing the synergies among natural language summarization, soft prompt compression, and augmented utility preservation mechanisms. Our methodology, dubbed SoftPromptComp, amalgamates natural language prompts extracted from summarization methodologies with dynamically generated soft prompts to forge a concise yet semantically robust depiction of protracted contexts. This depiction undergoes further refinement via a weighting mechanism optimizing information retention and utility for subsequent tasks. We substantiate that our framework markedly diminishes computational overhead and enhances LLMs' efficacy across various benchmarks, while upholding or even augmenting the caliber of the produced content. By amalgamating soft prompt compression with sophisticated summarization, SoftPromptComp confronts the dual challenges of managing lengthy contexts and ensuring model scalability. Our findings point towards a propitious trajectory for augmenting LLMs' applicability and efficiency, rendering them more versatile and pragmatic for real-world applications. This research enriches the ongoing discourse on optimizing language models, providing insights into the potency of soft prompts and summarization techniques as pivotal instruments for the forthcoming generation of NLP solutions.

Submitted 18 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

Comments: This paper has been accepted by the 2024 International Conference on Image Processing and Computer Applications (IPCA 2024)

arXiv:2404.01925 [pdf, other]

Improving Bird's Eye View Semantic Segmentation by Task Decomposition

Authors: Tianhao Zhao, Yongcan Chen, Yu Wu, Tianyang Liu, Bo Du, Peilun Xiao, Shi Qiu, Hongda Yang, Guozhen Li, Yi Yang, Yutian Lin

Abstract: Semantic segmentation in bird's eye view (BEV) plays a crucial role in autonomous driving. Previous methods usually follow an end-to-end pipeline, directly predicting the BEV segmentation map from monocular RGB inputs. However, the challenge arises when the RGB inputs and BEV targets come from distinct perspectives, making the direct point-to-point predicting hard to optimize. In this paper, we decompose the original BEV segmentation task into two stages, namely BEV map reconstruction and RGB-BEV feature alignment. In the first stage, we train a BEV autoencoder to reconstruct the BEV segmentation maps given corrupted noisy latent representation, which urges the decoder to learn fundamental knowledge of typical BEV patterns. The second stage involves mapping RGB input images into the BEV latent space of the first stage, directly optimizing the correlations between the two views at the feature level. Our approach simplifies the complexity of combining perception and generation into distinct steps, equipping the model to handle intricate and challenging scenes effectively. Besides, we propose to transform the BEV segmentation map from the Cartesian to the polar coordinate system to establish the column-wise correspondence between RGB images and BEV maps. Moreover, our method requires neither multi-scale features nor camera intrinsic parameters for depth estimation and saves computational overhead. Extensive experiments on nuScenes and Argoverse show the effectiveness and efficiency of our method. Code is available at https://github.com/happytianhao/TaDe.

Submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024

arXiv:2403.11461 [pdf, other]

VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation

Authors: Weiyao Wang, Yutian Lei, Shiyu **, Gregory D. Hager, Liangjun Zhang

Abstract: In this work, we introduce the Virtual In-Hand Eye Transformer (VIHE), a novel method designed to enhance 3D manipulation capabilities through action-aware view rendering. VIHE autoregressively refines actions in multiple stages by conditioning on rendered views posed from action predictions in the earlier stages. These virtual in-hand views provide a strong inductive bias for effectively recognizing the correct pose for the hand, especially for challenging high-precision tasks such as peg insertion. On 18 manipulation tasks in RLBench simulated environments, VIHE achieves a new state-of-the-art, with a 12% absolute improvement, increasing from 65% to 77% over the existing state-of-the-art model using 100 demonstrations per task. In real-world scenarios, VIHE can learn manipulation tasks with just a handful of demonstrations, highlighting its practical utility. Videos and code implementation can be found at our project site: https://vihe-3d.github.io.

Submitted 18 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.08348 [pdf, other]

A programmable topological photonic chip

Authors: Tianxiang Dai, Anqi Ma, Jun Mao, Yutian Ao, Xinyu Jia, Yun Zheng, Chonghao Zhai, Yan Yang, Zhihua Li, Bo Tang, Jun Luo, Baile Zhang, Xiaoyong Hu, Qihuang Gong, Jianwei Wang

Abstract: Controlling topological phases of light has allowed experimental observations of abundant topological phenomena and development of robust photonic devices. The prospect of more sophisticated controls with topological photonic devices for practical implementations requires high-level programmability. Here, we demonstrate a fully programmable topological photonic chip with large-scale integration of silicon photonic nanocircuits and microresonators. Photonic artificial atoms and their interactions in our compound system can be individually addressed and controlled, therefore allowing arbitrary altering of structural parameters and geometrical configurations for the observations of dynamic topological phase transitions and diverse photonic topological insulators. By individually programming artificial atoms on the generic chip, it has allowed comprehensive statistic characterisations of topological robustness against relatively weak disorders, as well as counterintuitive topological Anderson phase transitions induced by strong disorders. Our generic topological photonic chip that can be rapidly reprogrammed to implement multifunctionalities, prototypes a flexible and versatile platform for possible applications across fundamental science and topological technologies.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.06420 [pdf, other]

RLingua: Improving Reinforcement Learning Sample Efficiency in Robotic Manipulations With Large Language Models

Authors: Liangliang Chen, Yutian Lei, Shiyu **, Ying Zhang, Liangjun Zhang

Abstract: Reinforcement learning (RL) has demonstrated its capability in solving various tasks but is notorious for its low sample efficiency. In this paper, we propose RLingua, a framework that can leverage the internal knowledge of large language models (LLMs) to reduce the sample complexity of RL in robotic manipulations. To this end, we first present a method for extracting the prior knowledge of LLMs by prompt engineering so that a preliminary rule-based robot controller for a specific task can be generated in a user-friendly manner. Despite being imperfect, the LLM-generated robot controller is utilized to produce action samples during rollouts with a decaying probability, thereby improving RL's sample efficiency. We employ TD3, the widely-used RL baseline method, and modify the actor loss to regularize the policy learning towards the LLM-generated controller. RLingua also provides a novel method of improving the imperfect LLM-generated robot controllers by RL. We demonstrate that RLingua can significantly reduce the sample complexity of TD3 in four robot tasks of panda_gym and achieve high success rates in 12 sampled sparsely rewarded robot tasks in RLBench, where the standard TD3 fails. Additionally, we validated RLingua's effectiveness in real-world robot experiments through Sim2Real, demonstrating that the learned policies are effectively transferable to real robot tasks. Further details about our work are available at our project website https://rlingua.github.io.

Submitted 19 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.01756 [pdf, other]

Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition

Authors: Yutian Liu, Wenjun Ke, Jianguo Wei

Abstract: Handwritten mathematical expression recognition (HMER) is challenging in image-to-text tasks due to the complex layouts of mathematical expressions and suffers from problems including over-parsing and under-parsing. To solve these, previous HMER methods improve the attention mechanism by utilizing historical alignment information. However, this approach has limitations in addressing under-parsing since it cannot correct the erroneous attention on image areas that should be parsed at subsequent decoding steps. This faulty attention causes the attention module to incorporate future context into the current decoding step, thereby confusing the alignment process. To address this issue, we propose an attention guidance mechanism to explicitly suppress attention weights in irrelevant areas and enhance the appropriate ones, thereby inhibiting access to information outside the intended context. Depending on the type of attention guidance, we devise two complementary approaches to refine attention weights: self-guidance that coordinates attention of multiple heads and neighbor-guidance that integrates attention from adjacent time steps. Experiments show that our method outperforms existing state-of-the-art methods, achieving expression recognition rates of 60.75% / 61.81% / 63.30% on the CROHME 2014/ 2016/ 2019 datasets.

Submitted 5 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.19427 [pdf, other]

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Authors: Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George Cristian-Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, Arnaud Doucet, David Budden, Yee Whye Teh, Razvan Pascanu, Nando De Freitas, Caglar Gulcehre

Abstract: Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with gated linear recurrences, and Griffin, a hybrid model that mixes gated linear recurrences with local attention. Hawk exceeds the reported performance of Mamba on downstream tasks, while Griffin matches the performance of Llama-2 despite being trained on over 6 times fewer tokens. We also show that Griffin can extrapolate on sequences significantly longer than those seen during training. Our models match the hardware efficiency of Transformers during training, and during inference they have lower latency and significantly higher throughput. We scale Griffin up to 14B parameters, and explain how to shard our models for efficient distributed training.

Submitted 29 February, 2024; originally announced February 2024.

Comments: 25 pages, 11 figures

arXiv:2402.15078 [pdf, other]

LLM-CompDroid: Repairing Configuration Compatibility Bugs in Android Apps with Pre-trained Large Language Models

Authors: Zhijie Liu, Yutian Tang, Meiyun Li, Xin **, Yunfei Long, Liang Feng Zhang, Xiapu Luo

Abstract: XML configurations are integral to the Android development framework, particularly in the realm of UI display. However, these configurations can introduce compatibility issues (bugs), resulting in divergent visual outcomes and system crashes across various Android API versions (levels). In this study, we systematically investigate LLM-based approaches for detecting and repairing configuration compatibility bugs. Our findings highlight certain limitations of LLMs in effectively identifying and resolving these bugs, while also revealing their potential in addressing complex, hard-to-repair issues that traditional tools struggle with. Leveraging these insights, we introduce the LLM-CompDroid framework, which combines the strengths of LLMs and traditional tools for bug resolution. Our experimental results demonstrate a significant enhancement in bug resolution performance by LLM-CompDroid, with LLM-CompDroid-GPT-3.5 and LLM-CompDroid-GPT-4 surpassing the state-of-the-art tool, ConfFix, by at least 9.8% and 10.4% in both Correct and Correct@k metrics, respectively. This innovative approach holds promise for advancing the reliability and robustness of Android applications, making a valuable contribution to the field of software development.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.14547 [pdf, other]

OmniPred: Language Models as Universal Regressors

Authors: Xingyou Song, Oscar Li, Chansoo Lee, Bangding Yang, Daiyi Peng, Sagi Perel, Yutian Chen

Abstract: Over the broad landscape of experimental design, regression has been a powerful tool to accurately predict the outcome metrics of a system or model given a set of parameters, but has been traditionally restricted to methods which are only applicable to a specific task. In this paper, we propose OmniPred, a framework for training language models as universal end-to-end regressors over $(x,y)$ evaluation data from diverse real world experiments. Using data sourced from Google Vizier, one of the largest blackbox optimization databases in the world, our extensive experiments demonstrate that through only textual representations of mathematical parameters and values, language models are capable of very precise numerical regression, and if given the opportunity to train over multiple tasks, can significantly outperform traditional regression models.

Submitted 4 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: 24 pages, 10 figures. Code can be found in https://github.com/google-research/optformer/tree/main/optformer/omnipred

arXiv:2402.11957 [pdf, other]

Event-Based Motion Magnification

Authors: Yutian Chen, Shi Guo, Fangzheng Yu, Feng Zhang, **wei Gu, Tianfan Xue

Abstract: Detecting and magnifying imperceptible high-frequency motions in real-world scenarios has substantial implications for industrial and medical applications. These motions are characterized by small amplitudes and high frequencies. Traditional motion magnification methods rely on costly high-speed cameras or active light sources, which limit the scope of their applications. In this work, we propose a dual-camera system consisting of an event camera and a conventional RGB camera for video motion magnification, containing temporally-dense information from the event stream and spatially-dense data from the RGB images. This innovative combination enables a broad and cost-effective amplification of high-frequency motions. By revisiting the physical camera model, we observe that estimating motion direction and magnitude necessitates the integration of event streams with additional image features. On this basis, we propose a novel deep network for event-based video motion magnification that addresses two primary challenges: firstly, the high frequency of motion induces a large number of interpolated frames (up to 80), which our network mitigates with a Second-order Recurrent Propagation module for better handling of long-term frame interpolations; and secondly, magnifying subtle motions is sensitive to noise, which we address by utilizing a temporal filter to amplify motion at specific frequencies and reduce noise impact. We demonstrate the effectiveness and accuracy of our dual-camera system and network through extensive experiments in magnifying small-amplitude, high-frequency motions, offering a cost-effective and flexible solution for motion detection and magnification.

Submitted 19 February, 2024; originally announced February 2024.

Comments: Project Page: https://openimaginglab.github.io/emm/

arXiv:2402.06798 [pdf, other]

Reasoning Grasping via Multimodal Large Language Model

Authors: Shiyu **, **xuan Xu, Yutian Lei, Liangjun Zhang

Abstract: Despite significant progress in robotic systems for operation within human-centric environments, existing models still heavily rely on explicit human commands to identify and manipulate specific objects. This limits their effectiveness in environments where understanding and acting on implicit human intentions are crucial. In this study, we introduce a novel task: reasoning grasping, where robots need to generate grasp poses based on indirect verbal instructions or intentions. To accomplish this, we propose an end-to-end reasoning grasping model that integrates a multi-modal Large Language Model (LLM) with a vision-based robotic grasping framework. In addition, we present the first reasoning grasping benchmark dataset generated from the GraspNet-1 billion, incorporating implicit instructions for object-level and part-level grasping, and this dataset will soon be available for public access. Our results show that directly integrating CLIP or LLaVA with the grasp detection model performs poorly on the challenging reasoning grasping tasks, while our proposed model demonstrates significantly enhanced performance both in the reasoning grasping benchmark and real-world experiments.

Submitted 25 April, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.01156 [pdf, other]

An Empirical Study on Low Code Programming using Traditional vs Large Language Model Support

Authors: Yongkun Liu, Jiachi Chen, Tingting Bi, John Grundy, Yanlin Wang, Jianxing Yu, Ting Chen, Yutian Tang, Zibin Zheng

Abstract: Low-code programming (LCP) refers to programming using models at higher levels of abstraction, resulting in less manual and more efficient programming, and reduced learning effort for amateur developers. Many LCP tools have rapidly evolved and have benefited from the concepts of visual programming languages (VPLs) and programming by demonstration (PBD). With the huge increase in interest in using large language models (LLMs) in software engineering, LLM-based LCP has begun to become increasingly important. However, the technical principles and application scenarios of traditional approaches to LCP and LLM-based LCP are significantly different. Understanding these key differences and characteristics in the application of the two approaches to LCP by users is crucial for LCP providers in improving existing and developing new LCP tools, and in better assisting users in choosing the appropriate LCP technology. We conducted an empirical study of both traditional LCP and LLM-based LCP. We analyzed developers' discussions on Stack Overflow (SO) over the past three years and then explored the similarities and differences between traditional LCP and LLM-based LCP features and developer feedback. Our findings reveal that while traditional LCP and LLM-based LCP share common primary usage scenarios, they significantly differ in scope, limitations and usage throughout the software development lifecycle, particularly during the implementation phase. We also examine how LLMs impact and integrate with LCP, discussing the latest technological developments in LLM-based LCP, such as its integration with VPLs and the application of LLM Agents in software engineering.

Submitted 6 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.17606 [pdf, other]

doi 10.1109/TDSC.2023.3253572

Ambush from All Sides: Understanding Security Threats in Open-Source Software CI/CD Pipelines

Authors: Ziyue Pan, Wenbo Shen, Xingkai Wang, Yutian Yang, Rui Chang, Yao Liu, Chengwei Liu, Yang Liu, Kui Ren

Abstract: The continuous integration and continuous deployment (CI/CD) pipelines are widely adopted on Internet hosting platforms, such as GitHub. With the popularity, the CI/CD pipeline faces various security threats. However, current CI/CD pipelines suffer from malicious code and severe vulnerabilities. Even worse, people have not been fully aware of its attack surfaces and the corresponding impacts. Therefore, in this paper, we conduct a large-scale measurement and a systematic analysis to reveal the attack surfaces of the CI/CD pipeline and quantify their security impacts. Specifically, for the measurement, we collect a data set of 320,000+ CI/CD pipeline-configured GitHub repositories and build an analysis tool to parse the CI/CD pipelines and extract security-critical usages. Besides, the current CI/CD ecosystem heavily relies on several core scripts, which may lead to a single point of failure. Meanwhile, the CI/CD pipelines contain sensitive information/operations, making them the attacker's favorite targets. Inspired by the measurement findings, we abstract the threat model and the attack approach toward CI/CD pipelines, followed by a systematic analysis of attack surfaces, attack strategies, and the corresponding impacts. We further launch case studies on five attacks in real-world CI/CD environments to validate the revealed attack surfaces. Finally, we give suggestions on mitigating attacks on CI/CD scripts, including securing CI/CD configurations, securing CI/CD scripts, and improving CI/CD infrastructure.

Submitted 31 January, 2024; originally announced January 2024.

Journal ref: IEEE Transactions on Dependable and Secure Computing (Volume: 21, Issue: 1, Jan.-Feb. 2024)

arXiv:2401.11396 [pdf, other]

Visual Imitation Learning with Calibrated Contrastive Representation

Authors: Yunke Wang, Linwei Tao, Bo Du, Yutian Lin, Chang Xu

Abstract: Adversarial Imitation Learning (AIL) allows the agent to reproduce expert behavior with low-dimensional states and actions. However, challenges arise in handling visual states due to their less distinguishable representation compared to low-dimensional proprioceptive features. While existing methods resort to adopting complex network architectures or separating the process of learning representation and decision-making, they overlook valuable intra-agent information within demonstrations. To address this problem, this paper proposes a simple and effective solution by incorporating calibrated contrastive representative learning into the visual AIL framework. Specifically, we present an image encoder in visual AIL, utilizing a combination of unsupervised and supervised contrastive learning to extract valuable features from visual states. Based on the fact that the improved agent often produces demonstrations of varying quality, we propose to calibrate the contrastive loss by treating each agent demonstration as a mixed sample. The incorporation of contrastive learning can be jointly optimized with the AIL framework, without modifying the architecture or incurring significant computational costs. Experimental results on DMControl Suite demonstrate our proposed method is sample efficient and can outperform other compared methods from different aspects.

Submitted 20 January, 2024; originally announced January 2024.

arXiv:2401.08525 [pdf, other]

GATS: Gather-Attend-Scatter

Authors: Konrad Zolna, Serkan Cabi, Yutian Chen, Eric Lau, Claudio Fantacci, Jurgis Pasukonis, Jost Tobias Springenberg, Sergio Gomez Colmenarejo

Abstract: As the AI community increasingly adopts large-scale models, it is crucial to develop general and flexible tools to integrate them. We introduce Gather-Attend-Scatter (GATS), a novel module that enables seamless combination of pretrained foundation models, both trainable and frozen, into larger multimodal networks. GATS empowers AI systems to process and generate information across multiple modalities at different rates. In contrast to traditional fine-tuning, GATS allows for the original component models to remain frozen, avoiding the risk of them losing important knowledge acquired during the pretraining phase. We demonstrate the utility and versatility of GATS with a few experiments across games, robotics, and multimodal input-output systems.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.04717 [pdf, other]

Analytical solutions for optimal photon absorption into inhomogeneous spin memories

Authors: József Zsolt Bernád, Michael Schilling, Yutian Wen, Matthias M. Müller, Tommaso Calarco, Patrice Bertet, Felix Motzoi

Abstract: We investigate for optimal photon absorption a quantum electrodynamical model of an inhomogeneously-broadened spin ensemble coupled to a single-mode cavity. We consider a one-photon input pulse and obtain a simple one-parameter form for its optimal shape for absorption in the spin ensemble. Solutions to this problem are developed without using perturbation theory concerning the spin ensemble. Furthermore, we exploit the possibility of modulating the frequency and coupling rate of the resonator. We show some optimal scenarios and demonstrate the usefulness of our approach for the design of efficient quantum memories. In particular, we find the optimal cooperativity for different parameters and identify cases where absorption with a success probability larger than $99\%$ is achieved.

Submitted 9 January, 2024; originally announced January 2024.

Comments: 18 pages, 20 figures

arXiv:2312.10615 [pdf, other]

A Symmetric Multigrid-Preconditioned Krylov Subspace Solver for Stokes Equations

Authors: Yutian Tao, Eftychios Sifakis

Abstract: Numerical solution of discrete PDEs corresponding to saddle point problems is highly relevant to physical systems such as Stokes flow. However, scaling up numerical solvers for such systems is often met with challenges in efficiency and convergence. Multigrid is an approach with excellent applicability to elliptic problems such as the Stokes equations, and can be a solution to such challenges of scalability and efficiency. The degree of success of such methods, however, is highly contingent on the design of key components of a multigrid scheme, including the hierarchy of discretizations, and the relaxation scheme used. Additionally, in many practical cases, it may be more effective to use a multigrid scheme as a preconditioner to an iterative Krylov subspace solver, as opposed to striving for maximum efficacy of the relaxation scheme in all foreseeable settings. In this paper, we propose an efficient symmetric multigrid preconditioner for the Stokes Equations on a staggered finite-difference discretization. Our contribution is focused on crafting a preconditioner that (a) is symmetric indefinite, matching the property of the Stokes system itself, (b) is appropriate for preconditioning the SQMR iterative scheme, and (c) has the requisite symmetry properties to be used in this context. In addition, our design is efficient in terms of computational cost and facilitates scaling to large domains.

Submitted 26 February, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

arXiv:2312.09439 [pdf, other]

Smart Roads: Roadside Perception, Vehicle-Road Cooperation and Business Model

Authors: Rui Chen, Lu Gao, Yutian Liu, Yong Liang Guan, Yan Zhang

Abstract: Smart roads have become an essential component of intelligent transportation systems (ITS). The roadside perception technology, a critical aspect of smart roads, utilizes various sensors, roadside units (RSUs), and edge computing devices to gather real-time traffic data for vehicle-road cooperation. However, the full potential of smart roads in improving the safety and efficiency of autonomous vehicles can be realized only through the mass deployment of roadside perception and communication devices. On the one hand, roadside devices require significant investment but can currently achieve only a monitoring function, resulting in no profitability for investors. On the other hand, drivers lack trust in the safety of autonomous driving technology, making it difficult to promote large-scale commercial applications. To deal with the dilemma of mass deployment, we propose a novel smart-road vehicle-guiding architecture for vehicle-road cooperative autonomous driving, based on which we then propose the corresponding business model and analyze its benefits from both operator and driver perspectives. The numerical simulations validate that our proposed smart road solution can enhance driving safety and traffic efficiency. Moreover, we utilize the cost-benefit analysis (CBA) model to assess the economic advantages of the proposed business model, which indicates that a smart highway that can provide vehicle-guided-driving services for autonomous vehicles yields more profit than a regular highway.

Submitted 19 October, 2023; originally announced December 2023.

arXiv:2312.01727 [pdf]

Deep learning acceleration of iterative model-based light fluence correction for photoacoustic tomography

Authors: Zhaoyong Liang, Shuangyang Zhang, Zhichao Liang, Zhongxin Mo, Xiaoming Zhang, Yutian Zhong, Wufan Chen, Li Qi

Abstract: Photoacoustic tomography (PAT) is a promising imaging technique that can visualize the distribution of chromophores within biological tissue. However, the accuracy of PAT imaging is compromised by light fluence (LF), which hinders the quantification of light absorbers. Currently, model-based iterative methods are used for LF correction, but they require significant computational resources due to repeated LF estimation based on differential light transport models. To improve LF correction efficiency, we propose to use Fourier neural operator (FNO), a neural network specially designed for solving differential equations, to learn the forward projection of light transport in PAT. Trained using paired finite-element-based LF simulation data, our FNO model replaces the traditional computationally heavy LF estimator during iterative correction, such that the correction procedure is significantly accelerated. Simulation and experimental results demonstrate that our method achieves comparable LF correction quality to traditional iterative methods while reducing the correction time by over 30 times.

Submitted 7 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.16030 [pdf, other]

Machine Learning-Enhanced Aircraft Landing Scheduling under Uncertainties

Authors: Yutian Pang, Peng Zhao, Jueming Hu, Yongming Liu

Abstract: This paper addresses aircraft delays, emphasizing their impact on safety and financial losses. To mitigate these issues, an innovative machine learning (ML)-enhanced landing scheduling methodology is proposed, aiming to improve automation and safety. Analyzing flight arrival delay scenarios reveals strong multimodal distributions and clusters in arrival flight time durations. A multi-stage conditional ML predictor enhances separation time prediction based on flight events. ML predictions are then integrated as safety constraints in a time-constrained traveling salesman problem formulation, solved using mixed-integer linear programming (MILP). Historical flight recordings and model predictions address uncertainties between successive flights, ensuring reliability. The proposed method is validated using real-world data from the Atlanta Air Route Traffic Control Center (ARTCC ZTL). Case studies demonstrate an average 17.2% reduction in total landing time compared to the First-Come-First-Served (FCFS) rule. Unlike FCFS, the proposed methodology considers uncertainties, instilling confidence in scheduling. The study concludes with remarks and outlines future research directions.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.08723 [pdf, other]

doi 10.18653/v1/2023.emnlp-main.810

Token Prediction as Implicit Classification to Identify LLM-Generated Text

Authors: Yutian Chen, Hao Kang, Vivian Zhai, Liangze Li, Rita Singh, Bhiksha Raj

Abstract: This paper introduces a novel approach for identifying the possible large language models (LLMs) involved in text generation. Instead of adding an additional classification layer to a base LM, we reframe the classification task as a next-token prediction task and directly fine-tune the base LM to perform it. We utilize the Text-to-Text Transfer Transformer (T5) model as the backbone for our experiments. We compared our approach to the more direct approach of utilizing hidden states for classification. Evaluation shows the exceptional performance of our method in the text classification task, highlighting its simplicity and efficiency. Furthermore, interpretability studies on the features extracted by our model reveal its ability to differentiate distinctive writing styles among various LLMs even in the absence of an explicit classifier. We also collected a dataset named OpenLLMText, containing approximately 340k text samples from human and LLMs, including GPT3.5, PaLM, LLaMA, and GPT2.

Submitted 15 November, 2023; originally announced November 2023.

Comments: EMNLP 2023, Main Conference

arXiv:2311.07049 [pdf]

Clifford Algebra-Based Iterated Extended Kalman Filter with Application to Low-Cost INS/GNSS Navigation

Authors: Wei Ouyang, Yutian Wang, Yuanxin Wu

Abstract: The traditional GNSS-aided inertial navigation system (INS) usually exploits the extended Kalman filter (EKF) for state estimation, and the initial attitude accuracy is key to the filtering performance. To spare the reliance on the initial attitude, this work generalizes the previously proposed trident quaternion within the framework of Clifford algebra to represent the extended pose, IMU biases and lever arms on the Lie group. Consequently, a quasi-group-affine system is established for the low-cost INS/GNSS integrated navigation system, and the right-error Clifford algebra-based EKF (Clifford-RQEKF) is accordingly developed. The iterated filtering approach is further applied to significantly improve the performances of the Clifford-RQEKF and the previously proposed trident quaternion-based EKFs. Numerical simulations and experiments show that all iterated filtering approaches fulfill the fast and global convergence without the prior attitude information, whereas the iterated Clifford-RQEKF performs much better than the others under especially large IMU biases.

Submitted 14 November, 2023; v1 submitted 12 November, 2023; originally announced November 2023.

arXiv:2310.13562 [pdf, other]

Solving Coupled Nonlinear Forward-backward Stochastic Differential Equations: An Optimization Perspective with Backward Measurability Loss

Authors: Yutian Wang, Yuan-Hua Ni, Xun Li

Abstract: This paper aims to extend the BML method proposed in Wang et al. [22] to make it applicable to more general coupled nonlinear FBSDEs. We interpret BML from the fixed-point iteration perspective and show that optimizing BML is equivalent to minimizing the distance between two consecutive trial solutions in a fixed-point iteration. Thus, this paper provides a theoretical foundation for an optimization-based approach to solving FBSDEs. We also empirically evaluate the method through four numerical experiments.

Submitted 25 November, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

arXiv:2310.12168 [pdf, other]

RK-core: An Established Methodology for Exploring the Hierarchical Structure within Datasets

Authors: Yao Lu, Yutian Huang, Jiaqi Nie, Zuohui Chen, Qi Xuan

Abstract: Recently, the field of machine learning has undergone a transition from model-centric to data-centric. The advancements in diverse learning tasks have been propelled by the accumulation of more extensive datasets, subsequently facilitating the training of larger models on these datasets. However, these datasets remain relatively under-explored. To this end, we introduce a pioneering approach known as RK-core, to enable a deeper understanding of the intricate hierarchical structure within datasets. Across several benchmark datasets, we find that samples with low coreness values appear less representative of their respective categories, and conversely, those with high coreness values exhibit greater representativeness. Correspondingly, samples with high coreness values make a more substantial contribution to the performance in comparison to those with low coreness values. Building upon this, we further employ RK-core to analyze the hierarchical structure of samples with different coreset selection methods. Remarkably, we find that a high-quality coreset should exhibit hierarchical diversity instead of solely opting for representative samples. The code is available at https://github.com/yaolu-zjut/Kcore.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.09461 [pdf, other]

MAC: ModAlity Calibration for Object Detection

Authors: Yutian Lei, Jun Liu, Dong Huang

Abstract: The flourishing success of Deep Neural Networks (DNNs) on RGB-input perception tasks has opened unbounded possibilities for non-RGB-input perception tasks, such as object detection from wireless signals, lidar scans, and infrared images. Compared to the matured development pipeline of RGB-input (source modality) models, developing non-RGB-input (target-modality) models from scratch poses excessive challenges in the modality-specific network design/training tricks and labor in the target-modality annotation. In this paper, we propose ModAlity Calibration (MAC), an efficient pipeline for calibrating target-modality inputs to the DNN object detection models developed on the RGB (source) modality. We compose a target-modality-input model by adding a small calibrator module ahead of a source-modality model and introduce MAC training techniques to impose dense supervision on the calibrator. By leveraging (1) prior knowledge synthesized from the source-modality model and (2) paired {target, source} data with zero manual annotations, our target-modality models reach comparable or better metrics than baseline models that require 100% manual annotations. We demonstrate the effectiveness of MAC by composing the WiFi-input, Lidar-input, and Thermal-Infrared-input models upon the pre-trained RGB-input models respectively.

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.04874 [pdf, other]

AirIMU: Learning Uncertainty Propagation for Inertial Odometry

Authors: Yuheng Qiu, Chen Wang, Can Xu, Yutian Chen, Xunfei Zhou, Youjie Xia, Sebastian Scherer

Abstract: Inertial odometry (IO) using strap-down inertial measurement units (IMUs) is critical in many robotic applications where precise orientation and position tracking are essential. Prior kinematic motion model-based IO methods often use a simplified linearized IMU noise model and thus usually encounter difficulties in modeling non-deterministic errors arising from environmental disturbances and mechanical defects. In contrast, data-driven IO methods struggle to accurately model the sensor motions, often leading to generalizability and interoperability issues. To address these challenges, we present AirIMU, a hybrid approach to estimate the uncertainty, especially the non-deterministic errors, by data-driven methods and increase the generalization abilities using model-based methods. We demonstrate the adaptability of AirIMU using a full spectrum of IMUs, from low-cost automotive grades to high-end navigation grades. We also validate its effectiveness on various platforms, including hand-held devices, vehicles, and a helicopter that covers a trajectory of 262 kilometers. In the ablation study, we validate the effectiveness of our learned uncertainty in an IMU-GPS pose graph optimization experiment, achieving a 31.6% improvement in accuracy. Experiments demonstrate that jointly training the IMU noise correction and uncertainty estimation synergistically benefits both tasks.

Submitted 15 May, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

arXiv:2309.13035 [pdf, other]

PyPose v0.6: The Imperative Programming Interface for Robotics

Authors: Zitong Zhan, Xiangfu Li, Qihang Li, Haonan He, Abhinav Pandey, Haitao Xiao, Yangmengfei Xu, Xiangyu Chen, Kuan Xu, Kun Cao, Zhipeng Zhao, Zihan Wang, Huan Xu, Zihang Fang, Yutian Chen, Wentao Wang, Xu Fang, Yi Du, Tianhao Wu, Xiao Lin, Yuheng Qiu, Fan Yang, Jingnan Shi, Shaoshu Su, Yiren Lu, et al. (11 additional authors not shown)

Abstract: PyPose is an open-source library for robot learning. It combines a learning-based approach with physics-based optimization, which enables seamless end-to-end robot learning. It has been used in many tasks due to its meticulously designed application programming interface (API) and efficient implementation. From its initial launch in early 2022, PyPose has experienced significant enhancements, incorporating a wide variety of new features into its platform. To satisfy the growing demand for understanding and utilizing the library and reduce the learning curve of new users, we present the fundamental design principle of the imperative programming interface, and showcase the flexible usage of diverse functionalities and modules using an extremely simple Dubins car example. We also demonstrate that PyPose can be easily used to navigate a real quadruped robot with a few lines of code.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.07518 [pdf, other]

Coverage Goal Selector for Combining Multiple Criteria in Search-Based Unit Test Generation

Authors: Zhichao Zhou, Yuming Zhou, Chunrong Fang, Zhenyu Chen, Xiapu Luo, Jingzhu He, Yutian Tang

Abstract: Unit testing is critical to the software development process, ensuring the correctness of basic programming units in a program (e.g., a method). Search-based software testing (SBST) is an automated approach to generating test cases. SBST generates test cases with genetic algorithms by specifying the coverage criterion (e.g., branch coverage). However, a good test suite must have different properties, which cannot be captured using an individual coverage criterion. Therefore, the state-of-the-art approach combines multiple criteria to generate test cases. Since combining multiple coverage criteria brings multiple objectives for optimization, it hurts the test suites' coverage for certain criteria compared with using the single criterion. To cope with this problem, we propose a novel approach named smart selection. Based on the coverage correlations among criteria and the subsumption relationships among coverage goals, smart selection selects a subset of coverage goals to reduce the number of optimization objectives and avoid missing any properties of all criteria. We conduct experiments to evaluate smart selection on 400 Java classes with three state-of-the-art genetic algorithms under the 2-minute budget. On average, smart selection outperforms combining all goals on 65.1% of the classes having significant differences between the two approaches. Secondly, we conduct experiments to verify our assumptions about coverage criteria relationships. Furthermore, we assess the coverage performance of smart selection under varying budgets of 5, 8, and 10 minutes and explore its effect on bug detection, confirming the advantage of smart selection over combining all goals.

Submitted 4 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2208.04096

arXiv:2309.03036 [pdf, other]

An Efficient Temporary Deepfake Location Approach Based Embeddings for Partially Spoofed Audio Detection

Authors: Yuankun Xie, Haonan Cheng, Yutian Wang, Long Ye

Abstract: Partially spoofed audio detection is a challenging task, lying in the need to accurately locate the authenticity of audio at the frame level. To address this issue, we propose a fine-grained partially spoofed audio detection method, namely Temporal Deepfake Location (TDL), which can effectively capture information of both features and locations. Specifically, our approach involves two novel parts: an embedding similarity module and a temporal convolution operation. To enhance the identification between the real and fake features, the embedding similarity module is designed to generate an embedding space that can separate the real frames from fake frames. To effectively concentrate on the position information, the temporal convolution operation is proposed to calculate the frame-specific similarities among neighboring frames, and dynamically select informative neighbors for convolution. Extensive experiments show that our method outperforms baseline models on the ASVspoof2019 Partial Spoof dataset and demonstrates superior performance even in the cross-dataset scenario.

Submitted 21 November, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

arXiv:2308.13707 [pdf]

Human-in-the-loop online just-in-time software defect prediction

Authors: Xutong Liu, Yufei Zhou, Yutian Tang, Junyan Qian, Yuming Zhou

Abstract: Online Just-In-Time Software Defect Prediction (O-JIT-SDP) uses an online model to predict whether a new software change will introduce a bug or not. However, existing studies neglect the interaction of Software Quality Assurance (SQA) staff with the model, which may miss the opportunity to improve the prediction accuracy through the feedback from SQA staff. To tackle this problem, we propose Human-In-The-Loop (HITL) O-JIT-SDP that integrates feedback from SQA staff to enhance the prediction process. Furthermore, we introduce a performance evaluation framework that utilizes a k-fold distributed bootstrap method along with the Wilcoxon signed-rank test. This framework facilitates thorough pairwise comparisons of alternative classification algorithms using a prequential evaluation approach. Our proposal enables continuous statistical testing throughout the prequential process, empowering developers to make real-time decisions based on robust statistical evidence. Through experimentation across 10 GitHub projects, we demonstrate that our evaluation framework enhances the credibility of model evaluation, and the incorporation of HITL feedback elevates the prediction performance of online JIT-SDP models. These advancements hold the potential to significantly enhance the value of O-JIT-SDP for industrial applications.

Submitted 25 August, 2023; originally announced August 2023.

Comments: 16 pages, 10 figures

arXiv:2308.11211 [pdf]

Simultaneous 3D Construction and Imaging of Plant Cells Using Plasmonic Nanoprobe Assisted Multimodal Nonlinear Optical Microscopy

Authors: Kun Liu, Yutian Lei, Dawei Li

Abstract: Nonlinear optical (NLO) imaging has emerged as a promising plant cell imaging technique due to its large optical penetration, inherent 3D spatial resolution, and reduced photodamage; meanwhile, exogenous nanoprobes are usually needed for non-signal target cell analysis. Here, we report in-vivo, simultaneous 3D labeling and imaging of potato cell structures using plasmonic nanoprobe-assisted multimodal NLO microscopy. Experimental results show that the complete cell structure could be imaged by the combination of second-harmonic generation (SHG) and two-photon luminescence (TPL) when noble metal silver or gold ions are added. In contrast, without noble metal ion solution, no NLO signals from the cell wall could be acquired. The mechanism can be attributed to noble metal nanoprobes with strong nonlinear optical responses formed along the cell walls via a femtosecond laser scan. During the SHG-TPL imaging process, noble metal ions that cross the cell wall could be rapidly reduced to plasmonic nanoparticles by the fs laser and selectively anchored onto both sides of the cell wall, thereby leading to simultaneous 3D labeling and imaging of potato cells. Compared with traditional labeling techniques that need in-vitro nanoprobe fabrication and cell labeling, our approach allows for one-step, in-vivo labeling of plant cells, thus providing a rapid, cost-effective way for cellular structure construction and imaging.

Submitted 22 August, 2023; originally announced August 2023.

Comments: 18 pages, 5 figures

arXiv:2308.05137 [pdf, other]

Discrepancy-based Active Learning for Weakly Supervised Bleeding Segmentation in Wireless Capsule Endoscopy Images

Authors: Fan Bai, Xiaohan Xing, Yutian Shen, Han Ma, Max Q. -H. Meng

Abstract: Weakly supervised methods, such as those based on class activation maps (CAM), have been applied to achieve bleeding segmentation with low annotation efforts in Wireless Capsule Endoscopy (WCE) images. However, the CAM labels tend to be extremely noisy, and there is an irreparable gap between CAM labels and ground truths for medical images. This paper proposes a new Discrepancy-basEd Active Learning (DEAL) approach to bridge the gap between CAMs and ground truths with a few annotations. Specifically, to liberate labor, we design a novel discrepancy decoder model and a CAMPUS (CAM, Pseudo-label and groUnd-truth Selection) criterion to replace the noisy CAMs with accurate model predictions and a few human labels. The discrepancy decoder model is trained with a unique scheme to generate standard, coarse and fine predictions. The CAMPUS criterion is proposed to predict the gaps between CAMs and ground truths based on model divergence and CAM divergence. We evaluate our method on the WCE dataset, and results show that our method outperforms the state-of-the-art active learning methods and, with only 10% of the training data labeled, reaches comparable performance to those trained with fully annotated datasets.

Submitted 9 August, 2023; originally announced August 2023.

Comments: accepted by MICCAI 2022

arXiv:2308.04838 [pdf, other]

No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT

Authors: Zhijie Liu, Yutian Tang, Xiapu Luo, Yuming Zhou, Liang Feng Zhang

Abstract: Large language models (LLMs) have demonstrated impressive capabilities across various NLP tasks. Additionally, LLMs are also highly valuable in supporting software engineering tasks, particularly in the field of code generation. Automatic code generation is a process of automatically generating source code or executable code based on given specifications or requirements, improving developer productivity. In this study, we perform a systematic empirical assessment of the quality of code generation using ChatGPT. We leverage 728 algorithm problems in five languages (i.e., C, C++, Java, Python, and JavaScript) and 18 CWEs with 54 code scenarios for the code generation task. Our evaluation encompasses a comprehensive analysis of code snippets generated by ChatGPT, focusing on three critical aspects: correctness, complexity, and security. We also specifically investigate ChatGPT's ability to engage in a multi-round fixing process (i.e., ChatGPT's dialog ability) for facilitating code generation. By delving into the generated code and examining the experimental results, this work provides valuable insights into the performance of ChatGPT in tackling code generation tasks over the three critical aspects. Overall, our findings uncover potential issues and limitations that arise in the ChatGPT-based code generation and lay the groundwork for improving AI and LLM-based code generation techniques.

Submitted 13 April, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

Showing 1–50 of 181 results for author: Yutian