Search | arXiv e-print repository

doi 10.1038/s41467-023-43464-z

Thermal and electrostatic tuning of surface phonon-polaritons in LaAlO3/SrTiO3 heterostructures

Authors: Yixi Zhou, Adrien Waelchli, Margherita Boselli, Iris Crassee, Adrien Bercher, Weiwei Luo, Jiahua Duan, J. L. M. van Mechelen, Dirk van der Marel, Jérémie Teyssier, Carl Willem Rischau, Lukas Korosec, Stefano Gariglio, Jean-Marc Triscone, Alexey B. Kuzmenko

Abstract: Phonon polaritons are promising for infrared applications due to the strong light-matter coupling and subwavelength energy confinement they offer. Yet, the spectral narrowness of the phonon bands and the difficulty of tuning phonon polariton properties hinder further progress in this field. SrTiO3, a prototype perovskite oxide, has recently attracted attention due to two prominent far-infrared phonon polariton bands, albeit without any tuning reported so far. Here we show, using cryogenic infrared near-field microscopy, that long-propagating surface phonon polaritons are present both in bare SrTiO3 and in LaAlO3/SrTiO3 heterostructures hosting a two-dimensional electron gas. The presence of the two-dimensional electron gas dramatically increases the thermal variation of the upper limit of the surface phonon polariton band due to temperature-dependent polaronic screening of the surface charge carriers. Furthermore, we demonstrate tunability of the upper surface phonon polariton frequency in LaAlO3/SrTiO3 via electrostatic gating. Our results suggest that oxide interfaces are a new platform bridging unconventional electronics and long-wavelength nanophotonics.
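As background for the "upper limit of the surface phonon polariton band": in the textbook case of a lossless single-oscillator dielectric function eps(w) = eps_inf (w_LO^2 - w^2)/(w_TO^2 - w^2) at an interface with vacuum, surface phonon polaritons exist for w_TO < w < w_s, where eps(w_s) = -1 gives w_s^2 = (eps_inf w_LO^2 + w_TO^2)/(eps_inf + 1). The sketch below evaluates this under stated assumptions only: SrTiO3 has several polar phonons, the 2DEG screening studied in the paper is not modeled, and the parameter values are hypothetical.

```python
import math


def epsilon(w, w_to, w_lo, eps_inf):
    """Lossless single-oscillator dielectric function (assumed model, not a SrTiO3 fit)."""
    return eps_inf * (w_lo**2 - w**2) / (w_to**2 - w**2)


def sphp_upper_limit(w_to, w_lo, eps_inf):
    """Frequency where epsilon(w) = -1: upper edge of the SPhP band at a vacuum interface."""
    return math.sqrt((eps_inf * w_lo**2 + w_to**2) / (eps_inf + 1.0))


# Hypothetical parameters in cm^-1; these are NOT measured SrTiO3 values.
w_to, w_lo, eps_inf = 500.0, 800.0, 5.0
w_s = sphp_upper_limit(w_to, w_lo, eps_inf)
print(f"SPhP band (lossless model): {w_to} < w < {w_s:.1f} cm^-1")
```

Inside this band eps(w) < -1, which is the condition for a bound surface mode; the band edge moves if any term that screens the phonons (such as free carriers) changes eps(w).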

Submitted 13 November, 2023; originally announced December 2023.

Comments: Nature Communications, in press

Journal ref: Nature Communications 14, 7686 (2023)

arXiv:2312.02003

doi 10.1016/j.hcc.2024.100211

A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly

Authors: Yifan Yao, Jinhao Duan, Kaidi Xu, Yuanfang Cai, Zhibo Sun, Yue Zhang

Abstract: Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized natural language understanding and generation. They possess deep language comprehension, human-like text generation capabilities, contextual awareness, and robust problem-solving skills, making them invaluable in various domains (e.g., search engines, customer support, translation). In the meantime, LLMs have also gained traction in the security community, revealing security vulnerabilities and showcasing their potential in security-related tasks. This paper explores the intersection of LLMs with security and privacy. Specifically, we investigate how LLMs positively impact security and privacy, the potential risks and threats associated with their use, and the inherent vulnerabilities within LLMs. Through a comprehensive literature review, the paper categorizes the papers into "The Good" (beneficial LLM applications), "The Bad" (offensive applications), and "The Ugly" (vulnerabilities of LLMs and their defenses). We have some interesting findings. For example, LLMs have proven to enhance code security (code vulnerability detection) and data privacy (data confidentiality protection), outperforming traditional methods. However, they can also be harnessed for various attacks (particularly user-level attacks) due to their human-like reasoning abilities. We have identified areas that require further research efforts. For example, research on model and parameter extraction attacks is limited and often theoretical, hindered by LLM parameter scale and confidentiality. Safe instruction tuning, a recent development, requires more exploration. We hope that our work can shed light on the LLMs' potential to both bolster and jeopardize cybersecurity.

Submitted 20 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

arXiv:2312.01836

Integrated Drill Boom Hole-Seeking Control via Reinforcement Learning

Authors: Haoqi Yan, Haoyuan Xu, Hongbo Gao, Fei Ma, Shengbo Eben Li, Jingliang Duan

Abstract: Intelligent drill boom hole-seeking is a promising technology for enhancing drilling efficiency, mitigating potential safety hazards, and relieving human operators. Most existing intelligent drill boom control methods rely on a hierarchical control framework based on inverse kinematics. However, these methods are generally time-consuming due to the computational complexity of inverse kinematics and the inefficiency of executing multiple joints sequentially. To tackle these challenges, this study proposes an integrated drill boom control method based on Reinforcement Learning (RL). We develop an integrated drill boom control framework that uses a parameterized policy to directly generate control inputs for all joints at each time step, taking advantage of joint posture and target hole information. Formulating the hole-seeking task as a Markov decision process allows contemporary mainstream RL algorithms to be employed directly to learn a hole-seeking policy, thus eliminating the need for inverse kinematics solutions and promoting cooperative multi-joint control. To enhance drilling accuracy throughout the entire drilling process, we devise a state representation that combines Denavit-Hartenberg joint information and preview hole-seeking discrepancy data. Simulation results show that the proposed method significantly outperforms traditional methods in terms of hole-seeking accuracy and time efficiency.
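The state representation combines Denavit-Hartenberg (DH) joint information with a discrepancy to the target hole. As a hedged illustration only, the sketch below chains standard DH link transforms to get the end-effector position and concatenates joint angles, that position, and the position error to a target into one observation vector. The function names, the `(d, a, alpha)` parameter layout, and this particular state layout are assumptions, not the paper's design.

```python
import math


def dh_transform(theta, d, a, alpha):
    """Standard DH link transform as a 4x4 row-major nested list."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul4(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def end_effector_position(joint_angles, dh_params):
    """Chain link transforms; dh_params holds one (d, a, alpha) tuple per revolute joint."""
    T = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for theta, (d, a, alpha) in zip(joint_angles, dh_params):
        T = matmul4(T, dh_transform(theta, d, a, alpha))
    return [T[0][3], T[1][3], T[2][3]]


def build_state(joint_angles, dh_params, target):
    """Hypothetical observation: joint angles + end-effector position + error to target."""
    pos = end_effector_position(joint_angles, dh_params)
    error = [t - p for t, p in zip(target, pos)]
    return list(joint_angles) + pos + error


# Hypothetical planar two-link arm with unit link lengths.
two_link = [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
state = build_state([0.0, math.pi / 2], two_link, target=[2.0, 0.0, 0.0])
```

A policy that maps such a vector directly to increments for every joint at once is what removes the inverse kinematics step; the "preview" part of the paper's state is not modeled here.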

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2312.00084

Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?

Authors: Zhengyue Zhao, Jinhao Duan, Kaidi Xu, Chenan Wang, Rui Zhang, Zidong Du, Qi Guo, Xing Hu

Abstract: Stable Diffusion has established itself as a foundation model in generative AI artistic applications, receiving widespread research attention and application. Some recent fine-tuning methods have made it feasible for individuals to implant personalized concepts into the base Stable Diffusion model with minimal computational cost on small datasets. However, these innovations have also given rise to issues such as facial privacy forgery and artistic copyright infringement. In recent studies, researchers have explored adding imperceptible adversarial perturbations to images to prevent potential unauthorized exploitation and infringement when personal data is used for fine-tuning Stable Diffusion. Although these studies have demonstrated the ability to protect images, these methods may not be entirely applicable in real-world scenarios. In this paper, we systematically evaluate the use of perturbations to protect images within a practical threat model. The results suggest that these approaches may not be sufficient to safeguard image privacy and copyright effectively. Furthermore, we introduce a purification method capable of removing protective perturbations while preserving the original image structure as much as possible. Experiments reveal that Stable Diffusion can effectively learn from purified images for all of the protective methods.
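"Imperceptible" perturbations are commonly formalized with an L-infinity budget eps. This is an assumption here, since the abstract gives neither a norm nor a budget. After each update, the protected image is projected back into the box [x - eps, x + eps], intersected with the valid pixel range [0, 1]. The sketch below shows only that generic projection step on a flattened image. It is not any specific protection or purification method from the paper, and `project_linf` is a hypothetical name.

```python
def project_linf(x_adv, x_orig, eps, lo=0.0, hi=1.0):
    """Clip each pixel to within eps of the original and to the valid range [lo, hi]."""
    out = []
    for xa, xo in zip(x_adv, x_orig):
        xa = min(max(xa, xo - eps), xo + eps)  # enforce |x_adv - x_orig| <= eps
        out.append(min(max(xa, lo), hi))       # keep a valid pixel value
    return out


# Hypothetical 3-pixel image and budget.
protected = project_linf([0.5, 0.0, 1.0], [0.4, 0.05, 0.98], eps=0.03)
```

The small budget is what keeps the change invisible. It also means a purification step only needs to undo a bounded change while keeping the image structure intact.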

Submitted 24 June, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

arXiv:2311.15795

Control water waves by metagratings

Authors: Linkang Han, Qilin Duan, Junliang Duan, Shan Zhu, Shiming Chen, Yuhang Yin, Huanyang Chen

Abstract: Metasurfaces and metagratings offer new platforms for electromagnetic wave control with significant responses. However, metasurfaces based on abrupt phase change and resonant structures suffer from high loss and face challenges when applied to water waves. Therefore, metasurfaces are not ideal for water wave control because of high loss and other limitations. We have discovered that non-resonant metagratings exhibit promising effects in water wave control. Leveraging the similarity between bridges and metagratings, we have developed a water wave metagrating model inspired by the Luoyang Bridge in ancient China. We conducted theoretical calculations and simulations on the metagrating and derived its equivalent anisotropic model. This model provides evidence that the metagrating can control water waves and achieve unidirectional surface water wave propagation. The accuracy of our theory is strongly supported by the clear observation of unidirectional propagation in both simulations and experiments conducted using a reduced-scale version of the metagrating. This is the first time that unidirectional propagation of water waves has been observed in a water wave metagrating experiment; more broadly, we realize a water wave metagrating experiment for the first time. By combining complex gratings with real bridges, we explore the physics embedded in an ancient structure, the Luoyang Bridge, which is of great significance for water wave metagrating design as well as for the development and preservation of ancient bridges.

Submitted 27 November, 2023; originally announced November 2023.

Comments: 25 pages, 13 figures

arXiv:2311.15566

SpotServe: Serving Generative Large Language Models on Preemptible Instances

Authors: Xupeng Miao, Chunan Shi, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, Zhihao Jia

Abstract: The high computational and memory requirements of generative large language models (LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary cost of serving LLMs by leveraging preemptible GPU instances on modern clouds, which offer access to spare GPUs at a much lower price than regular instances but may be preempted by the cloud at any time. Serving LLMs on preemptible instances requires addressing challenges induced by frequent instance preemptions and the necessity of migrating instances to handle these preemptions. This paper presents SpotServe, the first distributed LLM serving system on preemptible instances. Several key techniques in SpotServe realize fast and reliable serving of generative LLMs on cheap preemptible instances. First, SpotServe dynamically adapts the LLM parallelization configuration to dynamic instance availability and fluctuating workload, while balancing the trade-off among overall throughput, inference latency, and monetary cost. Second, to minimize the cost of migrating instances for dynamic reparallelization, the task of migrating instances is formulated as a bipartite graph matching problem, which is solved with the Kuhn-Munkres algorithm to identify an optimal migration plan that minimizes communication. Finally, to take advantage of the grace period offered by modern clouds, we introduce stateful inference recovery, a new inference mechanism that commits inference progress at a much finer granularity and allows SpotServe to cheaply resume inference upon preemption. We evaluate SpotServe on real spot instance preemption traces and various popular LLMs and show that it can reduce the P99 tail latency by 2.4x to 9.1x compared with the best existing LLM serving systems. We also show that SpotServe can leverage the price advantage of preemptible instances, saving 54% in monetary cost compared with using only on-demand instances.
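The migration step pairs available instances with positions in the new parallel configuration at minimum communication cost. The toy sketch below uses one plausible cost model, which is an assumption and not SpotServe's actual cost function. Each instance already holds some model-state shards, each new position needs a set of shards, and the cost of a pairing is the number of shards that must be transferred. For clarity, it finds the optimal assignment by brute force over permutations, which takes exponential time. Kuhn-Munkres solves the same assignment problem in polynomial time. All names are hypothetical.

```python
from itertools import permutations


def transfer_cost(held, needed):
    """Number of shards the position needs that the instance does not already hold."""
    return len(needed - held)


def best_migration_plan(instances, positions):
    """Brute-force minimum-cost assignment; instances[i], positions[j] are sets of shard ids.

    Returns (plan, cost) where plan[i] is the position assigned to instance i.
    """
    n = len(instances)
    assert n == len(positions), "toy version assumes a square assignment"
    best_cost, best_plan = None, None
    for perm in permutations(range(n)):
        cost = sum(transfer_cost(instances[i], positions[perm[i]]) for i in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_plan = cost, perm
    return best_plan, best_cost


# Hypothetical example: three instances, new configuration with three positions.
plan, cost = best_migration_plan(
    instances=[{0, 1}, {2, 3}, {4, 5}],
    positions=[{2, 3}, {4, 5}, {0, 1, 6}],
)
```

Here the best plan keeps every instance's existing shards in use and transfers only shard 6. A naive "instance i takes position i" plan would move every shard.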

Submitted 27 November, 2023; originally announced November 2023.

Comments: ASPLOS 2024

arXiv:2311.14097

ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models

Authors: Fei Kong, Jinhao Duan, Lichao Sun, Hao Cheng, Renjing Xu, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu

Abstract: Though diffusion models excel in image generation, their step-by-step denoising leads to slow generation speeds. Consistency training addresses this issue with single-step sampling but often produces lower-quality generations and requires high training costs. In this paper, we show that optimizing the consistency training loss minimizes the Wasserstein distance between target and generated distributions. As the timestep increases, the upper bound accumulates previous consistency training losses. Therefore, larger batch sizes are needed to reduce both current and accumulated losses. We propose Adversarial Consistency Training (ACT), which directly minimizes the Jensen-Shannon (JS) divergence between distributions at each timestep using a discriminator. Theoretically, ACT enhances generation quality and convergence. By incorporating a discriminator into the consistency training framework, our method achieves improved FID scores on the CIFAR10, ImageNet 64$\times$64, and LSUN Cat 256$\times$256 datasets, retains zero-shot image inpainting capabilities, and uses less than $1/6$ of the original batch size and fewer than $1/2$ of the model parameters and training steps compared to the baseline method. This leads to a substantial reduction in resource consumption. Our code is available at https://github.com/kong13661/ACT
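The link between a discriminator and JS divergence is the standard GAN result (Goodfellow et al., 2014). For fixed distributions p (data) and q (generated), the discriminator D*(x) = p(x)/(p(x) + q(x)) maximizes V(D) = E_p[log D] + E_q[log(1 - D)], and V(D*) = 2 JS(p, q) - log 4. The sketch below checks this identity numerically for small discrete distributions with full support. It illustrates the principle behind "minimizing JS divergence with a discriminator" only and is not ACT's training objective.

```python
import math


def kl(p, q):
    """KL divergence between discrete distributions (terms with p_i = 0 contribute 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def gan_value(p, q, d):
    """V(D) = E_p[log D] + E_q[log(1 - D)] for a discrete discriminator d[i] in (0, 1)."""
    real = sum(pi * math.log(di) for pi, di in zip(p, d) if pi > 0)
    fake = sum(qi * math.log(1 - di) for qi, di in zip(q, d) if qi > 0)
    return real + fake


def optimal_discriminator(p, q):
    """Pointwise maximizer of V; assumes p[i] + q[i] > 0 everywhere."""
    return [pi / (pi + qi) for pi, qi in zip(p, q)]


p = [0.5, 0.3, 0.2]  # hypothetical "data" distribution
q = [0.2, 0.2, 0.6]  # hypothetical "generated" distribution
d_star = optimal_discriminator(p, q)
print(gan_value(p, q, d_star), 2 * js(p, q) - math.log(4))
```

Because V(D*) is an affine function of JS(p, q), a generator trained against a near-optimal discriminator is, in effect, reducing the JS divergence.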

Submitted 28 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

Comments: To appear in CVPR 2024

arXiv:2311.12308

Jup2Kub: algorithms and a system to translate a Jupyter Notebook pipeline to a fault tolerant distributed Kubernetes deployment

Authors: **li Duan, Shasha Dennis

Abstract: Scientific workflows facilitate computational, data manipulation, and sometimes visualization steps for scientific data analysis. They are vital for reproducing and validating experiments, usually involving computational steps in scientific simulations and data analysis. These workflows are often developed by domain scientists using Jupyter notebooks, which are convenient yet face limitations: they struggle to scale with larger data sets, lack fault tolerance, and depend heavily on the stability of underlying tools and packages. To address these issues, Jup2Kub has been developed. This software system translates workflows from Jupyter notebooks into a distributed, high-performance Kubernetes environment, enhancing fault tolerance. It also manages software dependencies to maintain operational stability amid changes in tools and packages.
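Translating a notebook into a distributed pipeline starts with splitting it into steps. The abstract does not describe Jup2Kub's actual translation algorithm. As a hedged sketch of that starting point only, the code below reads the standard .ipynb JSON layout (a top-level "cells" list whose entries carry "cell_type" and "source") and treats each non-empty code cell as one candidate pipeline stage. The function name and the one-cell-per-stage rule are assumptions.

```python
import json


def notebook_to_stages(ipynb_text):
    """Return one source string per non-empty code cell, in notebook order (hypothetical split)."""
    nb = json.loads(ipynb_text)
    stages = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells carry no executable work
        src = cell.get("source", "")
        # The notebook format allows "source" as a single string or a list of lines.
        text = "".join(src) if isinstance(src, list) else src
        if text.strip():
            stages.append(text)
    return stages
```

Each stage could then be packaged as its own container or job. Dependency analysis between stages, retries, and environment pinning are the harder parts that a real translator has to add.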

Submitted 20 November, 2023; originally announced November 2023.

Comments: for associated software, see https://github.com/shirou10086/Scientificworkflow

arXiv:2311.08124

High-order accurate well-balanced energy stable finite difference schemes for multi-layer shallow water equations on fixed and adaptive moving meshes

Authors: Zhihao Zhang, Huazhong Tang, Junming Duan

Abstract: This paper develops high-order well-balanced (WB) energy stable (ES) finite difference schemes for multi-layer (the number of layers $M\geqslant 2$) shallow water equations (SWEs) on both fixed and adaptive moving meshes, extending our previous works [20,51]. To obtain an energy inequality, the convexity of an energy function for an arbitrary $M$ is proved by finding recurrence relations of the leading principal minors or the quadratic forms of the Hessian matrix of the energy function with respect to the conservative variables, which is more involved than the single-layer case due to the coupling between the layers in the energy function. An important ingredient in developing high-order semi-discrete ES schemes is the construction of a two-point energy conservative (EC) numerical flux. In pursuit of the WB property, a sufficient condition for such EC fluxes is given with compatible discretizations of the source terms similar to the single-layer case. It can be decoupled into $M$ identities individually for each layer, making it convenient to construct a two-point EC flux for the multi-layer system. To suppress possible oscillations near discontinuities, WENO-based dissipation terms are added to the high-order WB EC fluxes, which gives semi-discrete high-order WB ES schemes. Fully-discrete schemes are obtained by employing high-order explicit SSP-RK methods and proved to preserve the lake at rest. The schemes are further extended to moving meshes based on a modified energy function for a reformulated system, relying on the techniques proposed in [51]. Numerical experiments are conducted for some two- and three-layer cases to validate the high-order accuracy, WB and ES properties, and high efficiency of the schemes, with a suitable amount of dissipation chosen by estimating the maximal wave speed due to the lack of an analytical expression for the eigenstructure of the multi-layer system.
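For intuition about what a two-point EC flux is, the sketch below uses the well-known single-layer, flat-bottom flux of Fjordholm, Mishra and Tadmor: F^h = hbar*ubar, F^{hu} = hbar*ubar^2 + (g/2)*mean(h^2), with bars denoting arithmetic means of the left and right states. It numerically checks consistency with the physical flux and the EC condition [[v]]·F = [[psi]]. The energy variables are v = (g h - u^2/2, u) and the potential is psi = g h^2 u / 2. The multi-layer fluxes, bottom source terms, and WENO dissipation of the paper are not modeled, and the value of g is an assumption.

```python
G = 9.81  # gravitational acceleration (assumed value)


def physical_flux(h, u):
    return (h * u, h * u * u + 0.5 * G * h * h)


def ec_flux(hl, ul, hr, ur):
    """Two-point energy conservative flux for single-layer SWEs with a flat bottom."""
    h_bar = 0.5 * (hl + hr)
    u_bar = 0.5 * (ul + ur)
    h2_bar = 0.5 * (hl * hl + hr * hr)
    return (h_bar * u_bar, h_bar * u_bar * u_bar + 0.5 * G * h2_bar)


def energy_variables(h, u):
    """Gradient of the energy E = h u^2 / 2 + g h^2 / 2 w.r.t. (h, hu)."""
    return (G * h - 0.5 * u * u, u)


def potential(h, u):
    return 0.5 * G * h * h * u


def ec_residual(hl, ul, hr, ur):
    """[[v]] . F - [[psi]], which vanishes for an energy conservative flux."""
    vl, vr = energy_variables(hl, ul), energy_variables(hr, ur)
    f = ec_flux(hl, ul, hr, ur)
    jump_v_dot_f = sum((b - a) * fi for a, b, fi in zip(vl, vr, f))
    return jump_v_dot_f - (potential(hr, ur) - potential(hl, ul))
```

The identity holds exactly by the discrete product rule [[ab]] = abar[[b]] + bbar[[a]]. Adding dissipation proportional to [[v]], as the WENO-based terms do, then turns conservation into an energy inequality.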

Submitted 14 November, 2023; originally announced November 2023.

Comments: 54 pages, 19 figures

arXiv:2311.04193

Selective Visual Representations Improve Convergence and Generalization for Embodied AI

Authors: Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna

Abstract: Embodied AI models often employ off-the-shelf vision backbones like CLIP to encode their visual observations. Although such general-purpose representations encode rich syntactic and semantic information about the scene, much of this information is often irrelevant to the specific task at hand. This introduces noise into the learning process and distracts the agent's focus from task-relevant visual cues. Inspired by selective attention in humans (the process through which people filter their perception based on their experiences, knowledge, and the task at hand), we introduce a parameter-efficient approach to filter visual stimuli for embodied AI. Our approach induces a task-conditioned bottleneck using a small learnable codebook module. This codebook is trained jointly to optimize task reward and acts as a task-conditioned selective filter over the visual observation. Our experiments showcase state-of-the-art performance for object goal navigation and object displacement across five benchmarks: ProcTHOR, ArchitecTHOR, RoboTHOR, AI2-iTHOR, and ManipulaTHOR. The filtered representations produced by the codebook also generalize better and converge faster when adapted to other simulation environments such as Habitat. Our qualitative analyses show that agents explore their environments more effectively and that their representations retain task-relevant information, like target object recognition, while ignoring superfluous information about other objects. Code and pretrained models are available at our project website: https://embodied-codebook.github.io.
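A task-conditioned codebook bottleneck can be pictured as follows. In this hedged sketch, the combined observation-and-goal embedding is scored against K learnable codes, and the scores become a probability distribution via softmax. The output is the probability-weighted sum of the code vectors. The linear scoring function, the dimensions, and all names are assumptions rather than the paper's architecture.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def codebook_bottleneck(embedding, score_weights, codebook):
    """embedding: length-D list; score_weights: K x D; codebook: K x C.

    Returns (C-dim filtered representation, K code probabilities).
    """
    scores = [sum(w * e for w, e in zip(row, embedding)) for row in score_weights]
    probs = softmax(scores)
    dim = len(codebook[0])
    out = [sum(p * code[c] for p, code in zip(probs, codebook)) for c in range(dim)]
    return out, probs


# Hypothetical toy sizes: D = 2, K = 3 codes, C = 2.
out, probs = codebook_bottleneck(
    embedding=[1.0, 0.0],
    score_weights=[[10.0, 0.0], [0.0, 0.0], [-10.0, 0.0]],
    codebook=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
)
```

Because the output is restricted to convex combinations of a few codes, the bottleneck limits how much scene information passes through. Training the codes and scorer on task reward is what makes that filter task-relevant.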

Submitted 9 March, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: See project website: https://embodied-codebook.github.io

arXiv:2311.04173

doi 10.1038/s41563-023-01582-5

Multiple and spectrally robust photonic magic angles in reconfigurable α-MoO3 trilayers

Authors: J. Duan, G. Álvarez-Pérez, C. Lanza, A. I. F. Tresguerres-Mata, K. Voronin, N. Capote-Robayna, A. Tarazaga Martín-Luengo, J. Martín-Sánchez, V. S. Volkov, A. Y. Nikitin, P. Alonso-González

Abstract: The assembly of twisted stacks of van der Waals (vdW) materials has led to the discovery of a profusion of remarkable physical phenomena in recent years, as it provides a means to accurately control and harness electronic band structures. This has given birth to the so-called field of twistronics. An analogous concept has been developed for highly confined polaritons, or nanolight, in twisted bilayers of strongly anisotropic vdW materials, extending the field to the twistoptics realm. In this case, the emergence of a topological transition of the polaritonic dispersion at a given twist angle (the photonic magic angle) results in the propagation of nanolight along one specific direction (the canalization regime), holding promise for unprecedented control of the flow of energy at the nanoscale. However, there is a fundamental limitation in twistoptics that critically impedes such control: there is only one photonic magic angle (and thus one canalization direction) in a twisted bilayer, and it is fixed for each incident frequency. Here, we overcome this limitation by demonstrating the existence of multiple spectrally robust photonic magic angles in reconfigurable twisted vdW trilayers. As a result, we show that canalization of nanolight can be programmed at will along any desired in-plane direction in a single device and, importantly, within broad spectral ranges of up to 70 cm⁻¹. Our findings lay the foundation for robust and widely tunable twistoptics, opening the door to applications in nanophotonics where on-demand control of energy at the nanoscale is crucial, such as thermal management, nanoimaging, or entanglement of quantum emitters.

Submitted 7 November, 2023; originally announced November 2023.

Comments: 16 pages, 3 figures

Journal ref: Nat. Mater. 22, 867-872 (2023)

arXiv:2311.04115

Guts determine the leading coefficients of $L^2$-Alexander torsions

Authors: Jianru Duan

Abstract: For 3-manifolds, the leading coefficient of the $L^2$-Alexander torsion is a numerical invariant of a real first cohomology class. We show that the leading coefficient equals the relative $L^2$-torsion of the manifold cut up along a norm-minimizing surface dual to the cohomology class. Furthermore, the leading coefficient equals the relative $L^2$-torsion of the guts associated to the cohomology class. Finally, we prove that the leading coefficient is constant on any open Thurston cone. The main ingredients are a new criterion for the convergence of Fuglede-Kadison determinants and the work of Agol and Zhang on guts of 3-manifolds.

Submitted 7 November, 2023; originally announced November 2023.

Comments: 16 pages, 1 figure. Comments are welcome!

MSC Class: 57K31

arXiv:2311.03408

Training Multi-layer Neural Networks on Ising Machine

Authors: Xujie Song, Tong Liu, Shengbo Eben Li, Jingliang Duan, Wenxuan Wang, Keqiang Li

Abstract: As dedicated quantum devices, Ising machines can solve large-scale binary optimization problems in milliseconds. There is emerging interest in using Ising machines to train feedforward neural networks, driven by the rapid growth of generative artificial intelligence. However, existing methods can only train single-layer feedforward networks because of the complex nonlinear network topology. This paper proposes an Ising learning algorithm to train quantized neural networks (QNNs) by incorporating two essential techniques, namely binary representation of the network topology and order reduction of the loss function. As far as we know, this is the first algorithm to train multi-layer feedforward networks on Ising machines, providing an alternative to gradient-based backpropagation. First, training a QNN is formulated as a quadratic constrained binary optimization (QCBO) problem by representing neuron connections and activation functions as equality constraints. All quantized variables are encoded as binary bits using a binary encoding protocol. Second, the QCBO problem is converted into a quadratic unconstrained binary optimization (QUBO) problem, which can be solved efficiently on Ising machines. The conversion leverages both a penalty function and Rosenberg order reduction, which together eliminate the equality constraints and reduce the high-order loss function to a quadratic one. Under some assumptions, theoretical analysis shows that the space complexity of our algorithm, which quantifies the required number of Ising spins, is $\mathcal{O}(H^2L + HLN\log H)$. Finally, the algorithm's effectiveness is validated with a simulated Ising machine on the MNIST dataset. After 700 ms of annealing, the classification accuracy reaches 98.3%. Among 100 runs, the success probability of finding the optimal solution is 72%. As the number of spins on Ising machines increases, our algorithm has the potential to train deeper neural networks.
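Rosenberg order reduction replaces a product x1·x2 of binary variables by a new binary variable y and adds a penalty M·P(x1, x2, y), with P = x1x2 - 2x1y - 2x2y + 3y. P is zero exactly when y = x1x2 and at least 1 otherwise. So, for a large enough M, the minimum of the quadratic surrogate equals that of the original higher-order objective. The sketch below checks this by exhaustive enumeration. The cubic objective, the value M = 10, and the function names are illustrative assumptions.

```python
from itertools import product


def rosenberg_penalty(x1, x2, y):
    """Zero iff y == x1 * x2 for binary inputs; otherwise at least 1."""
    return x1 * x2 - 2 * x1 * y - 2 * x2 * y + 3 * y


def cubic(x1, x2, x3):
    """Hypothetical higher-order (cubic) binary objective."""
    return -4 * x1 * x2 * x3 + x1 + x2 + x3


def quadratized(x1, x2, x3, y, M=10):
    """Quadratic surrogate: substitute y for x1*x2 and add the Rosenberg penalty."""
    return -4 * y * x3 + x1 + x2 + x3 + M * rosenberg_penalty(x1, x2, y)


def brute_min(f, n):
    """Minimum of f over all 2**n binary assignments."""
    return min(f(*bits) for bits in product((0, 1), repeat=n))
```

Every term in `quadratized` has degree at most 2, so it is a valid QUBO that an Ising machine can accept. The price is one extra spin per reduced product, which is one reason spin counts grow with network size.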

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2311.02523

UniTSFace: Unified Threshold Integrated Sample-to-Sample Loss for Face Recognition

Authors: Qiufu Li, Xi Jia, Jiancan Zhou, Linlin Shen, Jinming Duan

Abstract: Sample-to-class-based face recognition models cannot fully explore the cross-sample relationships among large numbers of facial images, while sample-to-sample-based models require sophisticated pairing processes for training. Furthermore, neither approach satisfies the requirements of real-world face verification applications, which expect a unified threshold separating positive from negative facial pairs. In this paper, we propose a unified threshold integrated sample-to-sample based loss (USS loss), which features an explicit unified threshold for distinguishing positive from negative pairs. Inspired by our USS loss, we also derive sample-to-sample based softmax and BCE losses and discuss their relationship. Extensive evaluation on multiple benchmark datasets, including MFR, IJB-C, LFW, CFP-FP, AgeDB, and MegaFace, demonstrates that the proposed USS loss is highly efficient and can work seamlessly with sample-to-class-based losses. The combined loss (USS and sample-to-class softmax loss) overcomes the pitfalls of previous approaches, and the trained facial model UniTSFace exhibits exceptional performance, outperforming state-of-the-art methods such as CosFace, ArcFace, VPL, AnchorFace, and UNPG. Our code is available.
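The idea behind a unified threshold is that a single scalar t should separate all positive-pair similarities from all negative-pair similarities. The sketch below is a generic sample-to-sample loss built on that idea. It applies a softplus penalty to positive cosines below t and to negative cosines above t, with a scale s. It is an assumption for illustration, not the exact USS loss, whose formula the abstract does not give. The defaults t = 0.5 and s = 16 are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def unified_threshold_loss(pos_cos, neg_cos, t=0.5, s=16.0):
    """Penalize positive-pair cosines below t and negative-pair cosines above t.

    t (shared threshold) and s (scale) are hypothetical defaults.
    """
    pos_term = sum(softplus(s * (t - c)) for c in pos_cos)
    neg_term = sum(softplus(s * (c - t)) for c in neg_cos)
    return (pos_term + neg_term) / max(1, len(pos_cos) + len(neg_cos))
```

Because the same t appears in both terms, a model trained this way can use t directly as its verification threshold at test time. Sample-to-class losses do not provide that by construction.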

Submitted 4 November, 2023; originally announced November 2023.

Comments: Accepted by NeurIPS 2023

arXiv:2311.02344

You Only Forward Once: Prediction and Rationalization in A Single Forward Pass

Authors: Han Jiang, Junwen Duan, Zhe Qu, Jianxin Wang

Abstract: Unsupervised rationale extraction aims to extract concise and contiguous text snippets to support model predictions without any annotated rationale. Previous studies have used a two-phase framework known as the Rationalizing Neural Prediction (RNP) framework, which follows a generate-then-predict paradigm. They assumed that the extracted explanation, called rationale, should be sufficient to predict the golden label. However, the assumption above deviates from the original definition and is too strict to perform well. Furthermore, these two-phase models suffer from the interlocking problem and spurious correlations. To solve the above problems, we propose a novel single-phase framework called You Only Forward Once (YOFO), derived from a relaxed version of rationale where rationales aim to support model predictions rather than make predictions. In our framework, a pre-trained language model like BERT is deployed to simultaneously perform prediction and rationalization with less impact from interlocking or spurious correlations. Directly choosing the important tokens in an unsupervised manner is intractable. Instead of directly choosing the important tokens, YOFO gradually removes unimportant tokens during forward propagation. Through experiments on the BeerAdvocate and Hotel Review datasets, we demonstrate that our model is able to extract rationales and make predictions more accurately compared to RNP-based models. We observe an improvement of up to 18.4% in token-level F1 compared to previous state-of-the-art methods.
We also conducted analyses and experiments to explore the extracted rationales and token decay strategies. The results show that YOFO can extract precise and important rationales while removing unimportant tokens in the middle part of the model.
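As a rough illustration of "gradually removing unimportant tokens during forward propagation", the sketch below prunes a token list in stages and keeps the survivors in their original order. The function `forward_with_token_decay`, the fixed per-token `importance` scores and the `keep_schedule` ratios are all hypothetical; the abstract does not say how YOFO scores tokens or at which layers it drops them.

```python
from typing import List, Sequence


def forward_with_token_decay(
    tokens: List[str],
    importance: Sequence[float],
    keep_schedule: Sequence[float],
) -> List[str]:
    """Progressively drop low-importance tokens, one pruning step per entry in
    keep_schedule (a hypothetical stand-in for decay layers of an encoder).

    `importance` is a fixed, hypothetical score per token; in a real model it
    would be recomputed from hidden states at every decay layer.
    """
    kept = list(range(len(tokens)))  # indices still alive in the forward pass
    for keep_ratio in keep_schedule:
        k = max(1, int(len(kept) * keep_ratio))
        # rank the surviving tokens by importance and keep the top-k
        top = sorted(kept, key=lambda i: importance[i], reverse=True)[:k]
        kept = sorted(top)  # restore original word order
    return [tokens[i] for i in kept]


tokens = "the beer has a great hoppy aroma".split()
importance = [0.10, 0.50, 0.20, 0.05, 0.90, 0.80, 0.70]
rationale = forward_with_token_decay(tokens, importance, keep_schedule=[0.75, 0.5])
print(rationale)  # → ['great', 'hoppy']
```

The tokens that survive the last stage play the role of the rationale; the schedule controls how aggressively tokens are discarded as depth increases.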

Submitted 4 November, 2023; originally announced November 2023.

Comments: 20 pages, 5 figures, and 11 tables

arXiv:2310.19022 [pdf, other]

doi 10.1109/TCYB.2023.3323316

Optimization Landscape of Policy Gradient Methods for Discrete-time Static Output Feedback

Authors: **gliang Duan, Jie Li, Xuyang Chen, Kai Zhao, Shengbo Eben Li, Lin Zhao

Abstract: In recent times, significant advancements have been made in delving into the optimization landscape of policy gradient methods for achieving optimal control in linear time-invariant (LTI) systems. Compared with state-feedback control, output-feedback control is more prevalent since the underlying state of the system may not be fully observed in many practical settings. This paper analyzes the optimization landscape inherent to policy gradient methods when applied to static output feedback (SOF) control in discrete-time LTI systems subject to quadratic cost. We begin by establishing crucial properties of the SOF cost, encompassing coercivity, L-smoothness, and M-Lipschitz continuous Hessian. Despite the absence of convexity, we leverage these properties to derive novel findings regarding convergence (and nearly dimension-free rate) to stationary points for three policy gradient methods, including the vanilla policy gradient method, the natural policy gradient method, and the Gauss-Newton method. Moreover, we provide proof that the vanilla policy gradient method exhibits linear convergence towards local minima when initialized near such minima. The paper concludes by presenting numerical examples that validate our theoretical findings. These results not only characterize the performance of gradient descent for optimizing the SOF problem but also provide insights into the effectiveness of general policy gradient methods within the realm of reinforcement learning.
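A minimal sketch of the vanilla policy gradient method on a discrete-time SOF problem, assuming the standard LQ setup u_t = -K y_t with y_t = C x_t, cost weights Q and R, and initial-state covariance Sigma0, so that J(K) = tr(P_K Sigma0) where P_K solves a discrete Lyapunov equation. All helper names are hypothetical, and the gradient is approximated by central finite differences rather than the closed-form expression the paper analyzes.

```python
from typing import List

Matrix = List[List[float]]


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X: Matrix) -> Matrix:
    return [list(row) for row in zip(*X)]


def add(X: Matrix, Y: Matrix, sign: float = 1.0) -> Matrix:
    """Return X + sign * Y, elementwise."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sof_cost(A, B, C, K, Q, R, Sigma0, tol=1e-12, max_iter=100_000) -> float:
    """J(K) = tr(P_K Sigma0), where P_K solves
    P = Q + (KC)^T R (KC) + (A - BKC)^T P (A - BKC).
    P_K is found by fixed-point iteration, which converges only when the
    closed loop A - BKC is Schur stable."""
    KC = matmul(K, C)
    A_K = add(A, matmul(B, KC), sign=-1.0)
    Q_K = add(Q, matmul(matmul(transpose(KC), R), KC))
    P = [[0.0] * len(Q) for _ in Q]
    for _ in range(max_iter):
        P_next = add(Q_K, matmul(matmul(transpose(A_K), P), A_K))
        delta = max(abs(a - b) for ra, rb in zip(P_next, P) for a, b in zip(ra, rb))
        P = P_next
        if delta < tol:
            return sum(matmul(P, Sigma0)[i][i] for i in range(len(P)))
        if delta > 1e12:
            break
    raise ValueError("closed loop A - BKC appears unstable; J(K) is infinite")


def gradient_step(A, B, C, K, Q, R, Sigma0, eta=1e-3, h=1e-6) -> Matrix:
    """One vanilla policy-gradient step K <- K - eta * grad J(K), with the
    gradient estimated by central finite differences (an assumption)."""
    grad = [[0.0] * len(K[0]) for _ in K]
    for i in range(len(K)):
        for j in range(len(K[0])):
            K_plus = [row[:] for row in K]
            K_minus = [row[:] for row in K]
            K_plus[i][j] += h
            K_minus[i][j] -= h
            grad[i][j] = (sof_cost(A, B, C, K_plus, Q, R, Sigma0)
                          - sof_cost(A, B, C, K_minus, Q, R, Sigma0)) / (2 * h)
    return add(K, grad, sign=-eta)


if __name__ == "__main__":
    # Hypothetical 2-state plant in which only the first state is measured.
    A = [[0.9, 0.2], [0.0, 0.8]]
    B = [[0.0], [1.0]]
    C = [[1.0, 0.0]]
    Q = [[1.0, 0.0], [0.0, 1.0]]
    R = [[1.0]]
    Sigma0 = [[1.0, 0.0], [0.0, 1.0]]
    K = [[0.0]]  # K = 0 is stabilizing here because A itself is stable
    print("J(K0) =", sof_cost(A, B, C, K, Q, R, Sigma0))
    for _ in range(100):
        K = gradient_step(A, B, C, K, Q, R, Sigma0, eta=5e-4)
    print("K =", K, "J(K) =", sof_cost(A, B, C, K, Q, R, Sigma0))
```

Gradient descent must start from a stabilizing gain, since J(K) is infinite outside the stabilizing set; the `ValueError` makes that failure explicit instead of returning a meaningless number.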

Submitted 29 October, 2023; originally announced October 2023.

Journal ref: IEEE Transactions on Cybernetics, 2023

arXiv:2310.14906 [pdf, other]

DYNAMITE: Dynamic Interplay of Mini-Batch Size and Aggregation Frequency for Federated Learning with Static and Streaming Dataset

Authors: Weijie Liu, Xiaoxi Zhang, **gpu Duan, Carlee Joe-Wong, Zhi Zhou, Xu Chen

Abstract: Federated Learning (FL) is a distributed learning paradigm that can coordinate heterogeneous edge devices to perform model training without sharing private data. While prior works have focused on analyzing FL convergence with respect to hyperparameters like batch size and aggregation frequency, the joint effects of adjusting these parameters on model performance, training time, and resource consumption have been overlooked, especially when facing dynamic data streams and network characteristics. This paper introduces novel analytical models and optimization algorithms that leverage the interplay between batch size and aggregation frequency to navigate the trade-offs among convergence, cost, and completion time for dynamic FL training. We establish a new convergence bound for training error considering heterogeneous datasets across devices and derive closed-form solutions for co-optimized batch size and aggregation frequency that are consistent across all devices. Additionally, we design an efficient algorithm for assigning different batch configurations across devices, improving model accuracy and addressing the heterogeneity of both data and system characteristics. Further, we propose an adaptive control algorithm that dynamically estimates network states, efficiently samples appropriate data batches, and effectively adjusts batch sizes and aggregation frequency on the fly. Extensive experiments demonstrate the superiority of our offline optimal solutions and online adaptive algorithm.
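To make the two knobs concrete, here is a hedged FedAvg-style sketch in which each device runs `local_steps` mini-batch SGD steps of size `batch_size` between aggregations (so the aggregation frequency is one round per `local_steps` steps). The toy 1-D regression task, the function names and all constants are hypothetical; this is not the paper's optimization algorithm, only the training loop whose parameters it tunes.

```python
import random
from typing import List, Tuple

Dataset = List[Tuple[float, float]]


def make_client_data(rng: random.Random, n: int, x_low: float, x_high: float,
                     w_true: float = 3.0) -> Dataset:
    """Synthetic data y = w_true * x + noise; different x ranges per client
    mimic heterogeneous devices."""
    xs = [rng.uniform(x_low, x_high) for _ in range(n)]
    return [(x, w_true * x + rng.gauss(0.0, 0.1)) for x in xs]


def local_sgd(w: float, data: Dataset, batch_size: int, local_steps: int,
              lr: float, rng: random.Random) -> float:
    """Run `local_steps` mini-batch SGD steps on one device's data."""
    for _ in range(local_steps):
        batch = rng.sample(data, batch_size)
        grad = sum(2.0 * (w * x - y) * x for x, y in batch) / batch_size
        w -= lr * grad
    return w


def federated_training(clients: List[Dataset], batch_size: int, local_steps: int,
                       rounds: int, lr: float = 0.1, seed: int = 0) -> float:
    """Devices train locally, then the server averages the models weighted by
    local dataset size; this repeats for `rounds` communication rounds."""
    rng = random.Random(seed)
    total = sum(len(d) for d in clients)
    w = 0.0
    for _ in range(rounds):
        local = [local_sgd(w, d, batch_size, local_steps, lr, rng) for d in clients]
        w = sum(len(d) * wi for d, wi in zip(clients, local)) / total
    return w


def mse(w: float, clients: List[Dataset]) -> float:
    points = [p for d in clients for p in d]
    return sum((w * x - y) ** 2 for x, y in points) / len(points)


data_rng = random.Random(42)
clients = [make_client_data(data_rng, 50, lo, hi) for lo, hi in [(-1, 0), (0, 1), (-1, 1)]]
for batch_size, local_steps, rounds in [(4, 2, 50), (16, 10, 10)]:
    w = federated_training(clients, batch_size, local_steps, rounds)
    samples = batch_size * local_steps * rounds  # per-device compute proxy
    print(f"b={batch_size} tau={local_steps} rounds={rounds} "
          f"samples/device={samples} mse={mse(w, clients):.4f}")
```

Both configurations take the same number of local steps, but the second uses five times fewer communication rounds and four times larger batches, which is the kind of convergence/communication/compute trade-off the paper sets out to optimize.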

Submitted 20 October, 2023; originally announced October 2023.

Comments: 20 pages, 12 figures

ACM Class: I.2.6

arXiv:2310.07018 [pdf, other]

NEWTON: Are Large Language Models Capable of Physical Reasoning?

Authors: Yi Ru Wang, Jiafei Duan, Dieter Fox, Siddhartha Srinivasa

Abstract: Large Language Models (LLMs), through their contextualized representations, have been empirically proven to encapsulate syntactic, semantic, word sense, and common-sense knowledge. However, there has been limited exploration of their physical reasoning abilities, specifically concerning the crucial attributes for comprehending everyday objects. To address this gap, we introduce NEWTON, a repository and benchmark for evaluating the physics reasoning skills of LLMs. Further, to enable domain-specific adaptation of this benchmark, we present a pipeline to enable researchers to generate a variant of this benchmark that has been customized to the objects and attributes relevant for their application. The NEWTON repository comprises a collection of 2800 object-attribute pairs, providing the foundation for generating infinite-scale assessment templates. The NEWTON benchmark consists of 160K QA questions, curated using the NEWTON repository to investigate the physical reasoning capabilities of several mainstream language models across foundational, explicit, and implicit reasoning tasks. Through extensive empirical analysis, our results highlight the capabilities of LLMs for physical reasoning. We find that LLMs like GPT-4 demonstrate strong reasoning capabilities in scenario-based tasks but exhibit less consistency in object-attribute reasoning compared to humans (50% vs. 84%).
Furthermore, the NEWTON platform demonstrates its potential for evaluating and enhancing language models, paving the way for their integration into physically grounded settings, such as robotic manipulation. Project site: https://newtonreasoning.github.io

Submitted 10 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 Findings; 8 pages, 3 figures, 7 tables; Project page: https://newtonreasoning.github.io

arXiv:2310.06059 [pdf, other]

Early Warning Prediction with Automatic Labeling in Epilepsy Patients

Authors: Peng Zhang, Ting Gao, ** Guo, **qiao Duan, Sergey Nikolenko

Abstract: Early warning for epilepsy patients is crucial for their safety and well-being, in particular to prevent or minimize the severity of seizures. Through the patients' EEG data, we propose a meta learning framework to improve the prediction of early ictal signals. The proposed bi-level optimization framework can help automatically label noisy data at the early ictal stage, as well as optimize the training accuracy of the backbone model. To validate our approach, we conduct a series of experiments to predict seizure onset in various long-term windows, with LSTM and ResNet implemented as the baseline models. Our study demonstrates that not only the ictal prediction accuracy obtained by meta learning is significantly improved, but also the resulting model captures some intrinsic patterns of the noisy data that a single backbone model could not learn. As a result, the predicted probability generated by the meta network serves as a highly effective early warning indicator.

Submitted 11 January, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

Comments: 13 pages, 4 figures

arXiv:2310.05858 [pdf, other]

DSAC-T: Distributional Soft Actor-Critic with Three Refinements

Authors: **gliang Duan, Wenxuan Wang, Liming Xiao, Jiaxin Gao, Shengbo Eben Li

Abstract: Reinforcement learning (RL) has proven to be highly effective in tackling complex decision-making and control tasks. However, prevalent model-free RL methods often face severe performance degradation due to the well-known overestimation issue. In response to this problem, we recently introduced an off-policy RL algorithm, called distributional soft actor-critic (DSAC or DSAC-v1), which can effectively improve the value estimation accuracy by learning a continuous Gaussian value distribution. Nonetheless, standard DSAC has its own shortcomings, including occasionally unstable learning processes and the necessity for task-specific reward scaling, which may hinder its overall performance and adaptability in some special tasks. This paper further introduces three important refinements to standard DSAC in order to address these shortcomings. These refinements consist of expected value substituting, twin value distribution learning, and variance-based critic gradient adjusting. The modified RL algorithm is named DSAC with three refinements (DSAC-T or DSAC-v2), and its performance is systematically evaluated on a diverse set of benchmark tasks. Without any task-specific hyperparameter tuning, DSAC-T surpasses or matches a lot of mainstream model-free RL algorithms, including SAC, TD3, DDPG, TRPO, and PPO, in all tested environments. Additionally, DSAC-T, unlike its standard version, ensures a highly stable learning process and delivers similar performance across varying reward scales.
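The core of a Gaussian distributional critic can be sketched as fitting N(mean, std²) to a soft TD target by minimizing its negative log-likelihood. The sketch below is a simplified illustration under stated assumptions: `pick_twin` uses a TD3-style "smaller mean" rule as a stand-in for twin value distribution learning, `td_target` plugs in the mean of the next-state distribution as a stand-in for expected value substituting, and variance-based gradient adjusting is omitted. None of these names or rules are taken from the DSAC-T paper.

```python
import math
from typing import Tuple

Gaussian = Tuple[float, float]  # (mean, standard deviation) of a return distribution


def pick_twin(d1: Gaussian, d2: Gaussian) -> Gaussian:
    """Assumed TD3-style rule: use the twin distribution with the smaller mean
    to counter overestimation."""
    return d1 if d1[0] <= d2[0] else d2


def td_target(reward: float, gamma: float, next_dist: Gaussian,
              next_log_prob: float, alpha: float) -> float:
    """Soft TD target built from the *mean* of the next-state distribution
    (rather than a sample), plus the SAC entropy term -alpha * log pi."""
    mean_next, _ = next_dist
    return reward + gamma * (mean_next - alpha * next_log_prob)


def gaussian_nll(target: float, dist: Gaussian) -> float:
    """Negative log-likelihood of the target under N(mean, std^2); minimizing
    it fits both the mean and the spread of the value distribution."""
    mean, std = dist
    return (math.log(std) + 0.5 * math.log(2.0 * math.pi)
            + (target - mean) ** 2 / (2.0 * std ** 2))


if __name__ == "__main__":
    q1, q2 = (1.0, 0.5), (0.8, 0.7)  # hypothetical twin critic outputs at s'
    y = td_target(reward=1.0, gamma=0.9, next_dist=pick_twin(q1, q2),
                  next_log_prob=-1.0, alpha=0.2)
    print(y, gaussian_nll(y, (1.5, 0.5)))
```

The NLL term (target - mean)² / (2 std²) shows why reward scale matters for such critics: the gradient with respect to the mean is divided by std², which is one motivation for adjusting critic gradients by the learned variance.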

Submitted 28 December, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

arXiv:2310.04992 [pdf, other]

VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

Authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun, Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu, Aiguo Lv, et al. (17 additional authors not shown)

Abstract: We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassification of disease phenotype, and systemic biomarker and disease prediction, with each application enhanced with expert-level intelligence and accuracy. The generalist intelligence of VisionFM outperformed ophthalmologists with basic and intermediate levels in jointly diagnosing 12 common ophthalmic diseases. Evaluated on a new large-scale ophthalmic disease diagnosis benchmark database, as well as a new large-scale segmentation and detection benchmark database, VisionFM outperformed strong baseline deep neural networks. The ophthalmic image representations learned by VisionFM exhibited noteworthy explainability, and demonstrated strong generalizability to new ophthalmic modalities, disease spectrum, and imaging devices. As a foundation model, VisionFM has a large capacity to learn from diverse ophthalmic imaging data and disparate datasets. To be commensurate with this capacity, in addition to the real data used for pre-training, we also generated and leveraged synthetic ophthalmic imaging data.
Experimental results revealed that synthetic data that passed visual Turing tests can also enhance the representation learning capability of VisionFM, leading to substantial performance gains on downstream ophthalmic AI tasks. Beyond the ophthalmic AI applications developed, validated, and demonstrated in this work, substantial further applications can be achieved in an efficient and cost-effective manner using VisionFM as the foundation.

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2310.03377 [pdf, other]

ACT-Net: Anchor-context Action Detection in Surgery Videos

Authors: Luoying Hao, Yan Hu, Wenjun Lin, Qun Wang, Heng Li, Huazhu Fu, **ming Duan, Jiang Liu

Abstract: Recognition and localization of surgical detailed actions is an essential component of developing a context-aware decision support system. However, most existing detection algorithms fail to provide high-accuracy action classes even having their locations, as they do not consider the surgery procedure's regularity in the whole video. This limitation hinders their application. Moreover, implementing the predictions in clinical applications seriously needs to convey model confidence to earn entrustment, which is unexplored in surgical action prediction. In this paper, to accurately detect fine-grained actions that happen at every moment, we propose an anchor-context action detection network (ACTNet), including an anchor-context detection (ACD) module and a class conditional diffusion (CCD) module, to answer the following questions: 1) where the actions happen; 2) what the actions are; 3) how confident the predictions are. Specifically, the proposed ACD module spatially and temporally highlights the regions interacting with the extracted anchor in surgery video, which outputs action location and its class distribution based on anchor-context interactions. Considering the full distribution of action classes in videos, the CCD module adopts a denoising diffusion-based generative model conditioned on our ACD estimator to further reconstruct accurately the action predictions. Moreover, we utilize the stochastic nature of the diffusion model outputs to assess model confidence for each prediction.
Our method achieves state-of-the-art performance, with an improvement of 4.0% mAP over the baseline on the surgical video dataset.
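One simple way to turn the stochasticity of a generative model into a confidence estimate is to draw several class-probability vectors for the same detection, average them, and measure how much the samples disagree. The sketch below does exactly that; the function `aggregate_samples` and the choice of variance as the disagreement measure are assumptions for illustration, not ACTNet's published procedure.

```python
from statistics import mean, pvariance
from typing import Sequence, Tuple


def aggregate_samples(samples: Sequence[Sequence[float]]) -> Tuple[int, float, float]:
    """Combine several stochastic class-probability vectors for one detection.

    Returns (predicted class, mean probability of that class, variance of that
    probability across samples). Low variance means the sampler agrees with
    itself, which is used here as a proxy for confidence.
    """
    n_classes = len(samples[0])
    avg = [mean(s[c] for s in samples) for c in range(n_classes)]
    pred = max(range(n_classes), key=lambda c: avg[c])
    spread = pvariance([s[pred] for s in samples])
    return pred, avg[pred], spread


# Hypothetical diffusion samples for two detections over three action classes.
confident = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
uncertain = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.5, 0.4, 0.1]]
print(aggregate_samples(confident))
print(aggregate_samples(uncertain))
```

Both detections predict class 0, but the second has a much larger spread across samples, so it would be flagged as less trustworthy.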

Submitted 5 October, 2023; originally announced October 2023.

Comments: Accepted early by MICCAI2023 (Oral)

arXiv:2310.02493 [pdf, other]

Concurrent spin squeezing and light squeezing in an atomic ensemble

Authors: Shenchao **, Junlei Duan, Youwei Zhang, Xichang Zhang, Han Bao, Heng Shen, Liantuan Xiao, Suotang Jia, Mingfeng Wang, Yanhong Xiao

Abstract: Squeezed spin states and squeezed light are both key resources for quantum metrology and quantum information science, but have largely been separately investigated experimentally so far. Simultaneous generation of these two types of quantum states in one experiment setup is an intriguing goal, and could also enable the study of the analogies and distinctions between atoms and light from a new perspective. Here we report an experimental demonstration of concurrent spin squeezing and light squeezing in a hot atomic ensemble, by judiciously engineering a symmetric atom-light interaction Hamiltonian. The squeezing process is deterministic, yielding fixed squeezing directions for the light field and the collective atomic spin. Furthermore, the squeezed light modes lie in the multiple frequency sidebands of a single spatial mode. This novel type of dual squeezed state may be a promising resource for quantum information science and technologies. Our method can be extended to other quantum platforms such as optomechanical and cold atom systems.

Submitted 3 October, 2023; originally announced October 2023.

Comments: manuscript with 7 pages, 4 figures, supplemental material with 8 pages, 4 figures

arXiv:2309.15526 [pdf, other]

doi 10.1145/3581783.3612356

P2I-NET: Mapping Camera Pose to Image via Adversarial Learning for New View Synthesis in Real Indoor Environments

Authors: Xujie Kang, Kanglin Liu, Jiang Duan, Yuanhao Gong, Guo** Qiu

Abstract: Given a new $6DoF$ camera pose in an indoor environment, we study the challenging problem of predicting the view from that pose based on a set of reference RGBD views. Existing explicit or implicit 3D geometry construction methods are computationally expensive while those based on learning have predominantly focused on isolated views of object categories with regular geometric structure. Differing from the traditional \textit{render-inpaint} approach to new view synthesis in the real indoor environment, we propose a conditional generative adversarial neural network (P2I-NET) to directly predict the new view from the given pose. P2I-NET learns the conditional distribution of the images of the environment for establishing the correspondence between the camera pose and its view of the environment, and achieves this through a number of innovative designs in its architecture and training loss function. Two auxiliary discriminator constraints are introduced for enforcing the consistency between the pose of the generated image and that of the corresponding real world image in both the latent feature space and the real world pose space. Additionally a deep convolutional neural network (CNN) is introduced to further reinforce this consistency in the pixel space. We have performed extensive new view synthesis experiments on real indoor datasets. Results show that P2I-NET has superior performance against a number of NeRF based strong baseline models.
In particular, we show that P2I-NET is 40 to 100 times faster than these competitor techniques while synthesising similar quality images. Furthermore, we contribute a new publicly available indoor environment dataset containing 22 high resolution RGBD videos where each frame also has accurate camera pose parameters.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.13245 [pdf, other]

RBFormer: Improve Adversarial Robustness of Transformer by Robust Bias

Authors: Hao Cheng, **hao Duan, Hui Li, Lyutianyang Zhang, Jiahang Cao, ** Wang, Jize Zhang, Kaidi Xu, Ren**g Xu

Abstract: Recently, there has been a surge of interest and attention in Transformer-based structures, such as Vision Transformer (ViT) and Vision Multilayer Perceptron (VMLP). Compared with the previous convolution-based structures, the Transformer-based structure under investigation showcases a comparable or superior performance under its distinctive attention-based input token mixer strategy. Introducing adversarial examples as a robustness consideration has had a profound and detrimental impact on the performance of well-established convolution-based structures. This inherent vulnerability to adversarial attacks has also been demonstrated in Transformer-based structures. In this paper, our emphasis lies on investigating the intrinsic robustness of the structure rather than introducing novel defense measures against adversarial attacks. To address the susceptibility to robustness issues, we employ a rational structure design approach to mitigate such vulnerabilities. Specifically, we enhance the adversarial robustness of the structure by increasing the proportion of high-frequency structural robust biases. As a result, we introduce a novel structure called Robust Bias Transformer-based Structure (RBFormer) that shows robust superiority compared to several existing baseline structures. Through a series of extensive experiments, RBFormer outperforms the original structures by a significant margin, achieving an impressive improvement of +16.12% and +5.04% across different evaluation criteria on CIFAR-10 and ImageNet-1k, respectively.

Submitted 22 September, 2023; originally announced September 2023.

Comments: BMVC 2023

arXiv:2309.10474 [pdf, ps, other]

On quadratic conjecture

Authors: **g**g Duan, Lijian An

Abstract: The quadratic conjecture is a strengthening of Oliver's $p$-group conjecture. Let $G$ be a $p$-group of maximal class of order $p^n$. We prove that if $n\le 8$ or $n\ge \max\{2p-6,p+2\}$, then $G$ satisfies the quadratic conjecture. Hence the quadratic conjecture holds for every $p$-group of maximal class with $p\le 7$.
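The "Hence" step is pure arithmetic: for $p\le 7$ we have $2p-6\le 8$ and $p+2\le 9$, so the threshold $\max\{2p-6,p+2\}$ is at most 9 and every $n$ satisfies $n\le 8$ or $n\ge 9$. The small check below (hypothetical helper names; it verifies only the two inequalities, not any group theory) also shows which exponents remain open for larger primes.

```python
def covered_by_theorem(p: int, n: int) -> bool:
    """True if n <= 8 or n >= max(2p - 6, p + 2), the two conditions in the abstract."""
    return n <= 8 or n >= max(2 * p - 6, p + 2)


def uncovered_exponents(p: int, n_max: int = 60) -> list:
    """Exponents n in [2, n_max] that neither condition covers for this prime p."""
    return [n for n in range(2, n_max + 1) if not covered_by_theorem(p, n)]


for p in (2, 3, 5, 7, 11, 13):
    print(p, max(2 * p - 6, p + 2), uncovered_exponents(p))
```

For $p=11$ the threshold is $\max\{16,13\}=16$, leaving $9\le n\le 15$ open, which is why the corollary stops at $p\le 7$.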

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.09025 [pdf, other]

Efficient Privacy-Preserving Convolutional Spiking Neural Networks with FHE

Authors: Pengbo Li, Huifang Huang, Ting Gao, ** Guo, **qiao Duan

Abstract: With the rapid development of AI technology, we have witnessed numerous innovations and conveniences. However, along with these advancements come privacy threats and risks. Fully Homomorphic Encryption (FHE) emerges as a key technology for privacy-preserving computation, enabling computations while maintaining data privacy. Nevertheless, FHE has limitations in processing continuous non-polynomial functions as it is restricted to discrete integers and supports only addition and multiplication. Spiking Neural Networks (SNNs) operate on discrete spike signals, naturally aligning with the properties of FHE. In this paper, we present a framework called FHE-DiCSNN. This framework is based on the efficient TFHE scheme and leverages the discrete properties of SNNs to achieve high prediction performance on ciphertexts. Firstly, by employing bootstrapping techniques, we successfully implement computations of the Leaky Integrate-and-Fire neuron model on ciphertexts. Through bootstrapping, we can facilitate computations for SNNs of arbitrary depth. This framework can be extended to other spiking neuron models, providing a novel framework for the homomorphic evaluation of SNNs. Secondly, inspired by CNNs, we adopt convolutional methods to replace Poisson encoding. This not only enhances accuracy but also mitigates the issue of prolonged simulation time caused by random encoding. Furthermore, we employ engineering techniques to parallelize the computation of bootstrapping, resulting in a significant improvement in computational efficiency.
Finally, we evaluate our model on the MNIST dataset. Experimental results demonstrate that, with the optimal parameter configuration, FHE-DiCSNN achieves an accuracy of 97.94% on ciphertexts, with a loss of only 0.53% compared to the original network's accuracy of 98.47%. Moreover, each prediction requires only 0.75 seconds of computation time.
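For reference, a plaintext integer Leaky Integrate-and-Fire neuron can be written with only additions, shifts and a threshold test; the threshold is the non-linear step that an FHE scheme restricted to addition and multiplication cannot evaluate directly. In TFHE, programmable bootstrapping can evaluate a univariate function given as a lookup table on a small encrypted integer, so the firing rule is shown in that form as well. The leak-by-shift rule, `threshold=8`, the 4-bit message space and all names are assumptions for illustration, not FHE-DiCSNN's parameters, and no encryption is performed here.

```python
from typing import List, Sequence, Tuple


def lif_step(v: int, current: int, leak_shift: int = 1, threshold: int = 8) -> Tuple[int, int]:
    """One integer LIF update: leak v -> v - (v >> leak_shift), integrate the
    input current, then fire and hard-reset if the threshold is reached.
    Returns (new_potential, spike)."""
    v = v - (v >> leak_shift) + current
    if v >= threshold:
        return 0, 1  # fire and reset the membrane potential
    return v, 0


def run_lif(currents: Sequence[int], **kwargs) -> List[int]:
    """Simulate one neuron over a sequence of integer input currents."""
    v, spikes = 0, []
    for c in currents:
        v, s = lif_step(v, c, **kwargs)
        spikes.append(s)
    return spikes


# The firing rule as a lookup table over a 4-bit message space, i.e. the kind
# of univariate function a programmable bootstrap evaluates on a ciphertext.
FIRE_LUT = [1 if v >= 8 else 0 for v in range(16)]

print(run_lif([3, 3, 3, 3, 0, 5, 5]))  # → [0, 0, 0, 0, 0, 0, 1]
```

Keeping the membrane potential in a small integer range is what makes a table-based evaluation of the threshold feasible.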

Submitted 16 September, 2023; originally announced September 2023.

arXiv:2309.08868 [pdf, other]

MHLAT: Multi-hop Label-wise Attention Model for Automatic ICD Coding

Authors: Junwen Duan, Han Jiang, Ying Yu

Abstract: International Classification of Diseases (ICD) coding is the task of assigning ICD diagnosis codes to clinical notes. This can be challenging given the large quantity of labels (nearly 9,000) and lengthy texts (up to 8,000 tokens). However, unlike the single-pass reading process in previous works, humans tend to read the text and label definitions again to get more confident answers. Moreover, although pretrained language models have been used to address these problems, they suffer from huge memory usage. To address the above problems, we propose a simple but effective model called the Multi-Hop Label-wise ATtention (MHLAT), in which multi-hop label-wise attention is deployed to get more precise and informative representations. Extensive experiments on three benchmark MIMIC datasets indicate that our method achieves significantly better or competitive performance on all seven metrics, with much fewer parameters to optimize.
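Label-wise attention gives each label its own query vector, which attends over the token representations of the note to build a label-specific document vector. The sketch below shows one attention hop and a hypothetical multi-hop variant in which the query is updated with what it just read (query + context) before re-reading the text, loosely mirroring the "read again" intuition in the abstract. The update rule, the toy vectors and all names are assumptions, not MHLAT's exact formulation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def label_attention(H: Sequence[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Attend over token vectors H with one label's query vector; returns the
    attention weights and the attention-weighted context vector."""
    alpha = softmax([dot(h, query) for h in H])
    context = [sum(a * h[d] for a, h in zip(alpha, H)) for d in range(len(query))]
    return alpha, context


def multi_hop_label_attention(H: Sequence[Vector], query: Vector,
                              hops: int = 2) -> Tuple[Vector, Vector]:
    """Hypothetical multi-hop refinement: update the label query with the
    context it just read, then attend over the tokens again."""
    alpha, context = label_attention(H, query)
    for _ in range(hops - 1):
        query = [q + c for q, c in zip(query, context)]
        alpha, context = label_attention(H, query)
    return alpha, context


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token representations
alpha, context = multi_hop_label_attention(H, query=[2.0, 0.0], hops=2)
print([round(a, 3) for a in alpha], [round(c, 3) for c in context])
```

After the first hop the query absorbs some of the second feature, so the second hop shifts attention toward the token that carries both features.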

Submitted 16 September, 2023; originally announced September 2023.

Comments: 5 pages, 1 figure, accepted in ICASSP 2023

arXiv:2309.07398 [pdf, other]

Semantic Adversarial Attacks via Diffusion Models

Authors: Chenan Wang, **hao Duan, Chaowei Xiao, Edward Kim, Matthew Stamm, Kaidi Xu

Abstract: Traditional adversarial attacks concentrate on manipulating clean examples in the pixel space by adding adversarial perturbations. By contrast, semantic adversarial attacks focus on changing semantic attributes of clean examples, such as color, context, and features, which are more feasible in the real world. In this paper, we propose a framework to quickly generate a semantic adversarial attack by leveraging recent diffusion models since semantic information is included in the latent space of well-trained diffusion models. Then there are two variants of this framework: 1) the Semantic Transformation (ST) approach fine-tunes the latent space of the generated image and/or the diffusion model itself; 2) the Latent Masking (LM) approach masks the latent space with another target image and local backpropagation-based interpretation methods. Additionally, the ST approach can be applied in either white-box or black-box settings. Extensive experiments are conducted on CelebA-HQ and AFHQ datasets, and our framework demonstrates great fidelity, generalizability, and transferability compared to other baselines. Our approaches achieve approximately 100% attack success rate in multiple settings, with a best FID of 36.61. Code is available at https://github.com/steven202/semantic_adv_via_dm.

Submitted 13 September, 2023; originally announced September 2023.

Comments: To appear in BMVC 2023

arXiv:2309.06034 [pdf, other]

Normality Learning-based Graph Anomaly Detection via Multi-Scale Contrastive Learning

Authors: **gcan Duan, Pei Zhang, Siwei Wang, **gtao Hu, Hu **, Jiaxin Zhang, Haifang Zhou, Xinwang Liu

Abstract: Graph anomaly detection (GAD) has attracted increasing attention in machine learning and data mining. Recent works have mainly focused on how to capture richer information to improve the quality of node embeddings for GAD. Despite their significant advances in detection performance, there is still a relative dearth of research on the properties of the task. GAD aims to discern the anomalies that deviate from most nodes. However, the model is prone to learn the pattern of normal samples which make up the majority of samples. Meanwhile, anomalies can be easily detected when their behaviors differ from normality. Therefore, the performance can be further improved by enhancing the ability to learn the normal pattern. To this end, we propose a normality learning-based GAD framework via multi-scale contrastive learning networks (NLGAD for abbreviation). Specifically, we first initialize the model with the contrastive networks on different scales. To provide sufficient and reliable normal nodes for normality learning, we design an effective hybrid strategy for normality selection. Finally, the model is refined using only the reliable normal nodes as input and learns a more accurate estimate of normality so that anomalous nodes can be more easily distinguished. Extensive experiments on six benchmark graph datasets demonstrate the effectiveness of our normality learning-based scheme on GAD. Notably, the proposed algorithm improves the detection performance (up to 5.89% AUC gain) compared with the state-of-the-art methods.
The source code is released at https://github.com/FelixDJC/NLGAD.

Submitted 30 September, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: 10 pages, 7 figures, accepted by ACM MM 2023

arXiv:2309.03842 [pdf, other]

doi 10.1063/5.0195042

Early warning indicators via latent stochastic dynamical systems

Authors: Lingyu Feng, Ting Gao, Wang Xiao, **qiao Duan

Abstract: Detecting early warning indicators for abrupt dynamical transitions in complex systems or high-dimensional observation data is essential in many real-world applications, such as brain diseases, natural disasters, and engineering reliability. To this end, we develop a novel approach: the directed anisotropic diffusion map, which captures the latent evolutionary dynamics in the low-dimensional manifold. Three effective warning signals (the Onsager-Machlup Indicator, the Sample Entropy Indicator, and the Transition Probability Indicator) are then derived from the latent coordinates and the latent stochastic dynamical systems. To validate our framework, we apply this methodology to authentic electroencephalogram (EEG) data. We find that our early warning indicators are capable of detecting the tipping point during state transition. This framework not only bridges the latent dynamics with real-world data but also shows potential for automatic labeling of complex high-dimensional time series.

Submitted 5 April, 2024; v1 submitted 7 September, 2023; originally announced September 2023.
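
The Sample Entropy Indicator builds on sample entropy. As a hedged reference point, the sketch below computes the standard SampEn(m, r) of a scalar series; how the authors apply it to latent coordinates is not described in the abstract, so the function name, the defaults, and the use of an absolute tolerance r (often chosen as 0.2 times the series' standard deviation) are assumptions.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """Standard sample entropy SampEn(m, r) of a 1-D sequence (illustrative sketch).

    B counts pairs of length-m templates within Chebyshev distance r,
    A counts the same for length m + 1; self-matches are excluded and both
    counts use the same n - m starting points. Returns -ln(A / B).
    """
    n = len(x)
    if n <= m + 1:
        raise ValueError("sequence too short for the chosen m")

    def count_matches(length):
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # no matching templates: SampEn is undefined
    return -math.log(a / b)
```

A perfectly periodic series gives 0, while irregular series give larger values; evaluating this over a sliding window would yield an indicator time series, with window length and r left to tuning.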

arXiv:2309.02906 [pdf, other]

Well-posedness and averaging principle for Lévy-type McKean-Vlasov stochastic differential equations under local Lipschitz conditions

Authors: Ying Chao, **qiao Duan, Ting Gao, **yuan Wei

Abstract: In this paper, we investigate a class of McKean-Vlasov stochastic differential equations under Lévy-type perturbations. We first establish the existence and uniqueness theorem for solutions of the McKean-Vlasov stochastic differential equations by utilizing the Euler-like approximation. Then, under suitable conditions, we show that the solutions of McKean-Vlasov stochastic differential equations can be approximated by the solutions of the associated averaged McKean-Vlasov stochastic differential equations in the sense of mean square convergence. In contrast to existing work, a novel feature is the use of a much weaker condition, namely local Lipschitz continuity in the state variables, which allows for possibly super-linearly growing drift while the diffusion and jump coefficients grow linearly. Therefore, our results apply to a wider class of McKean-Vlasov stochastic differential equations.

Submitted 6 September, 2023; originally announced September 2023.

Comments: 29 pages, 7 figures

MSC Class: 60H10; 60G51; 34C29; 35Q83
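
To make the setting concrete, here is a minimal sketch, under stated assumptions, of the standard interacting-particle Euler scheme for a McKean-Vlasov SDE with jumps: the law of X is replaced by the empirical measure of N particles. The toy drift X - X^3 + theta (E[X] - X) (super-linear, locally Lipschitz), the constant diffusion, the compound Poisson jumps, and all parameter values are hypothetical choices, not the equations or the Euler-like approximation analyzed in the paper.

```python
import numpy as np


def simulate_mv_particles(n_particles=500, t_end=1.0, n_steps=1000, theta=0.5,
                          sigma=0.5, jump_rate=1.0, jump_scale=0.1, seed=0):
    """Explicit Euler particle scheme for the hypothetical toy McKean-Vlasov SDE
        dX = (X - X**3 + theta * (E[X] - X)) dt + sigma dW + dJ,
    where J is compound Poisson with rate jump_rate and N(0, jump_scale**2)
    jump sizes. E[X] is replaced by the empirical mean of the particles.
    """
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    x = rng.normal(0.0, 1.0, n_particles)
    for _ in range(n_steps):
        mean_field = x.mean()  # empirical-measure approximation of E[X]
        drift = x - x**3 + theta * (mean_field - x)  # super-linear, locally Lipschitz
        dw = rng.normal(0.0, np.sqrt(dt), n_particles)
        n_jumps = rng.poisson(jump_rate * dt, n_particles)
        # the sum of k i.i.d. N(0, s^2) jump sizes is N(0, k * s^2)
        jumps = jump_scale * np.sqrt(n_jumps) * rng.normal(0.0, 1.0, n_particles)
        x = x + drift * dt + sigma * dw + jumps
    return x
```

This is a plain explicit scheme; with super-linear drift it is only reliable for small step sizes, which is one reason well-posedness under local Lipschitz conditions needs separate analysis.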

arXiv:2308.15627 [pdf, other]

Target PCA: Transfer Learning Large Dimensional Panel Data

Authors: Junting Duan, Markus Pelger, Ruoxuan Xiong

Abstract: This paper develops a novel method to estimate a latent factor model for a large target panel with missing observations by optimally using the information from auxiliary panel data sets. We refer to our estimator as target-PCA. Transfer learning from auxiliary panel data allows us to deal with a large fraction of missing observations and weak signals in the target panel. We show that our estimator is more efficient and can consistently estimate weak factors, which are not identifiable with conventional methods. We provide the asymptotic inferential theory for target-PCA under very general assumptions on the approximate factor model and missing patterns. In an empirical study of imputing data in a mixed-frequency macroeconomic panel, we demonstrate that target-PCA significantly outperforms all benchmark methods.

Submitted 29 August, 2023; originally announced August 2023.

Comments: Journal of Econometrics, accepted. The Internet Appendix (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4556029) collects the detailed proofs for all the theoretical statements in the main text, the data description, and additional simulation results

arXiv:2308.12529 [pdf, other]

Privacy-Preserving Discretized Spiking Neural Networks

Authors: Pengbo Li, Ting Gao, Huifang Huang, Jiani Cheng, Shuhong Gao, Zhigang Zeng, **qiao Duan

Abstract: The rapid development of artificial intelligence has brought considerable convenience, yet it also introduces significant security risks. One active research topic is balancing data privacy and utility in real-world artificial intelligence. Present second-generation artificial neural networks have made tremendous advances, but some large models incur very high computational costs. The third-generation neural network, the SNN (Spiking Neural Network), mimics real neurons by using discrete spike signals, whose sequences exhibit strong sparsity, providing advantages such as low energy consumption and high efficiency. In this paper, we construct a framework named FHE-DiSNN to evaluate the homomorphic computation of SNNs, which enables SNNs to achieve good prediction performance on encrypted data. First, benefiting from the discrete nature of spike signals, our proposed model avoids the errors introduced by discretizing activation functions. Second, by applying bootstrapping, we design new privacy-preserving functions, FHE-Fire and FHE-Reset, through which noise can be refreshed, allowing us to evaluate SNNs for an arbitrary number of operations. Furthermore, we improve the computational efficiency of FHE-DiSNN while maintaining a high level of accuracy. Finally, we evaluate our model on the MNIST dataset. The experiments show that FHE-DiSNN with 30 neurons in the hidden layer achieves a minimum prediction accuracy of 94.4%. Under optimal parameters, it achieves 95.1% accuracy, only 0.6% lower than the original SNN (95.7%).
These results demonstrate the superiority of SNNs over second-generation neural networks for homomorphic evaluation.

Submitted 23 August, 2023; originally announced August 2023.
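
The abstract names two homomorphic primitives, FHE-Fire and FHE-Reset. The encryption itself is not reproduced here; as a hedged plaintext illustration, the sketch below shows the discretized integrate-and-fire step those primitives would have to evaluate, written branch-free because an encrypted value cannot drive control flow. The integer threshold, reset value, and function names are hypothetical.

```python
def if_step(v, current, threshold=4, v_reset=0):
    """One step of a discretized integrate-and-fire neuron on integers (plaintext sketch).

    Fire and reset are arithmetic rather than an if-statement on v; in an FHE
    scheme the comparison would be evaluated homomorphically (not shown here).
    """
    v = v + current                          # integrate
    spike = int(v >= threshold)              # fire: step function of the potential
    v = spike * v_reset + (1 - spike) * v    # reset only where a spike occurred
    return v, spike


def run_neuron(currents, threshold=4):
    """Feed a sequence of integer input currents and collect the spike train."""
    v, spikes = 0, []
    for c in currents:
        v, s = if_step(v, c, threshold)
        spikes.append(s)
    return spikes
```

For example, `run_neuron([1, 1, 1, 1, 2, 2, 0, 5])` spikes on the 4th, 6th and 8th inputs. Because the spike is exactly 0 or 1, no activation-function approximation is needed, which is the property the abstract highlights.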

arXiv:2308.10270 [pdf, ps, other]

The stochastic fractional Strichartz estimate and blow-up for Schrödinger equation

Authors: Ao Zhang, Yanjie Zhang, Xiao Wang, Zibo Wang, **qiao Duan

Abstract: We establish the stochastic Strichartz estimate for the fractional Schrödinger equation with multiplicative noise. With the help of the deterministic Strichartz estimates, we prove the existence and uniqueness of a global solution to the stochastic fractional nonlinear Schrödinger equation in $L^2(\mathbb{R}^n)$. In addition, we prove a general blow-up result by deriving a localized virial estimate and the generalized Strauss inequality for a restricted class of initial data.

Submitted 4 January, 2024; v1 submitted 20 August, 2023; originally announced August 2023.

arXiv:2308.09604 [pdf, other]

Faster Stochastic Variance Reduction Methods for Compositional MiniMax Optimization

Authors: ** Liu, Xiaokang Pan, Junwen Duan, Hongdong Li, Youqi Li, Zhe Qu

Abstract: This paper studies stochastic optimization for compositional minimax problems, a pivotal challenge across various machine learning domains, including deep AUC and reinforcement learning policy evaluation. Despite its significance, compositional minimax optimization remains under-explored. Adding to the complexity, current methods for compositional minimax optimization suffer from sub-optimal complexities or a heavy reliance on large batch sizes. To address these limitations, this paper introduces a novel method, Nested STOchastic Recursive Momentum (NSTORM), which achieves the optimal sample complexity of $O(κ^3/ε^3)$ for obtaining an $ε$-accurate solution. We also demonstrate that NSTORM achieves the same sample complexity under the Polyak-Łojasiewicz (PL) condition, an insightful extension of its capabilities. However, NSTORM requires low learning rates, potentially constraining its real-world applicability in machine learning. To overcome this hurdle, we present ADAptive NSTORM (ADA-NSTORM) with adaptive learning rates. We demonstrate that ADA-NSTORM achieves the same sample complexity, while the experimental results show that it is more effective. All the proposed complexities indicate that our methods can match the lower bounds for existing minimax optimization without requiring a large batch size in each iteration. Extensive experiments support the efficiency of our proposed methods.

Submitted 12 December, 2023; v1 submitted 18 August, 2023; originally announced August 2023.
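
NSTORM's name points to STORM-style recursive momentum. The sketch below shows only that basic estimator, d_t = g(x_t; ξ_t) + (1 - a)(d_{t-1} - g(x_{t-1}; ξ_t)), on a toy single-level problem; it is not the nested compositional minimax method itself, and the step size, momentum parameter a, and noisy quadratic objective are illustrative assumptions.

```python
import numpy as np


def storm_minimize(grad_fn, x0, lr=0.05, a=0.1, n_iters=500, seed=0):
    """Gradient descent with a STORM-style recursive momentum estimator:
        d_t = g(x_t; xi_t) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi_t)),
    where the same sample xi_t is used at x_t and x_{t-1}.
    """
    rng = np.random.default_rng(seed)
    x_prev = np.asarray(x0, dtype=float)
    d = grad_fn(x_prev, rng.normal(size=x_prev.shape))  # plain stochastic gradient to start
    x = x_prev - lr * d
    for _ in range(n_iters):
        xi = rng.normal(size=x.shape)  # one noise sample shared by both gradient calls
        d = grad_fn(x, xi) + (1.0 - a) * (d - grad_fn(x_prev, xi))
        x_prev, x = x, x - lr * d
    return x
```

Usage: with `c = np.array([1.0, -2.0])` and `noisy_grad = lambda x, xi: (x - c) + 0.1 * xi`, `storm_minimize(noisy_grad, np.zeros(2))` ends close to `c` using a single sample per step. Reusing the sample inside the correction term is what lets the variance shrink without large batches.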

arXiv:2307.15273 [pdf, other]

Recovering high-quality FODs from a reduced number of diffusion-weighted images using a model-driven deep learning architecture

Authors: J Bartlett, C E Davey, L A Johnston, J Duan

Abstract: Fibre orientation distribution (FOD) reconstruction using deep learning has the potential to produce accurate FODs from a reduced number of diffusion-weighted images (DWIs), decreasing total imaging time. Diffusion-acquisition-invariant representations of the DWI signals are typically used as input to these methods to ensure that they can be applied flexibly to data with different b-vectors and b-values; however, this means the network cannot condition its output directly on the DWI signal. In this work, we propose a spherical deconvolution network, a model-driven deep learning FOD reconstruction architecture that ensures the intermediate and output FODs produced by the network are consistent with the input DWI signals. Furthermore, we implement a fixel classification penalty within our loss function, encouraging the network to produce FODs that can subsequently be segmented into the correct number of fixels, improving downstream fixel-based analysis. Our results show that the model-based deep learning architecture achieves competitive performance compared to a state-of-the-art FOD super-resolution network, FOD-Net. Moreover, we show that the fixel classification penalty can be tuned to offer improved performance with respect to metrics that rely on accurate segmentation of FODs. Our code is publicly available at https://github.com/Jbartlett6/SDNet.

Submitted 27 July, 2023; originally announced July 2023.

Comments: 10 pages, 7 figures, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2307.11329 [pdf, other]

$C^{k}$ extension and invariant manifolds for the compactification of nonautonomous systems with autonomous limits

Authors: Shuang Chen, **qiao Duan

Abstract: We study the compactification of nonautonomous systems with autonomous limits and the related dynamics. Although the $C^{1}$ extension of the compactification has been well established, many problems arising in bifurcation and stability analysis require compactified systems with higher-order smoothness. Motivated by this, we give a criterion for the $C^{k}$ ($k\geq 2$) extension of the compactification. After compactification, the compactified systems may gain an additional center direction. We prove the existence and uniqueness of center or center-stable manifolds for general compact invariant sets, including normally hyperbolic invariant manifolds and admissible sets.

Submitted 19 December, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: Welcome comments

arXiv:2307.06272 [pdf, other]

Exposing the Fake: Effective Diffusion-Generated Images Detection

Authors: Ruipeng Ma, **hao Duan, Fei Kong, Xiaoshuang Shi, Kaidi Xu

Abstract: Image synthesis has seen significant advancements with the advent of diffusion-based generative models like Denoising Diffusion Probabilistic Models (DDPM) and text-to-image diffusion models. Despite their efficacy, there is a dearth of research dedicated to detecting diffusion-generated images, which could pose potential security and privacy risks. This paper addresses this gap by proposing a novel detection method called Stepwise Error for Diffusion-generated Image Detection (SeDID). Comprising statistical-based $\text{SeDID}_{\text{Stat}}$ and neural network-based $\text{SeDID}_{\text{NNs}}$, SeDID exploits the unique attributes of diffusion models, namely deterministic reverse and deterministic denoising computation errors. Our evaluations demonstrate SeDID's superior performance over existing methods when applied to diffusion models. Thus, our work makes a pivotal contribution to distinguishing diffusion model-generated images, marking a significant step in the domain of artificial intelligence security.

Submitted 12 July, 2023; originally announced July 2023.

Comments: AdvML-Frontiers@ICML 2023

arXiv:2307.06097 [pdf, other]

Learning Stochastic Dynamical Systems as an Implicit Regularization with Graph Neural Networks

Authors: ** Guo, Ting Gao, Yufu Lan, Peng Zhang, Sikun Yang, **qiao Duan

Abstract: Stochastic Gumbel graph networks (S-GGNs) are proposed to learn high-dimensional time series, where the observed dimensions are often spatially correlated. To that end, the observed randomness and spatial correlations are captured by learning the drift and diffusion terms of the stochastic differential equation with a Gumbel matrix embedding, respectively. In particular, this novel framework enables us to investigate the implicit regularization effect of the noise terms in S-GGNs. We provide a theoretical guarantee for the proposed S-GGNs by deriving the difference between the two corresponding loss functions in a small neighborhood of the weights. Then, we employ Kuramoto's model to generate data for comparing the spectral densities of the Hessian matrices of the two loss functions. Experimental results on real-world data demonstrate that S-GGNs exhibit superior convergence, robustness, and generalization compared with state-of-the-art methods.

Submitted 12 July, 2023; originally announced July 2023.

Comments: 8 pages, 5 figures
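
The abstract uses Kuramoto's model to generate data but gives no details. A hedged sketch of one common way to produce such data: the Kuramoto drift ω_i + (K/N) Σ_j sin(θ_j − θ_i) plus additive Brownian noise, integrated with Euler-Maruyama. The additive-noise form, the Gaussian natural frequencies, and every parameter value are assumptions, not the authors' setup.

```python
import numpy as np


def simulate_kuramoto(n_osc=10, coupling=1.5, sigma=0.1, dt=0.01, n_steps=2000, seed=0):
    """Euler-Maruyama integration of a noisy Kuramoto model (illustrative sketch).

    d(theta_i) = (omega_i + K/N * sum_j sin(theta_j - theta_i)) dt + sigma dW_i
    Returns an array of phases with shape (n_steps + 1, n_osc).
    """
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, 1.0, n_osc)          # natural frequencies
    theta = rng.uniform(0.0, 2 * np.pi, n_osc)   # random initial phases
    traj = np.empty((n_steps + 1, n_osc))
    traj[0] = theta
    for t in range(n_steps):
        diff = theta[None, :] - theta[:, None]   # diff[i, j] = theta_j - theta_i
        drift = omega + coupling / n_osc * np.sin(diff).sum(axis=1)
        theta = theta + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=n_osc)
        traj[t + 1] = theta
    return traj
```

With strong coupling the oscillators phase-lock, so the order parameter |mean(exp(iθ))| approaches 1; weaker coupling gives less coherent, spatially correlated trajectories of the kind the model is meant to learn.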

arXiv:2307.05382 [pdf, other]

Protecting the Future: Neonatal Seizure Detection with Spatial-Temporal Modeling

Authors: Ziyue Li, Yuchen Fang, You Li, Kan Ren, Yansen Wang, Xufang Luo, Juanyong Duan, Congrui Huang, Dongsheng Li, Lili Qiu

Abstract: Timely detection of seizures in newborn infants using electroencephalogram (EEG) has been a common yet life-saving practice in the Neonatal Intensive Care Unit (NICU). However, real-time monitoring requires considerable human effort, which calls for automated solutions to neonatal seizure detection. Moreover, current automated methods focusing on adult epilepsy monitoring often fail due to (i) the dynamic seizure onset location in human brains; (ii) different montages on neonates; and (iii) the large distribution shift among different subjects. In this paper, we propose a deep learning framework, namely STATENet, to address these specific challenges with dedicated designs at the temporal, spatial and model levels. Experiments on a real-world, large-scale neonatal EEG dataset show that our framework achieves significantly better seizure detection performance.

Submitted 2 July, 2023; originally announced July 2023.

Comments: Accepted in IEEE International Conference on Systems, Man, and Cybernetics (SMC) 2023

arXiv:2307.02997 [pdf, other]

Fourier-Net+: Leveraging Band-Limited Representation for Efficient 3D Medical Image Registration

Authors: Xi Jia, Alexander Thorley, Alberto Gomez, Wenqi Lu, Dipak Kotecha, **ming Duan

Abstract: U-Net style networks are commonly used in unsupervised image registration to predict dense displacement fields, which for high-resolution volumetric image data is a resource-intensive and time-consuming task. To tackle this challenge, we first propose Fourier-Net, which replaces the costly U-Net style expansive path with a parameter-free model-driven decoder. Instead of directly predicting a full-resolution displacement field, our Fourier-Net learns a low-dimensional representation of the displacement field in the band-limited Fourier domain, which our model-driven decoder converts to a full-resolution displacement field in the spatial domain. Expanding upon Fourier-Net, we then introduce Fourier-Net+, which additionally takes the band-limited spatial representation of the images as input and further reduces the number of convolutional layers in the U-Net style network's contracting path. Finally, to enhance registration performance, we propose a cascaded version of Fourier-Net+. We evaluate our proposed methods on three datasets, on which Fourier-Net and its variants achieve results comparable to current state-of-the-art methods while exhibiting faster inference, a lower memory footprint, and fewer multiply-add operations. With such a small computational cost, our Fourier-Net+ enables the efficient training of large-scale 3D registration on low-VRAM GPUs. Our code is publicly available at \url{https://github.com/xi-jia/Fourier-Net}.

Submitted 6 July, 2023; originally announced July 2023.

Comments: Under review. arXiv admin note: text overlap with arXiv:2211.16342
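
The parameter-free decoder idea (low-frequency Fourier coefficients in, full-resolution spatial field out) can be illustrated with zero-padding followed by an inverse FFT. This is a hedged 2-D, single-channel sketch; the paper works with 3-D displacement fields, and the function name, the centred layout, and the requirement that block and full sizes share parity are assumptions of this example, not the authors' code.

```python
import numpy as np


def band_limited_to_full(low_freq, full_shape):
    """Embed a centred (fftshift-ed) block of low-frequency DFT coefficients in a
    zero array of the full size, undo the centring and inverse-FFT it.

    Assumes the block and full sizes are both even (or both odd) along each
    axis so that the DC coefficient lands on index 0 after ifftshift.
    """
    h, w = low_freq.shape
    H, W = full_shape
    full = np.zeros(full_shape, dtype=complex)
    top, left = (H - h) // 2, (W - w) // 2
    full[top:top + h, left:left + w] = low_freq   # zero-padding enforces the band limit
    spatial = np.fft.ifft2(np.fft.ifftshift(full))
    # rescale so amplitudes match those of the coarse grid
    return spatial.real * (H * W) / (h * w)
```

Usage: taking `low = np.fft.fftshift(np.fft.fft2(coarse))` of an 8 x 8 field and calling `band_limited_to_full(low, (32, 32))` returns a smooth 32 x 32 interpolation of it. The decoder has no learnable weights, which is where the memory and multiply-add savings of replacing an expansive path would come from.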

arXiv:2307.01379 [pdf, other]

Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models

Authors: **hao Duan, Hao Cheng, Shiqi Wang, Alex Zavalny, Chenan Wang, Ren**g Xu, Bhavya Kailkhura, Kaidi Xu

Abstract: Large Language Models (LLMs) show promising results in language generation and instruction following but frequently "hallucinate", making their outputs less reliable. Although Uncertainty Quantification (UQ) offers potential solutions, implementing it accurately within LLMs is challenging. Our research introduces a simple heuristic: not all tokens in auto-regressive LLM text equally represent the underlying meaning, as "linguistic redundancy" often allows a few keywords to convey the essence of long sentences. However, current methods underestimate this inequality when assessing uncertainty, causing tokens with limited semantics to be weighted equally or excessively in UQ. To correct this, we propose Shifting Attention to more Relevant (SAR) components at both the token and sentence levels for better UQ. We conduct extensive experiments involving a range of popular "off-the-shelf" LLMs, such as Vicuna, WizardLM, and LLaMA-2-chat, with model sizes of up to 33B parameters. We evaluate various free-form question-answering tasks, encompassing domains such as reading comprehension, science Q&A, and medical Q&A. Our experimental results, coupled with a comprehensive demographic analysis, demonstrate the superior performance of SAR. The code is available at https://github.com/**haoduan/SAR.

Submitted 28 May, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: To appear in ACL 2024
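
The token-level idea can be sketched as replacing the uniform average of token negative log-likelihoods with a relevance-weighted one. How relevance is computed is not specified in the abstract, so here the scores are simply supplied by the caller; this is a hedged illustration of the weighting step, not the SAR implementation.

```python
import math


def relevance_weighted_nll(token_logprobs, relevance):
    """Relevance-weighted token uncertainty: -sum_i w_i * log p(z_i), where
    w_i = relevance_i / sum_j relevance_j. With equal relevance this reduces
    to the usual length-normalised negative log-likelihood."""
    if len(token_logprobs) != len(relevance):
        raise ValueError("need exactly one relevance score per token")
    total = sum(relevance)
    if total <= 0:
        raise ValueError("relevance scores must have a positive sum")
    return -sum(r / total * lp for r, lp in zip(relevance, token_logprobs))
```

For token probabilities (0.9, 0.2, 0.95) with hypothetical relevance (0.1, 0.8, 0.1), the weighted score is about 1.30 versus about 0.59 for the uniform average: the uncertain but meaning-bearing middle token is no longer diluted by the confident filler tokens.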

arXiv:2306.13818 [pdf, other]

AR2-D2: Training a Robot Without a Robot

Authors: Jiafei Duan, Yi Ru Wang, Mohit Shridhar, Dieter Fox, Ranjay Krishna

Abstract: Diligently gathered human demonstrations serve as the unsung heroes empowering the progression of robot learning. Today, demonstrations are collected by training people to use specialized controllers, which (tele-)operate robots to manipulate a small number of objects. By contrast, we introduce AR2-D2: a system for collecting demonstrations which (1) does not require people with specialized training, (2) does not require any real robots during data collection, and therefore, (3) enables manipulation of diverse objects with a real robot. AR2-D2 is a framework in the form of an iOS app that people can use to record a video of themselves manipulating any object while simultaneously capturing essential data modalities for training a real robot. We show that data collected via our system enables the training of behavior cloning agents in manipulating real objects. Our experiments further show that training with our AR data is as effective as training with real-world robot demonstrations. Moreover, our user study indicates that users find AR2-D2 intuitive to use and require no training in contrast to four other frequently employed methods for collecting robot demonstrations.

Submitted 23 June, 2023; originally announced June 2023.

Comments: Project website: www.ar2d2.site

arXiv:2306.08902 [pdf, other]

Predictable gate-field control of spin in altermagnets with spin-layer coupling

Authors: Run-Wu Zhang, Chaoxi Cui, Runze Li, **gyi Duan, Lei Li, Zhi-Ming Yu, Yugui Yao

Abstract: Spintronics, a technology harnessing electron spin for information transmission, offers a promising avenue to surpass the limitations of conventional electronic devices. While the spin directly interacts with the magnetic field, its control through the electric field is generally more practical, and has become a focal point in the field of spintronics. Current methodologies for generating spin polarization via an electric field generally necessitate spin-orbit coupling. Here, we propose an innovative mechanism that accomplishes this task without dependence on spin-orbit coupling. Our method employs two-dimensional altermagnets with valley-mediated spin-layer coupling (SLC), in which electronic states display symmetry-protected and valley-contrasted spin and layer polarization. The SLC facilitates predictable, continuous, and reversible control of spin polarization using a gate electric field. Through symmetry analysis and ab initio calculations, we pinpoint high-quality material candidates that exhibit SLC. We ascertain that applying a gate field of $0.2$ eV/Å to monolayer Ca(CoN)$_2$ can induce significant spin splitting up to 123 meV. As a result, perfect and switchable spin/valley-currents, and substantial tunneling magnetoresistance can be achieved in these materials using only a gate field. These findings provide new opportunities for generating predictable spin polarization and designing novel spintronic devices based on coupled spin, valley and layer physics.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.07543 [pdf, other]

How Secure is Your Website? A Comprehensive Investigation on CAPTCHA Providers and Solving Services

Authors: Rui **, Lin Huang, Jikang Duan, Wei Zhao, Yong Liao, Pengyuan Zhou

Abstract: The Completely Automated Public Turing Test To Tell Computers and Humans Apart (CAPTCHA) has been implemented on many websites to distinguish harmful automated bots from legitimate users. However, the revenue generated by bots has turned circumventing CAPTCHAs into a lucrative business. Although earlier studies provided information about text-based CAPTCHAs and the associated CAPTCHA-solving services, much has changed in the past decade regarding the content, suppliers, and solvers of CAPTCHAs. We have conducted a comprehensive investigation of the latest third-party CAPTCHA providers and the attacks of CAPTCHA-solving services. We dug into the details of CAPTCHA-as-a-Service and the latest CAPTCHA-solving services and carried out adversarial experiments on CAPTCHAs and CAPTCHA solvers. The experimental results reveal a worrying fact: most of the latest CAPTCHAs are vulnerable to both human solvers and automated solvers. New CAPTCHAs based on hard AI problems and behavior analysis are needed to stop CAPTCHA solvers.

Submitted 13 June, 2023; originally announced June 2023.

arXiv:2306.04361 [pdf]

Microdisk modulator-assisted optical nonlinear activation functions for photonic neural networks

Authors: Bin Wang, Weizhen Yu, **peng Duan, Shuwen Yang, Zhenyu Zhao, Shuang Zheng, Weifeng Zhang

Abstract: On-chip implementation of optical nonlinear activation functions (NAFs) is essential for realizing large-scale photonic neural chips. To implement different neural processing and machine learning tasks with optimal performance, different NAFs have been explored using different devices. From the perspective of on-chip integration and reconfigurability of photonic neural networks (PNNs), it is highly preferable that a single compact device can fulfill multiple NAFs. Here, we propose and experimentally demonstrate a compact high-speed microdisk modulator that realizes multiple NAFs. The fabricated microdisk modulator has an add-drop configuration that incorporates a lateral PN junction for tuning. Based on the high-speed nonlinear electrical-optical (E-O) effect, multiple NAFs are realized by electrically controlling free-carrier injection. Thanks to the strong optical confinement of the disk cavity, the all-optical thermo-optic (TO) nonlinear effect can also be leveraged to realize four other NAFs, which are difficult to realize with the electrical-optical effect. Using the realized nonlinear activation function, a convolutional neural network (CNN) is studied for a handwritten digit classification task, and an accuracy as high as 98% is demonstrated, which verifies the effectiveness of using the high-speed microdisk modulator to realize the NAFs.
Thanks to its compact footprint and strong electrical-optical or all-optical effects, the microdisk modulator supports multiple NAFs and could serve as a flexible nonlinear unit for large-scale PNNs.

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.02267

Proteus: Simulating the Performance of Distributed DNN Training

Authors: Jiangfei Duan, Xiuhong Li, ** Xu, Xingcheng Zhang, Shengen Yan, Yun Liang, Dahua Lin

Abstract: DNN models are becoming ever larger to achieve unprecedented accuracy, and the accompanying increase in computation and memory requirements necessitates massive clusters and elaborate parallelization strategies to accelerate DNN training. To better optimize performance and analyze cost, it is indispensable to model the training throughput of distributed DNN training. However, complex parallelization strategies and the resulting complex runtime behaviors make it challenging to construct an accurate performance model. In this paper, we present Proteus, the first standalone simulator to model the performance of complex parallelization strategies through simulation execution. Proteus first models complex parallelization strategies with a unified representation named Strategy Tree. Then, it compiles the strategy tree into a distributed execution graph and simulates complex runtime behaviors, such as computation-communication (comp-comm) overlap and bandwidth sharing, with a Hierarchical Topo-Aware Executor (HTAE). We finally evaluate Proteus across a wide variety of DNNs on three hardware configurations. Experimental results show that Proteus achieves a 3.0% average prediction error and preserves the ranking of training throughput across various parallelization strategies. Compared to state-of-the-art approaches, Proteus reduces prediction error by up to 133.8%.

Submitted 4 June, 2023; originally announced June 2023.

arXiv:2306.02064

Flew Over Learning Trap: Learn Unlearnable Samples by Progressive Staged Training

Authors: Pucheng Dang, Xing Hu, Kaidi Xu, **hao Duan, Di Huang, Husheng Han, Rui Zhang, Zidong Du, Qi Guo, Yunji Chen

Abstract: Unlearning techniques have been proposed to prevent third parties from exploiting unauthorized data; they generate unlearnable samples by adding imperceptible perturbations to data before public release. These unlearnable samples effectively mislead model training into learning perturbation features while ignoring image semantic features. Through an in-depth analysis, we observe that models can learn both the image features and the perturbation features of unlearnable samples at an early stage, but they rapidly enter an overfitting stage because the shallow layers tend to overfit to perturbation features, which quickly drives the models into overfitting. Based on these observations, we propose Progressive Staged Training to effectively prevent models from overfitting to perturbation features. We evaluated our method on multiple model architectures over diverse datasets, e.g., CIFAR-10, CIFAR-100, and ImageNet-mini. Our method circumvents the unlearnability of all state-of-the-art methods in the literature and provides a reliable baseline for further evaluation of unlearnable techniques.

Submitted 3 June, 2023; originally announced June 2023.

arXiv:2306.01902

Unlearnable Examples for Diffusion Models: Protect Data from Unauthorized Exploitation

Authors: Zhengyue Zhao, **hao Duan, Xing Hu, Kaidi Xu, Chenan Wang, Rui Zhang, Zidong Du, Qi Guo, Yunji Chen

Abstract: Diffusion models have demonstrated remarkable performance in image generation tasks, paving the way for powerful AIGC applications. However, these widely used generative models can also raise security and privacy concerns, such as copyright infringement and sensitive data leakage. To tackle these issues, we propose a method, Unlearnable Diffusion Perturbation, to safeguard images from unauthorized exploitation. Our approach involves designing an algorithm to generate sample-wise perturbation noise for each image to be protected. This imperceptible protective noise makes the data almost unlearnable for diffusion models, i.e., diffusion models trained or fine-tuned on the protected data cannot generate high-quality and diverse images related to the protected training data. Theoretically, we frame this as a max-min optimization problem and introduce EUDP, a noise scheduler-based method to enhance the effectiveness of the protective noise. We evaluate our methods on both Denoising Diffusion Probabilistic Models and Latent Diffusion Models, demonstrating that training diffusion models on the protected data leads to a significant reduction in the quality of the generated images. In particular, the experimental results on Stable Diffusion demonstrate that our method effectively safeguards images from being used to train diffusion models in various tasks, such as training on specific objects and styles. This achievement holds significant importance in real-world scenarios, as it contributes to the protection of privacy and copyright against AI-generated content.

Submitted 24 June, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

Showing 51–100 of 585 results for author: Duan, J