Search | arXiv e-print repository

Primitive-based 3D Human-Object Interaction Modelling and Programming

Authors: Siqi Liu, Yong-Lu Li, Zhou Fang, Xinpeng Liu, Yang You, Cewu Lu

Abstract: Embedding Human and Articulated Object Interaction (HAOI) in 3D is an important direction for a deeper human activity understanding. Different from previous works that use parametric and CAD models to represent humans and objects, in this work, we propose a novel 3D geometric primitive-based language to encode both humans and objects. Given our new paradigm, humans and objects are all compositions of primitives instead of heterogeneous entities. Thus, mutual information learning may be achieved between the limited 3D data of humans and different object categories. Moreover, considering the simplicity of the expression and the richness of the information it contains, we choose the superquadric as the primitive representation. To explore an effective embedding of HAOI for the machine, we build a new benchmark on 3D HAOI consisting of primitives together with their images and propose a task requiring machines to recover 3D HAOI using primitives from images. Moreover, we propose a baseline of single-view 3D reconstruction on HAOI. We believe this primitive-based 3D HAOI representation would pave the way for 3D HAOI studies. Our code and data are available at https://mvig-rhos.com/p3haoi.

Submitted 17 December, 2023; originally announced December 2023.

Comments: AAAI2024
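
Illustrative note on the entry above: the abstract names the superquadric as its primitive but gives no parameterization. The sketch below uses the standard textbook inside-outside function for a superquadric, which is an assumption here and not necessarily the authors' formulation. The class `Superquadric`, the helper `occupied`, and the example shapes are hypothetical names chosen only for illustration; rotation is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Superquadric:
    """Axis-aligned superquadric (hypothetical helper, no rotation)."""
    center: tuple  # (cx, cy, cz) position of the primitive
    scale: tuple   # (a1, a2, a3) semi-axis lengths
    eps: tuple     # (e1, e2) shape exponents; (1, 1) gives an ellipsoid

    def inside_outside(self, point):
        """Standard inside-outside value: < 1 inside, 1 on the surface, > 1 outside."""
        x, y, z = (abs(p - c) / a
                   for p, c, a in zip(point, self.center, self.scale))
        e1, e2 = self.eps
        return (x ** (2.0 / e2) + y ** (2.0 / e2)) ** (e2 / e1) + z ** (2.0 / e1)


def occupied(primitives, point):
    """A scene is a composition of primitives: a point is occupied if any contains it."""
    return any(q.inside_outside(point) <= 1.0 for q in primitives)


# Two primitives with the same scale but different exponents.
ellipsoid = Superquadric(center=(0, 0, 0), scale=(1, 1, 1), eps=(1.0, 1.0))
boxlike = Superquadric(center=(3, 0, 0), scale=(1, 1, 1), eps=(0.1, 0.1))
scene = [ellipsoid, boxlike]

# A corner point lies outside the ellipsoid but inside the box-like shape.
print(ellipsoid.inside_outside((0.9, 0.9, 0.9)) > 1.0)
print(occupied(scene, (3.9, 0.9, 0.9)))
```

Small exponents push the shape toward a box and exponents of 1 give an ellipsoid, which is one reason a single primitive family can cover both body parts and object parts with only a handful of parameters each.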

arXiv:2312.09911 [pdf, other]

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Authors: Xueyao Zhang, Liumeng Xue, Yicheng Gu, Yuancheng Wang, Haorui He, Chaoren Wang, Xi Chen, Zihao Fang, Haopeng Chen, Junan Zhang, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu

Abstract: Amphion is an open-source toolkit for Audio, Music, and Speech Generation, targeting to ease the way for junior researchers and engineers into these fields. It presents a unified framework that is inclusive of diverse generation tasks and models, with the added bonus of being easily extendable for new incorporation. The toolkit is designed with beginner-friendly workflows and pre-trained models, allowing both beginners and seasoned researchers to kick-start their projects with relative ease. Additionally, it provides interactive visualizations and demonstrations of classic models for educational purposes. The initial release of Amphion v0.1 supports a range of tasks including Text to Speech (TTS), Text to Audio (TTA), and Singing Voice Conversion (SVC), supplemented by essential components like data preprocessing, state-of-the-art vocoders, and evaluation metrics. This paper presents a high-level overview of Amphion.

Submitted 22 February, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: Amphion Website: https://github.com/open-mmlab/Amphion

arXiv:2312.09085 [pdf, other]

The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation

Authors: Rongwu Xu, Brian S. Lin, Shujian Yang, Tianqi Zhang, Weiyan Shi, Tianwei Zhang, Zhixuan Fang, Wei Xu, Han Qiu

Abstract: Large language models (LLMs) encapsulate vast amounts of knowledge but still remain vulnerable to external misinformation. Existing research mainly studied this susceptibility behavior in a single-turn setting. However, belief can change during a multi-turn conversation, especially a persuasive one. Therefore, in this study, we delve into LLMs' susceptibility to persuasive conversations, particularly on factual questions that they can answer correctly. We first curate the Farm (i.e., Fact to Misinform) dataset, which contains factual questions paired with systematically generated persuasive misinformation. Then, we develop a testing framework to track LLMs' belief changes in a persuasive dialogue. Through extensive experiments, we find that LLMs' correct beliefs on factual knowledge can be easily manipulated by various persuasive strategies.

Submitted 31 May, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted to ACL'24 (Main). Camera-ready version

arXiv:2312.08729 [pdf, other]

doi 10.1088/0256-307X/40/12/127101

VASP2KP: $k\cdot p$ models and Landé $g$-factors from ab initio calculations

Authors: Sheng Zhang, Haohao Sheng, Zhi-Da Song, Chenhao Liang, Yi Jiang, Song Sun, Quansheng Wu, Hongming Weng, Zhong Fang, Xi Dai, Zhijun Wang

Abstract: The $k\cdot p$ method is significant in condensed matter physics for the compact and analytical Hamiltonian. In the presence of magnetic field, it is described by the effective Zeeman's coupling Hamiltonian with Landé $g$-factors. Here, we develop an open-source package VASP2KP (including two parts: vasp2mat and mat2kp) to compute $k\cdot p$ parameters and Landé $g$-factors directly from the wavefunctions provided by the density functional theory (DFT) as implemented in Vienna ab initio Simulation Package (VASP). First, we develop a VASP patch vasp2mat to compute matrix representations of the generalized momentum operator $\mathbf{\hat{\pi}}=\mathbf{\hat{p}}+\frac{1}{2mc^2}\left(\mathbf{\hat{s}}\times\nabla V(\mathbf{r})\right)$, spin operator $\mathbf{\hat{s}}$, time reversal operator $\hat{T}$ and crystalline symmetry operators $\hat{R}$ on the DFT wavefunctions. Second, we develop a python code mat2kp to obtain the unitary transformation $U$ that rotates the degenerate DFT basis towards the standard basis, and then automatically compute the $k\cdot p$ parameters and $g$-factors. The theory and the methodology behind VASP2KP are described in detail. The matrix elements of the operators are derived comprehensively and computed correctly within the projector augmented wave method. We apply this package to some materials, e.g., Bi$_2$Se$_3$, Na$_3$Bi, Te, InAs and 1H-TMD monolayers. The obtained effective model's dispersions are in good agreement with the DFT data around the specific wave vector, and the $g$-factors are consistent with experimental data. The VASP2KP package is available at https://github.com/zjwang11/VASP2KP.

Submitted 14 December, 2023; originally announced December 2023.

Journal ref: Chin. Phys. Lett. 40, 127101 (2023)

arXiv:2312.08682 [pdf, other]

High-coherence parallelization in integrated photonics

Authors: Xuguang Zhang, Zixuan Zhou, Yijun Guo, Minxue Zhuang, Warren **, Bitao Shen, Yujun Chen, Jiahui Huang, Zihan Tao, Ming **, Ruixuan Chen, Zhangfeng Ge, Zhou Fang, Ning Zhang, Yadong Liu, Pengfei Cai, Weiwei Hu, Haowen Shu, Dong Pan, John E. Bowers, Xingjun Wang, Lin Chang

Abstract: Coherent optics has profoundly impacted diverse applications ranging from communications, LiDAR to quantum computations. However, building coherent systems in integrated photonics previously came at great expense in hardware integration and energy efficiency: the lack of a power-efficient way to generate highly coherent light necessitates bulky lasers and amplifiers, while frequency and phase recovery schemes require huge digital signal processing resources. In this work, we demonstrate a high-coherence parallelization strategy that facilitates advanced integrated coherent systems at a minimum price. Using a self-injection locked microcomb to injection lock a distributed feedback laser array, we boost the microcomb power by a record high gain of up to 60 dB on chip with no degradation in coherence. This strategy enables tens of highly coherent channels with an intrinsic linewidth down to the 10 Hz level and power of more than 20 dBm. The overall electrical to optical wall-plug efficiency reaches 19%, comparable with that of the state-of-the-art semiconductor lasers. Driven by this parallel source, we demonstrate a silicon photonic communication link with an unprecedented data rate beyond 60 Tbit/s. Importantly, the high coherence we achieve reduces the coherent-related DSP consumption by 99.999% compared with the traditional III-V laser pump scheme. This work paves a way to realizing scalable, high-performance coherent integrated photonic systems, potentially benefiting numerous applications.

Submitted 14 December, 2023; originally announced December 2023.
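
Illustrative note on the entry above: the abstract quotes a gain in dB and a power in dBm. As a reading aid, the sketch below applies the standard decibel definitions to those two figures. The function names are hypothetical and nothing here comes from the paper beyond the quoted numbers.

```python
import math


def db_to_power_ratio(db):
    """Convert a power gain in dB to a linear power ratio."""
    return 10 ** (db / 10)


def dbm_to_milliwatts(dbm):
    """Convert an absolute power in dBm (dB relative to 1 mW) to milliwatts."""
    return 10 ** (dbm / 10)


def milliwatts_to_dbm(mw):
    """Convert an absolute power in milliwatts to dBm."""
    return 10 * math.log10(mw)


# Figures quoted in the abstract: up to 60 dB gain, more than 20 dBm per channel.
print(db_to_power_ratio(60))   # a 60 dB gain is a power ratio of one million
print(dbm_to_milliwatts(20))   # 20 dBm corresponds to 100 mW
```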

arXiv:2312.03703 [pdf, other]

Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

Authors: Xinshun Wang, Zhongbin Fang, Xia Li, Xiangtai Li, Mengyuan Liu

Abstract: In-context learning provides a new perspective for multi-task modeling for vision and NLP. Under this setting, the model can perceive tasks from prompts and accomplish them without any extra task-specific head predictions or model fine-tuning. However, Skeleton sequence modeling via in-context learning remains unexplored. Directly applying existing in-context models from other areas onto skeleton sequences fails due to the inter-frame and cross-task pose similarity that makes it outstandingly hard to perceive the task correctly from a subtle context. To address this challenge, we propose Skeleton-in-Context (SiC), an effective framework for in-context skeleton sequence modeling. Our SiC is able to handle multiple skeleton-based tasks simultaneously after a single training process and accomplish each task from context according to the given prompt. It can further generalize to new, unseen tasks according to customized prompts. To facilitate context perception, we additionally propose a task-unified prompt, which adaptively learns tasks of different natures, such as partial joint-level generation, sequence-level prediction, or 2D-to-3D motion prediction. We conduct extensive experiments to evaluate the effectiveness of our SiC on multiple tasks, including motion prediction, pose estimation, joint completion, and future pose estimation. We also evaluate its generalization capability on unseen tasks such as motion-in-between. These experiments show that our model achieves state-of-the-art multi-task performance and even outperforms single-task methods on certain tasks.

Submitted 2 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Project page: https://github.com/fanglaosi/Skeleton-in-Context

arXiv:2312.01346 [pdf, other]

A holographic study on QCD phase transition and phase diagram with two flavors

Authors: Xin-Yi Liu, Xiao-Chang Peng, Yue-Liang Wu, Zhen Fang

Abstract: We investigate the chemical potential effects of the equation of state and the chiral transition in an Einstein-Maxwell-dilaton-scalar system, which is obtained from an improved soft-wall AdS/QCD model coupled with an Einstein-Maxwell-dilaton system. The equations of state obtained from the model are in quantitative agreement with the lattice results at both zero and nonzero chemical potentials. The sensible chiral transition behaviors can be realized in the model. The QCD phase diagram with a CEP has also been obtained from the model.

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2311.17267 [pdf, other]

E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer

Authors: Jacob Zhiyuan Fang, Skyler Zheng, Vasu Sharma, Robinson Piramuthu

Abstract: To build scalable models for challenging real-world tasks, it is important to learn from diverse, multi-modal data in various forms (e.g., videos, text, and images). Among the existing works, a plethora of them have focused on leveraging large but cumbersome cross-modal architectures. Regardless of their effectiveness, larger architectures unavoidably prevent the models from being extended to real-world applications, so building a lightweight VL architecture and an efficient learning schema is of great practical value. In this paper, we propose an Efficient Video-Language Model (dubbed as E-ViLM) and a masked video modeling (MVM) schema, assisted with a semantic vector-quantized tokenizer. In particular, our E-ViLM learns to reconstruct the semantic labels of masked video regions, produced by the pre-trained vector-quantized tokenizer, which discretizes the continuous visual signals into labels. We show that with our simple MVM task and regular VL pre-training modelings, our E-ViLM, despite its compactness, is able to learn expressive representations from Video-Language corpus and generalize well to extensive Video-Language tasks including video question answering, text-to-video retrieval, etc. In particular, our E-ViLM obtains obvious efficiency improvements by reaching competing performances with faster inference speed, i.e., our model reaches $39.3\%$ Top-$1$ accuracy on the MSRVTT benchmark, retaining $91.4\%$ of the accuracy of state-of-the-art larger VL architecture with only $15\%$ parameters and $94.8\%$ fewer GFLOPs. We also provide extensive ablative studies that validate the effectiveness of our proposed learning schema for E-ViLM.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.16754 [pdf, other]

Towards Full-scene Domain Generalization in Multi-agent Collaborative Bird's Eye View Segmentation for Connected and Autonomous Driving

Authors: Senkang Hu, Zhengru Fang, Xianhao Chen, Yuguang Fang, Sam Kwong

Abstract: Collaborative perception has recently gained significant attention in autonomous driving, improving perception quality by enabling the exchange of additional information among vehicles. However, deploying collaborative perception systems can lead to domain shifts due to diverse environmental conditions and data heterogeneity among connected and autonomous vehicles (CAVs). To address these challenges, we propose a unified domain generalization framework applicable in both training and inference stages of collaborative perception. In the training phase, we introduce an Amplitude Augmentation (AmpAug) method to augment low-frequency image variations, broadening the model's ability to learn across various domains. We also employ a meta-consistency training scheme to simulate domain shifts, optimizing the model with a carefully designed consistency loss to encourage domain-invariant representations. In the inference phase, we introduce an intra-system domain alignment mechanism to reduce or potentially eliminate the domain discrepancy among CAVs prior to inference. Comprehensive experiments substantiate the effectiveness of our method in comparison with the existing state-of-the-art works. Code will be released at https://github.com/DG-CAVs/DG-CoPerception.git.

Submitted 1 January, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.15315 [pdf]

Integrated electro-optically tunable narrow-linewidth III-V laser

Authors: Yiran Zhu, Shupeng Yu, Zhiwei Fang, Difeng Yin, Jian Liu, Zhe Wang, Yuan Zhou, Yu Ma, Haisu Zhang, Min Wang, Ya Cheng

Abstract: We demonstrate an integrated electro-optically tunable narrow-linewidth III-V laser with an output power of 738.8 μW and an intrinsic linewidth of 45.55 kHz at the C band. The laser cavity is constructed using a fiber Bragg grating (FBG) and a tunable Sagnac loop reflector (TSLR) fabricated on thin film lithium niobate (TFLN). The combination of the FBG and the electro-optically tunable TSLR offers the advantages of single spatial mode, single-frequency, narrow-linewidth, and wide wavelength tunability for the electrically pumped hybrid integrated laser, which features a frequency tuning range of 20 GHz and a tuning efficiency of 0.8 GHz/V.

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.12299 [pdf]

Thin Film Lithium Niobate Electro-optic Isolator Fabricated by photolithography assisted chemo-mechanical etching (PLACE)

Authors: Lang Gao, Youting Liang, Lvbin Song, Difeng Yin, Jia Qi, **ming Chen, Zhaoxiang Liu, Jian** Yu, Jian Liu, Haisu Zhang, Zhiwei Fang, Hongxin Qi, Ya Cheng

Abstract: We report a thin-film lithium niobate electro-optic isolator fabricated by photolithography-assisted chemo-mechanical etching in this work. The device demonstrates 39.50 dB isolation when subjected to a 24 GHz microwave of 25.5 dBm on its electrodes. The measured isolation remains consistently above 30 dB within the 1510 nm to 1600 nm wavelength range. The overall device insertion loss, specifically the fiber-to-fiber insert loss, has been measured to be 2.6 dB, which is attributed to our highly efficient spot size converter and the low propagation loss observed in the fabricated waveguides.

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.10554 [pdf, other]

doi 10.7498/aps.70.20210018

Simulation method of urban evacuation based on mesoscopic cellular automata

Authors: Wei Lv, **ghui Wang, Zhiming Fang, Dun Mao

Abstract: This study integrates pedestrian flow characteristics to formulate a mesoscopic cellular automata model tailored for simulating evacuations in large-scale scenarios. Departing from the conventional planar grid cell division, the model employs road cell segmentation, thereby physically enlarging the dimensions of individual cells. This augmentation accommodates an increased occupancy of individuals per cell, representing pedestrian flow parameters within each cell through state variables. The source loading cell facilitates the simulation of pedestrian behavior transitioning from buildings to roads during an actual evacuation event, while the unloading cell situated at the exit removes evacuees from the system. The continuity equation for state transitions comprehensively encapsulates the dynamics of pedestrians throughout the evacuation process. Potential challenges in actual evacuation processes are identified through the simulation, offering valuable insights for improvement. This research aims to contribute to a more effective and informed approach to evacuation planning and management.

Submitted 17 November, 2023; originally announced November 2023.

Comments: 13 pages, 14 figures

Journal ref: Acta Physica Sinica, 70(10), 76-84 (2021) [in Chinese]
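
Illustrative note on the entry above: the abstract describes road cells that each hold many pedestrians, a source loading cell, an unloading cell at the exit, and a continuity equation for state transitions. The sketch below is a minimal one-dimensional toy version of that idea, not the authors' model. The flow rule (move as many people as the sending cell has, the per-step flow limit allows, and the receiving cell can hold) and all parameter names and values are assumptions made for illustration.

```python
def step(cells, waiting, capacity, max_flow, load_rate):
    """Advance a chain of road cells by one time step.

    cells[i] is the number of people in road cell i; the last cell is the exit.
    Continuity per cell: new = old + inflow - outflow, so nobody is created or lost.
    """
    n = len(cells)
    flows = [0] * n  # flows[i] = people leaving cell i during this step
    for i in range(n):
        sending = min(cells[i], max_flow)
        if i + 1 < n:
            receiving = capacity - cells[i + 1]  # free space downstream
            flows[i] = min(sending, receiving)
        else:
            flows[i] = sending  # unloading: evacuees leave the system
    # Loading: people move from buildings onto the first road cell.
    loaded = min(waiting, load_rate, capacity - cells[0])
    new_cells = [
        cells[i] - flows[i] + (flows[i - 1] if i > 0 else loaded)
        for i in range(n)
    ]
    return new_cells, waiting - loaded, flows[-1]


def simulate(waiting, n_cells, capacity, max_flow, load_rate, max_steps=10_000):
    """Run until everyone has left; return (steps taken, total evacuated)."""
    cells = [0] * n_cells
    evacuated = 0
    for t in range(1, max_steps + 1):
        cells, waiting, out = step(cells, waiting, capacity, max_flow, load_rate)
        evacuated += out
        if waiting == 0 and sum(cells) == 0:
            return t, evacuated
    return max_steps, evacuated


steps, evacuated = simulate(waiting=100, n_cells=5, capacity=20,
                            max_flow=5, load_rate=8)
print(steps, evacuated)
```

Because every transfer is subtracted from one cell and added to the next, the total of waiting, in-transit, and evacuated people stays constant, which is the discrete counterpart of a continuity equation.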

arXiv:2311.06056 [pdf, other]

doi 10.1109/TCSVT.2024.3370731

Learning Contrastive Self-Distillation for Ultra-Fine-Grained Visual Categorization Targeting Limited Samples

Authors: Ziye Fang, Xin Jiang, Hao Tang, Zechao Li

Abstract: In the field of intelligent multimedia analysis, ultra-fine-grained visual categorization (Ultra-FGVC) plays a vital role in distinguishing intricate subcategories within broader categories. However, this task is inherently challenging due to the complex granularity of category subdivisions and the limited availability of data for each category. To address these challenges, this work proposes CSDNet, a pioneering framework that effectively explores contrastive learning and self-distillation to learn discriminative representations specifically designed for Ultra-FGVC tasks. CSDNet comprises three main modules: Subcategory-Specific Discrepancy Parsing (SSDP), Dynamic Discrepancy Learning (DDL), and Subcategory-Specific Discrepancy Transfer (SSDT), which collectively enhance the generalization of deep models across instance, feature, and logit prediction levels. To increase the diversity of training samples, the SSDP module introduces adaptive augmented samples to spotlight subcategory-specific discrepancies. Simultaneously, the proposed DDL module stores historical intermediate features by a dynamic memory queue, which optimizes the feature learning space through iterative contrastive learning. Furthermore, the SSDT module effectively distills subcategory-specific discrepancies knowledge from the inherent structure of limited training data using a self-distillation paradigm at the logit prediction level. Experimental results demonstrate that CSDNet outperforms current state-of-the-art Ultra-FGVC methods, emphasizing its powerful efficacy and adaptability in addressing Ultra-FGVC tasks.

Submitted 25 February, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

Comments: Accepted for Publication in TCSVT

arXiv:2311.04827 [pdf, other]

doi 10.1016/j.trc.2023.104400

Exploring crowd persistent dynamism from pedestrian crossing perspective: An empirical study

Authors: **ghui Wang, Wei Lv, Huihua Jiang, Zhiming Fang, Jian Ma

Abstract: Crowd studies have gained increasing relevance due to the recurring incidents of crowd crush accidents. In addressing the issue of the crowd's persistent dynamism, this paper explored the macroscopic and microscopic features of pedestrians crossing in static and dynamic contexts, employing a series of systematic experiments. Firstly, empirical evidence has confirmed the existence of the crowd's persistent dynamism. Subsequently, the research delves into two aspects, qualitative and quantitative, to address the following questions: (1) Crossing pedestrians tend to avoid high-density areas when crossing static crowds and particularly evade pedestrians in front to avoid deceleration, thus inducing the formation of cross-channels, a self-organization phenomenon. (2) In dynamic crowds, when pedestrians are spatially constrained, two patterns emerge: decelerate or detour. Research results indicate the differences in pedestrian crossing behaviors between static and dynamic crowds, such as the formation of crossing channels, backward detours, and spiral turning. However, the strategy of pedestrian crossing remains consistent: utilizing detours to overcome spatial constraints. Finally, the empirical results of this study address the final question: pedestrians detouring causes crowds' persistent collective dynamism. These findings contribute to an enhanced understanding of pedestrian dynamics in extreme conditions and provide empirical support for research on individual movement patterns and crowd behavior prediction.

Submitted 26 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

Comments: 31 pages, 17 figures

Journal ref: Transportation Research Part C: Emerging Technologies, Volume 157, 2023, 104400

arXiv:2311.03236 [pdf, other]

Out-of-distribution Detection Learning with Unreliable Out-of-distribution Sources

Authors: Haotian Zheng, Qizhou Wang, Zhen Fang, Xiaobo Xia, Feng Liu, Tongliang Liu, Bo Han

Abstract: Out-of-distribution (OOD) detection discerns OOD data where the predictor cannot make valid predictions as in-distribution (ID) data, thereby increasing the reliability of open-world classification. However, it is typically hard to collect real out-of-distribution (OOD) data for training a predictor capable of discerning ID and OOD patterns. This obstacle gives rise to data generation-based learning methods, synthesizing OOD data via data generators for predictor training without requiring any real OOD data. Related methods typically pre-train a generator on ID data and adopt various selection procedures to find those data likely to be the OOD cases. However, generated data may still coincide with ID semantics, i.e., mistaken OOD generation remains, confusing the predictor between ID and OOD data. To this end, we suggest that generated data (with mistaken OOD generation) can be used to devise an auxiliary OOD detection task to facilitate real OOD detection. Specifically, we can ensure that learning from such an auxiliary task is beneficial if the ID and the OOD parts have disjoint supports, with the help of a well-designed training procedure for the predictor. Accordingly, we propose a powerful data generation-based learning method named Auxiliary Task-based OOD Learning (ATOL) that can relieve the mistaken OOD generation. We conduct extensive experiments under various OOD detection setups, demonstrating the effectiveness of our method against its advanced counterparts.

Submitted 5 December, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: Accepted by NeurIPS 2023

arXiv:2311.01796 [pdf, other]

Learning to Augment Distributions for Out-of-Distribution Detection

Authors: Qizhou Wang, Zhen Fang, Yonggang Zhang, Feng Liu, Yixuan Li, Bo Han

Abstract: Open-world classification systems should discern out-of-distribution (OOD) data whose labels deviate from those of in-distribution (ID) cases, motivating recent studies in OOD detection. Advanced works, despite their promising progress, may still fail in the open world, owing to the lack of knowledge about unseen OOD data in advance. Although one can access auxiliary OOD data (distinct from unseen ones) for model training, it remains to analyze how such auxiliary data will work in the open world. To this end, we delve into such a problem from a learning theory perspective, finding that the distribution discrepancy between the auxiliary and the unseen real OOD data is the key to affecting the open-world detection performance. Accordingly, we propose Distributional-Augmented OOD Learning (DAL), alleviating the OOD distribution discrepancy by crafting an OOD distribution set that contains all distributions in a Wasserstein ball centered on the auxiliary OOD distribution. We justify that the predictor trained over the worst OOD data in the ball can shrink the OOD distribution discrepancy, thus improving the open-world detection performance given only the auxiliary OOD data. We conduct extensive evaluations across representative OOD detection setups, demonstrating the superiority of our DAL over its advanced counterparts.

Submitted 25 December, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

arXiv:2311.01483 [pdf, other]

FedSN: A Novel Federated Learning Framework over LEO Satellite Networks

Authors: Zheng Lin, Zhe Chen, Zihan Fang, Xianhao Chen, Xiong Wang, Yue Gao

Abstract: Recently, a large number of Low Earth Orbit (LEO) satellites have been launched and deployed successfully in space by commercial companies, such as SpaceX. Due to multimodal sensors equipped by the LEO satellites, they serve not only for communication but also for various machine learning applications, such as space modulation recognition, remote sensing image classification, etc. However, the ground station (GS) may be incapable of downloading such a large volume of raw sensing data for centralized model training due to the limited contact time with LEO satellites (e.g., 5 minutes). Therefore, federated learning (FL) has emerged as the promising solution to address this problem via on-device training. Unfortunately, to enable FL on LEO satellites, we still face three critical challenges that are i) heterogeneous computing and memory capabilities, ii) limited uplink rate, and iii) model staleness. To this end, we propose FedSN as a general FL framework to tackle the above challenges, and fully explore data diversity on LEO satellites. Specifically, we first present a novel sub-structure scheme to enable heterogeneous local model training considering different computing, memory, and communication constraints on LEO satellites. Additionally, we propose a pseudo-synchronous model aggregation strategy to dynamically schedule model aggregation for compensating model staleness. To further demonstrate the effectiveness of FedSN, we evaluate it using space modulation recognition and remote sensing image classification tasks by leveraging the data from real-world satellite networks. Extensive experimental results demonstrate that the FedSN framework achieves higher accuracy and lower computing and communication overhead than the state-of-the-art benchmarks, and confirm the effectiveness of each component in FedSN.

Submitted 2 April, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

Comments: 14 pages, 17 figures

arXiv:2311.00836 [pdf, ps, other]

Effective filtering approach for joint parameter-state estimation in SDEs via Rao-Blackwellization and modularization

Authors: Zhou Fang, Ankit Gupta, Mustafa Khammash

Abstract: Stochastic filtering is a vibrant area of research in both control theory and statistics, with broad applications in many scientific fields. Despite its extensive historical development, there is still no effective method for joint parameter-state estimation in SDEs. The state-of-the-art particle filtering methods suffer from either sample degeneracy or information loss, with both issues stemming from the dynamics of the particles generated to represent system parameters. This paper provides a novel and effective approach for joint parameter-state estimation in SDEs via Rao-Blackwellization and modularization. Our method operates in two layers: the first layer estimates the system states using a bootstrap particle filter, and the second layer marginalizes out system parameters explicitly. This strategy circumvents the need to generate particles representing system parameters, thereby mitigating their associated problems of sample degeneracy and information loss. Moreover, our method employs a modularization approach when integrating out the parameters, which significantly reduces the computational complexity. All these designs ensure the superior performance of our method. Finally, a numerical example is presented to illustrate that our method outperforms existing approaches by a large margin.

Submitted 1 November, 2023; originally announced November 2023.

Comments: 8 pages, 2 figures

MSC Class: 62M20; 62F15; 65C05; 92-08; 93E11

arXiv:2310.18764 [pdf]

Atomistic Processes of high-temperature plastic deformation of nanoscale body-centered cubic tungsten

Authors: Sixue Zheng, Zhengwu Fang, Scott X. Mao

Abstract: Much scientific and practical interest is currently focused on the atomic-scale mechanical behaviors of metallic nanocrystals with different crystal structures at room temperature, while the high-temperature plastic deformation in tungsten nanocrystals remains not well understood, due to the technical difficulty in elevating the experimental temperature during in situ mechanical tests in an extremely small chamber of transmission electron microscopes. In this study, an in situ high-temperature nanomechanical testing method is developed based on electrical-current-induced Joule heating in the metallic nanocrystal. By this method, it is found that three distinct deformation modes, namely deformation twinning, body-centered-cubic-face-centered-cubic-body-centered-cubic phase transformation and perfect dislocation slip, are sequentially activated in the tungsten nanocrystal during high-temperature tensile test. Such ductile behavior is related not only to the experimental temperature but also to the loading orientation. These findings shed light on the atomic-scale plastic deformation in body-centered cubic metals at elevated temperature.

Submitted 8 November, 2023; v1 submitted 28 October, 2023; originally announced October 2023.

Comments: Modify the figure captions. Reduce the file size

arXiv:2310.16655 [pdf, other]

Towards Control-Centric Representations in Reinforcement Learning from Images

Authors: Chen Liu, Hongyu Zang, Xin Li, Yong Heng, Yifei Wang, Zhen Fang, Yisen Wang, Mingzhong Wang

Abstract: Image-based Reinforcement Learning is a practical yet challenging task. A major hurdle lies in extracting control-centric representations while disregarding irrelevant information. While approaches that follow the bisimulation principle exhibit the potential in learning state representations to address this issue, they still grapple with the limited expressive capacity of latent dynamics and the inadaptability to sparse reward environments. To address these limitations, we introduce ReBis, which aims to capture control-centric information by integrating reward-free control information alongside reward-specific knowledge. ReBis utilizes a transformer architecture to implicitly model the dynamics and incorporates block-wise masking to eliminate spatiotemporal redundancy. Moreover, ReBis combines bisimulation-based loss with asymmetric reconstruction loss to prevent feature collapse in environments with sparse rewards. Empirical studies on two large benchmarks, including Atari games and DeepMind Control Suite, demonstrate that ReBis has superior performance compared to existing methods, proving its effectiveness.

Submitted 27 October, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.16123 [pdf, other]

Anchor Space Optimal Transport: Accelerating Batch Processing of Multiple OT Problems

Authors: Jianming Huang, Xun Su, Zhongxi Fang, Hiroyuki Kasai

Abstract: The optimal transport (OT) theory provides an effective way to compare probability distributions on a defined metric space, but it suffers from cubic computational complexity. Although Sinkhorn's algorithm greatly reduces the computational complexity of OT solutions, the solutions of multiple OT problems are still time-consuming and memory-consuming in practice. However, many works on the computational acceleration of OT are usually based on the premise of a single OT problem, ignoring the potential common characteristics of the distributions in a mini-batch. Therefore, we propose a translated OT problem designated as the anchor space optimal transport (ASOT) problem, which is specially designed for batch processing of multiple OT problem solutions. For the proposed ASOT problem, the distributions will be mapped into a shared anchor point space, which learns the potential common characteristics and thus helps accelerate OT batch processing. Based on the proposed ASOT, the Wasserstein distance error to the original OT problem is proven to be bounded by ground cost errors. Building upon this, we propose three methods to learn an anchor space minimizing the distance error, each of which has its application background. Numerical experiments on real-world datasets show that our proposed methods can greatly reduce computational time while maintaining reasonable approximation performance.

Submitted 24 October, 2023; originally announced October 2023.

Comments: 26 pages, 4 figures, 6 tables

arXiv:2310.14541 [pdf, other]

Continual Named Entity Recognition without Catastrophic Forgetting

Authors: Duzhen Zhang, Wei Cong, Jiahua Dong, Yahan Yu, Xiuyi Chen, Yonggang Zhang, Zhen Fang

Abstract: Continual Named Entity Recognition (CNER) is a burgeoning area, which involves updating an existing model by incorporating new entity types sequentially. Nevertheless, continual learning approaches are often severely afflicted by catastrophic forgetting. This issue is intensified in CNER due to the consolidation of old entity types from previous steps into the non-entity type at each step, leading to what is known as the semantic shift problem of the non-entity type. In this paper, we introduce a pooled feature distillation loss that skillfully navigates the trade-off between retaining knowledge of old entity types and acquiring new ones, thereby more effectively mitigating the problem of catastrophic forgetting. Additionally, we develop a confidence-based pseudo-labeling for the non-entity type, i.e., predicting entity types using the old model to handle the semantic shift of the non-entity type. Following the pseudo-labeling process, we suggest an adaptive re-weighting type-balanced learning strategy to handle the issue of biased type distribution. We carried out comprehensive experiments on ten CNER settings using three different datasets. The results illustrate that our method significantly outperforms prior state-of-the-art approaches, registering an average improvement of 6.3% and 8.0% in Micro and Macro F1 scores, respectively.

Submitted 22 October, 2023; originally announced October 2023.

Comments: Accepted by EMNLP2023 main conference as a long paper

arXiv:2310.14344 [pdf, other]

What's in a Prior? Learned Proximal Networks for Inverse Problems

Authors: Zhenghan Fang, Sam Buchanan, Jeremias Sulam

Abstract: Proximal operators are ubiquitous in inverse problems, commonly appearing as part of algorithmic strategies to regularize problems that are otherwise ill-posed. Modern deep learning models have been brought to bear for these tasks too, as in the framework of plug-and-play or deep unrolling, where they loosely resemble proximal operators. Yet, something essential is lost in employing these purely data-driven approaches: there is no guarantee that a general deep network represents the proximal operator of any function, nor is there any characterization of the function for which the network might provide some approximate proximal. This not only makes guaranteeing convergence of iterative schemes challenging but, more fundamentally, complicates the analysis of what has been learned by these networks about their training data. Herein we provide a framework to develop learned proximal networks (LPN), prove that they provide exact proximal operators for a data-driven nonconvex regularizer, and show how a new training strategy, dubbed proximal matching, provably promotes the recovery of the log-prior of the true data distribution. Such LPN provide general, unsupervised, expressive proximal operators that can be used for general inverse problems with convergence guarantees. We illustrate our results in a series of cases of increasing complexity, demonstrating that these models not only result in state-of-the-art performance, but provide a window into the resulting priors learned from data.

Submitted 27 March, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.11160 [pdf, other]

Leveraging Diverse Semantic-based Audio Pretrained Models for Singing Voice Conversion

Authors: Xueyao Zhang, Yicheng Gu, Haopeng Chen, Zihao Fang, Lexiao Zou, Junan Zhang, Liumeng Xue, **chao Zhang, Jie Zhou, Zhizheng Wu

Abstract: Singing Voice Conversion (SVC) is a technique that enables any singer to perform any song. To achieve this, it is essential to obtain speaker-agnostic representations from the source audio, which poses a significant challenge. A common solution involves utilizing a semantic-based audio pretrained model as a feature extractor. However, the degree to which the extracted features can meet the SVC requirements remains an open question. This includes their capability to accurately model melody and lyrics, the speaker-independency of their underlying acoustic information, and their robustness for in-the-wild acoustic environments. In this study, we investigate the knowledge within classical semantic-based pretrained models in much detail. We discover that the knowledge of different models is diverse and can be complementary for SVC. To jointly utilize the diverse pretrained models with mismatched time resolutions, we propose an efficient ReTrans strategy to address the feature fusion problem. Based on the above, we design a Singing Voice Conversion framework based on Diverse Semantic-based Feature Fusion (DSFF-SVC). Experimental results demonstrate that DSFF-SVC can be generalized and improve various existing SVC models, particularly in challenging real-world conversion tasks.

Submitted 27 May, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.11093 [pdf, other]

SODA: Robust Training of Test-Time Data Adaptors

Authors: Zige Wang, Yonggang Zhang, Zhen Fang, Long Lan, Wen**g Yang, Bo Han

Abstract: Adapting models deployed to test distributions can mitigate the performance degradation caused by distribution shifts. However, privacy concerns may render model parameters inaccessible. One promising approach involves utilizing zeroth-order optimization (ZOO) to train a data adaptor to adapt the test data to fit the deployed models. Nevertheless, the data adaptor trained with ZOO typically brings restricted improvements due to the potential corruption of data features caused by the data adaptor. To address this issue, we revisit ZOO in the context of test-time data adaptation. We find that the issue directly stems from the unreliable estimation of the gradients used to optimize the data adaptor, which is inherently due to the unreliable nature of the pseudo-labels assigned to the test data. Based on this observation, we propose pseudo-label-robust data adaptation (SODA) to improve the performance of data adaptation. Specifically, SODA leverages high-confidence predicted labels as reliable labels to optimize the data adaptor with ZOO for label prediction. For data with low-confidence predictions, SODA encourages the adaptor to preserve data information to mitigate data corruption. Empirical results indicate that SODA can significantly enhance the performance of deployed models in the presence of distribution shifts without requiring access to model parameters.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.09475 [pdf]

Twisted DNA origami-based chiral monolayers for spin filtering

Authors: Haozhi Wang, Fangfei Yin, Linyun Li, Mingqiang Li, Zheng Fang, Chenyun Sun, Bochen Li, Jiye Shi, Jiang Li, Lihua Wang, Shi** Song, Xiaolei Zuo, Xiaoguo Liu, Chunhai Fan

Abstract: DNA monolayers with inherent chirality play a pivotal role across various domains, including biosensors, DNA chips, and bioelectronics. Nonetheless, conventional DNA chiral monolayers, typically constructed from single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA), often lack structural orderliness and design flexibility at the interface. Structural DNA nanotechnology emerges as a promising solution to tackle these challenges. In this study, we present a strategy for crafting highly adaptable twisted DNA origami-based chiral monolayers. These structures exhibit distinct interfacial assembly characteristics and effectively mitigate the structural disorder of dsDNA monolayers, which is constrained by a limited persistence length of ~50 nm of dsDNA. We highlight the spin-filtering capabilities of four representative DNA origami-based chiral monolayers, demonstrating a maximal one-order-of-magnitude increase in spin-filtering efficiency per unit area compared to conventional dsDNA chiral monolayers. Intriguingly, our findings reveal that the higher-order, tertiary, chiral structure of twisted DNA origami further enhances the spin-filtering efficiency. This work paves the way for the rational design of DNA chiral monolayers.

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.07464 [pdf]

Deep Learning Predicts Biomarker Status and Discovers Related Histomorphology Characteristics for Low-Grade Glioma

Authors: Zijie Fang, Yihan Liu, Yifeng Wang, Xiangyang Zhang, Yang Chen, Chang**g Cai, Yiyang Lin, Ying Han, Zhi Wang, Shan Zeng, Hong Shen, Jun Tan, Yongbing Zhang

Abstract: Biomarker detection is an indispensable part in the diagnosis and treatment of low-grade glioma (LGG). However, current LGG biomarker detection methods rely on expensive and complex molecular genetic testing, for which professionals are required to analyze the results, and intra-rater variability is often reported. To overcome these challenges, we propose an interpretable deep learning pipeline, a Multi-Biomarker Histomorphology Discoverer (Multi-Beholder) model based on the multiple instance learning (MIL) framework, to predict the status of five biomarkers in LGG using only hematoxylin and eosin-stained whole slide images and slide-level biomarker status labels. Specifically, by incorporating the one-class classification into the MIL framework, accurate instance pseudo-labeling is realized for instance-level supervision, which greatly complements the slide-level labels and improves the biomarker prediction performance. Multi-Beholder demonstrates superior prediction performance and generalizability for five LGG biomarkers (AUROC=0.6469-0.9735) in two cohorts (n=607) with diverse races and scanning protocols. Moreover, the excellent interpretability of Multi-Beholder allows for discovering the quantitative and qualitative correlations between biomarker status and histomorphology characteristics. Our pipeline not only provides a novel approach for biomarker prediction, enhancing the applicability of molecular treatments for LGG patients but also facilitates the discovery of new mechanisms in molecular functionality and LGG progression.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 47 pages, 6 figures

arXiv:2310.06403 [pdf, other]

Boundary Discretization and Reliable Classification Network for Temporal Action Detection

Authors: Zhenying Fang, Jun Yu, Richang Hong

Abstract: Temporal action detection aims to recognize the action category and determine each action instance's starting and ending time in untrimmed videos. The mixed methods have achieved remarkable performance by seamlessly merging anchor-based and anchor-free approaches. Nonetheless, there are still two crucial issues within the mixed framework: (1) Brute-force merging and handcrafted anchor design hinder the substantial potential and practicality of the mixed methods. (2) Within-category predictions show a significant abundance of false positives. In this paper, we propose a novel Boundary Discretization and Reliable Classification Network (BDRC-Net) that addresses the issues above by introducing boundary discretization and reliable classification modules. Specifically, the boundary discretization module (BDM) elegantly merges anchor-based and anchor-free approaches in the form of boundary discretization, eliminating the need for the traditional handcrafted anchor design. Furthermore, the reliable classification module (RCM) predicts reliable global action categories to reduce false positives. Extensive experiments conducted on different benchmarks demonstrate that our proposed method achieves competitive detection performance. The code will be released at https://github.com/zhenyingfang/BDRC-Net.

Submitted 7 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: 12 pages, Source code: https://github.com/zhenyingfang/BDRC-Net

arXiv:2310.00013 [pdf, other]

Adaptive Communications in Collaborative Perception with Domain Alignment for Autonomous Driving

Authors: Senkang Hu, Zhengru Fang, Haonan An, Guowen Xu, Yuan Zhou, Xianhao Chen, Yuguang Fang

Abstract: Collaborative perception among multiple connected and autonomous vehicles can greatly enhance perceptive capabilities by allowing vehicles to exchange supplementary information via communications. Despite advances in previous approaches, challenges still remain due to channel variations and data heterogeneity among collaborative vehicles. To address these issues, we propose ACC-DA, a channel-aware collaborative perception framework to dynamically adjust the communication graph and minimize the average transmission delay while mitigating the side effects from the data heterogeneity. Our novelties lie in three aspects. We first design a transmission delay minimization method, which can construct the communication graph and minimize the transmission delay according to different channel information states. We then propose an adaptive data reconstruction mechanism, which can dynamically adjust the rate-distortion trade-off to enhance perception efficiency. Moreover, it minimizes the temporal redundancy during data transmissions. Finally, we conceive a domain alignment scheme to align the data distribution from different vehicles, which can mitigate the domain gap between different vehicles and improve the performance of the target task. Comprehensive experiments demonstrate the effectiveness of our method in comparison to the existing state-of-the-art works.

Submitted 16 March, 2024; v1 submitted 14 September, 2023; originally announced October 2023.

Comments: 6 pages, 6 figures

arXiv:2309.16730 [pdf]

Explainable machine learning-based prediction model for diabetic nephropathy

Authors: **g-Mei Yin, Yang Li, Jun-Tang Xue, Guo-Wei Zong, Zhong-Ze Fang, Lang Zou

Abstract: The aim of this study is to analyze the effect of serum metabolites on diabetic nephropathy (DN) and predict the prevalence of DN through a machine learning approach. The dataset consists of 548 patients from April 2018 to April 2019 in Second Affiliated Hospital of Dalian Medical University (SAHDMU). We select the optimal 38 features through a Least absolute shrinkage and selection operator (LASSO) regression model and a 10-fold cross-validation. We compare four machine learning algorithms, including eXtreme Gradient Boosting (XGB), random forest, decision tree and logistic regression, by AUC-ROC curves, decision curves, and calibration curves. We quantify feature importance and interaction effects in the optimal predictive model by the Shapley Additive exPlanations (SHAP) method. The XGB model has the best performance to screen for DN with the highest AUC value of 0.966. The XGB model also gains more clinical net benefits than others and the fitting degree is better. In addition, there are significant interactions between serum metabolites and duration of diabetes. We develop a predictive model by the XGB algorithm to screen for DN. C2, C5DC, Tyr, Ser, Met, C24, C4DC, and Cys contribute strongly to the model, and can possibly be biomarkers for DN.

Submitted 24 October, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.13660 [pdf]

Non-Uniform Sampling Reconstruction for Symmetrical NMR Spectroscopy by Exploiting Inherent Symmetry

Authors: En** Lin, Ze Fang, Yuqing Huang, Yu Yang, Zhong Chen

Abstract: Symmetrical NMR spectroscopy constitutes a vital branch of multidimensional NMR spectroscopy, providing a powerful tool for the structural elucidation of biological macromolecules. Non-Uniform Sampling (NUS) serves as an effective strategy for averting the prohibitive acquisition time of multidimensional NMR spectroscopy by only sampling a few points according to NUS sampling schedules and reconstructing missing points via algorithms. However, current sampling schedules are unable to maintain the accurate recovery of cross peaks that are weak but important. In this work, we propose a novel sampling schedule termed as SCPG (Symmetrical Copy Poisson Gap) and employ CS (Compressed Sensing) methods for reconstruction. We theoretically prove that the symmetrical constraint, apart from sparsity, is implicitly implemented when SCPG is combined with CS methods. The simulated and experimental data substantiate the advantage of SCPG over state-of-the-art 2D Woven PG in the NUS reconstruction of symmetrical NMR spectroscopy.

Submitted 24 September, 2023; originally announced September 2023.

Comments: 30 pages, 6 figures

arXiv:2309.13035 [pdf, other]

PyPose v0.6: The Imperative Programming Interface for Robotics

Authors: Zitong Zhan, Xiangfu Li, Qihang Li, Haonan He, Abhinav Pandey, Haitao Xiao, Yangmengfei Xu, Xiangyu Chen, Kuan Xu, Kun Cao, Zhipeng Zhao, Zihan Wang, Huan Xu, Zihang Fang, Yutian Chen, Wentao Wang, Xu Fang, Yi Du, Tianhao Wu, Xiao Lin, Yuheng Qiu, Fan Yang, **gnan Shi, Shaoshu Su, Yiren Lu, et al. (11 additional authors not shown)

Abstract: PyPose is an open-source library for robot learning. It combines a learning-based approach with physics-based optimization, which enables seamless end-to-end robot learning. It has been used in many tasks due to its meticulously designed application programming interface (API) and efficient implementation. From its initial launch in early 2022, PyPose has experienced significant enhancements, incorporating a wide variety of new features into its platform. To satisfy the growing demand for understanding and utilizing the library and reduce the learning curve of new users, we present the fundamental design principle of the imperative programming interface, and showcase the flexible usage of diverse functionalities and modules using an extremely simple Dubins car example. We also demonstrate that PyPose can be easily used to navigate a real quadruped robot with a few lines of code.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.12559 [pdf, other]

Invariant Learning via Probability of Sufficient and Necessary Causes

Authors: Mengyue Yang, Zhen Fang, Yonggang Zhang, Yali Du, Furui Liu, Jean-Francois Ton, Jianhong Wang, Jun Wang

Abstract: Out-of-distribution (OOD) generalization is indispensable for learning models in the wild, where the testing distribution is typically unknown and different from the training distribution. Recent methods derived from causality have shown great potential in achieving OOD generalization. However, existing methods mainly focus on the invariance property of causes, while largely overlooking the properties of sufficiency and necessity. Namely, a necessary but insufficient cause (feature) is invariant to distribution shift, yet it may not have the required accuracy. By contrast, a sufficient yet unnecessary cause (feature) tends to fit specific data well but may be at risk when adapting to a new domain. To capture the information of sufficient and necessary causes, we employ a classical concept, the probability of sufficient and necessary causes (PNS), which indicates the probability of whether one is the necessary and sufficient cause. To associate PNS with OOD generalization, we propose PNS risk and formulate an algorithm to learn representations with a high PNS value. We theoretically analyze and prove the generalizability of the PNS risk. Experiments on both synthetic and real-world benchmarks demonstrate the effectiveness of the proposed method. The details of the implementation can be found at the GitHub repository: https://github.com/ymy4323460/CaSN.

Submitted 10 May, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.11751 [pdf, other]

How Robust is Google's Bard to Adversarial Image Attacks?

Authors: Yinpeng Dong, Huanran Chen, Jiawei Chen, Zhengwei Fang, Xiao Yang, Yichi Zhang, Yu Tian, Hang Su, Jun Zhu

Abstract: Multimodal Large Language Models (MLLMs) that integrate text and other modalities (especially vision) have achieved unprecedented performance in various multimodal tasks. However, due to the unsolved adversarial robustness problem of vision models, MLLMs can have more severe safety and security risks by introducing the vision inputs. In this work, we study the adversarial robustness of Google's Bard, a competitive chatbot to ChatGPT that released its multimodal capability recently, to better understand the vulnerabilities of commercial MLLMs. By attacking white-box surrogate vision encoders or MLLMs, the generated adversarial examples can mislead Bard to output wrong image descriptions with a 22% success rate based solely on the transferability. We show that the adversarial examples can also attack other MLLMs, e.g., a 26% attack success rate against Bing Chat and an 86% attack success rate against ERNIE bot. Moreover, we identify two defense mechanisms of Bard, including face detection and toxicity detection of images. We design corresponding attacks to evade these defenses, demonstrating that the current defenses of Bard are also vulnerable. We hope this work can deepen our understanding of the robustness of MLLMs and facilitate future research on defenses. Our code is available at https://github.com/thu-ml/Attack-Bard. Update: GPT-4V became available in October 2023. We further evaluate its robustness under the same set of adversarial examples, achieving a 45% attack success rate.

Submitted 14 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: Technical report

arXiv:2309.11705 [pdf, other]

Meta OOD Learning for Continuously Adaptive OOD Detection

Authors: Xinheng Wu, Jie Lu, Zhen Fang, Guangquan Zhang

Abstract: Out-of-distribution (OOD) detection is crucial to modern deep learning applications by identifying and alerting about the OOD samples that should not be tested or used for making predictions. Current OOD detection methods have made significant progress when in-distribution (ID) and OOD samples are drawn from static distributions. However, this can be unrealistic when applied to real-world systems which often undergo continuous variations and shifts in ID and OOD distributions over time. Therefore, for an effective application in real-world systems, the development of OOD detection methods that can adapt to these dynamic and evolving distributions is essential. In this paper, we propose a novel and more realistic setting called continuously adaptive out-of-distribution (CAOOD) detection, which targets developing an OOD detection model that enables dynamic and quick adaptation to a newly arriving distribution, with insufficient ID samples during deployment time. To address CAOOD, we develop meta OOD learning (MOL) by designing a learning-to-adapt diagram such that a well-initialized OOD detection model is learned during the training process. In the testing process, MOL ensures OOD detection performance over shifting distributions by quickly adapting to new distributions with a few adaptations. Extensive experiments on several OOD benchmarks endorse the effectiveness of our method in preserving both ID classification accuracy and OOD detection performance on continuously shifting distributions.

Submitted 20 September, 2023; originally announced September 2023.

Comments: Accepted by ICCV 2023

arXiv:2309.09949 [pdf, other]

How to Generate Popular Post Headlines on Social Media?

Authors: Zhouxiang Fang, Min Yu, Zhendong Fu, Boning Zhang, Xuanwen Huang, Xiaoqi Tang, Yang Yang

Abstract: Posts, as important containers of user-generated-content pieces on social media, are of tremendous social influence and commercial value. As an integral component of a post, the headline has a decisive contribution to the post's popularity. However, the current mainstream method for headline generation is still manual writing, which is unstable and requires extensive human effort. This drives us to explore a novel research question: Can we automate the generation of popular headlines on social media? We collect more than 1 million posts of 42,447 celebrities from public data of Xiaohongshu, which is a well-known social media platform in China. We then conduct careful observations on the headlines of these posts. Observation results demonstrate that trends and personal styles are widespread in headlines on social media and contribute significantly to posts' popularity. Motivated by these insights, we present MEBART, which combines Multiple preference-Extractors with Bidirectional and Auto-Regressive Transformers (BART), capturing trends and personal styles to generate popular headlines on social media. We perform extensive experiments on real-world datasets and achieve state-of-the-art performance compared with several advanced baselines. In addition, ablation and case studies demonstrate that MEBART advances in capturing trends and personal styles.

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2309.04270 [pdf, other]

A Reliable and Resilient Framework for Multi-UAV Mutual Localization

Authors: Zexin Fang, Bin Han, Hans D. Schotten

Abstract: This paper presents a robust and secure framework for achieving accurate and reliable mutual localization in multiple unmanned aerial vehicle (UAV) systems. Challenges of accurate localization and security threats are addressed, and corresponding solutions are brought forth and assessed in our paper with numerical simulations. The proposed solution incorporates two key components: the Mobility Adaptive Gradient Descent (MAGD) and Time-evolving Anomaly Detection (TAD). The MAGD adapts the gradient descent algorithm to handle the configuration changes in the mutual localization system, ensuring accurate localization in dynamic scenarios. The TAD cooperates with a reputation propagation (RP) scheme to detect and mitigate potential attacks by identifying UAVs with malicious data, enhancing the security and resilience of the mutual localization

Submitted 8 September, 2023; originally announced September 2023.

Comments: Accepted by the 2023 IEEE 98th Vehicular Technology Conference (VTC2023-Fall), Hong Kong, 10-13 October 2023

arXiv:2309.03084 [pdf, other]

Pure Monte Carlo Counterfactual Regret Minimization

Authors: Ju Qi, Ting Feng, Falun Hei, Zhemei Fang, Yunfeng Luo

Abstract: Counterfactual Regret Minimization (CFR) and its variants are the best algorithms so far for solving large-scale incomplete information games. However, we believe that there are two problems with CFR: First, matrix multiplication is required in CFR iteration, and the time complexity of one iteration is too high; Secondly, the game characteristics in the real world are different. Just using one CFR algorithm will not be perfectly suitable for all game problems. For these two problems, this paper proposes a new algorithm called Pure CFR (PCFR) based on CFR. PCFR can be seen as a combination of CFR and Fictitious Play (FP), inheriting the concept of counterfactual regret (value) from CFR, and using the best response strategy instead of the regret matching strategy for the next iteration. This algorithm has three advantages. First, PCFR can be combined with any CFR variant. The resulting Pure MCCFR (PMCCFR) can significantly reduce the time and space complexity of one iteration. Secondly, our experiments show that the convergence speed of the PMCCFR is 2$\sim$3 times that of the MCCFR. Finally, there is a type of game that is very suitable for PCFR. We call this type of game clear-game, which is characterized by a high proportion of dominated strategies. Experiments show that in clear-game, the convergence rate of PMCCFR is two orders of magnitude higher than that of MCCFR.

Submitted 13 October, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

arXiv:2308.16320 [pdf, other]

Information Disclosure under Competition in Sharing Systems

Authors: Ningning Ding, Zhixuan Fang, Jianwei Huang

Abstract: Sharing systems have facilitated the redistribution of underused resources by providing convenient online marketplaces for individual sellers and buyers. However, sellers in these systems may not fully disclose the information of their shared commodities, due to strategic behaviors or privacy concerns. Sellers' strategic information disclosure significantly affects buyers' user experiences and systems' reputation. This paper presents the first analytical study on information disclosure and pricing of competing sellers in sharing systems. In particular, we propose a two-stage game framework to capture sellers' strategic behaviors and buyers' decisions. Although the optimization problem is challenging due to sellers' non-convex and non-monotonic objectives, we completely characterize the complex market equilibria by decomposing it into several tractable subproblems. We demonstrate that full disclosure by all sellers or non-disclosure by all sellers will both lead to intense price competition. The former all-disclosure case is never an equilibrium even when all sellers have good commodity qualities and low privacy costs, while the latter non-disclosure case can be an equilibrium under which all sellers get zero profit. We also reveal several critical factors that affect sellers' information disclosure. Interestingly, sellers' sharing capacity limitation and buyers' estimation biases encourage information disclosure as they mitigate sellers' competition.

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2308.12939 [pdf, other]

Learning Only On Boundaries: a Physics-Informed Neural operator for Solving Parametric Partial Differential Equations in Complex Geometries

Authors: Zhiwei Fang, Sifan Wang, Paris Perdikaris

Abstract: Recently, deep learning surrogates and neural operators have shown promise in solving partial differential equations (PDEs). However, they often require a large amount of training data and are limited to bounded domains. In this work, we present a novel physics-informed neural operator method to solve parametrized boundary value problems without labeled data. By reformulating the PDEs into boundary integral equations (BIEs), we can train the operator network solely on the boundary of the domain. This approach reduces the number of required sample points from $O(N^d)$ to $O(N^{d-1})$, where $d$ is the domain's dimension, leading to a significant acceleration of the training process. Additionally, our method can handle unbounded problems, which are unattainable for existing physics-informed neural networks (PINNs) and neural operators. Our numerical experiments show the effectiveness of the method on parametrized complex geometries and unbounded problems.

Submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.12055 [pdf, other]

Majorana corner modes in unconventional monolayers of 1T-PtSe2 family

Authors: Haohao Sheng, Yue Xie, Quansheng Wu, Hongming Weng, Xi Dai, B. Andrei Bernevig, Zhong Fang, Zhijun Wang

Abstract: In this work, we propose that Majorana zero modes can be realized at the corners of a topologically trivial insulator with unconventionality. We demonstrate that 1T-PtSe$_2$ is a symmetry indicator-free (SI-free) unconventional insulator, originating from orbital hybridization between Pt $d$ and Se $p_{x,y}$ states. The new kind of SI-free unconventionality has no symmetry eigenvalue indication. Instead, it is diagnosed directly by the Wannier charge centers by using the one-dimensional Wilson loop method. The obstructed edge states exhibit strong anisotropy and large Rashba splitting. By introducing superconducting proximity and external magnetic field, the Majorana corner modes can be obtained in 1T-PtSe$_2$ monolayer. In the end, we construct a two-Bernevig-Hughes-Zhang model with anisotropy to capture the Majorana physics.

Submitted 14 December, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

arXiv:2308.11875 [pdf, other]

Motion-to-Matching: A Mixed Paradigm for 3D Single Object Tracking

Authors: Zhiheng Li, Yu Lin, Yubo Cui, Shuo Li, Zheng Fang

Abstract: 3D single object tracking with LiDAR points is an important task in the computer vision field. Previous methods usually adopt the matching-based or motion-centric paradigms to estimate the current target status. However, the former is sensitive to similar distractors and the sparseness of the point cloud due to relying on appearance matching, while the latter usually focuses on short-term motion clues (e.g., two frames) and ignores the long-term motion pattern of the target. To address these issues, we propose a mixed paradigm with two stages, named MTM-Tracker, which combines motion modeling with feature matching into a single network. Specifically, in the first stage, we exploit the continuous historical boxes as motion prior and propose an encoder-decoder structure to locate the target coarsely. Then, in the second stage, we introduce a feature interaction module to extract motion-aware features from consecutive point clouds and match them to refine target movement as well as regress other target states. Extensive experiments validate that our paradigm achieves competitive performance on large-scale datasets (70.9% in KITTI and 51.70% in NuScenes). The code will be open soon at https://github.com/LeoZhiheng/MTM-Tracker.git.

Submitted 18 December, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: Accepted for publication at IEEE Robotics and Automation Letters (RAL)

arXiv:2308.09985 [pdf, other]

doi 10.1109/TNNLS.2024.3384987

HICL: Hashtag-Driven In-Context Learning for Social Media Natural Language Understanding

Authors: Hanzhuo Tan, Chunpu Xu, Jing Li, Yuqun Zhang, Zeyang Fang, Zeyu Chen, Baohua Lai

Abstract: Natural language understanding (NLU) is integral to various social media applications. However, existing NLU models rely heavily on context for semantic learning, resulting in compromised performance when faced with short and noisy social media content. To address this issue, we leverage in-context learning (ICL), wherein language models learn to make inferences by conditioning on a handful of demonstrations to enrich the context, and propose a novel hashtag-driven in-context learning (HICL) framework. Concretely, we pre-train a model #Encoder, which employs #hashtags (user-annotated topic labels) to drive BERT-based pre-training through contrastive learning. Our objective here is to enable #Encoder to gain the ability to incorporate topic-related semantic information, which allows it to retrieve topic-related posts to enrich contexts and enhance social media NLU with noisy contexts. To further integrate the retrieved context with the source text, we employ a gradient-based method to identify trigger terms useful in fusing information from both sources. For empirical studies, we collected 45M tweets to set up an in-context NLU benchmark, and the experimental results on seven downstream tasks show that HICL substantially advances the previous state-of-the-art results. Furthermore, we conducted extensive analyses and found that: (1) combining source input with a top-retrieved post from #Encoder is more effective than using semantically similar posts; (2) trigger words can largely benefit in merging context from the source and retrieved posts.

Submitted 19 August, 2023; originally announced August 2023.

Comments: https://github.com/albertan017/HICL

Journal ref: 10.1109/TNNLS.2024.3384987

arXiv:2308.08740 [pdf]

On-chip coherent beam combination of waveguide amplifiers on Er$^{3+}$-doped thin film lithium niobate

Authors: Rui Bao, Lvbin Song, Zhiwei Fang, **min Chen, Zhe Wang, Jian Liu, Lang Gao, Zhaoxiang Liu, Zhihao Zhang, Min Wang, Haisu Zhang, Ya Cheng

Abstract: We demonstrate on-chip coherent beam combination of two waveguide amplifiers on Er$^{3+}$-doped thin film lithium niobate (Er: TFLN) platform. Our device is built based on an electro-optic modulator fabricated on Er: TFLN. The output power of the coherently combined amplifiers is measured as high as 12.9 mW, surpassing that of previous single waveguide amplifiers based on Er$^{3+}$-doped thin film lithium niobate platform.

Submitted 16 August, 2023; originally announced August 2023.

arXiv:2308.03666 [pdf, other]

Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness

Authors: Shide Du, Zihan Fang, Shiyang Lan, Yanchao Tan, Manuel Günther, Shiping Wang, Wenzhong Guo

Abstract: As researchers strive to narrow the gap between machine intelligence and human through the development of artificial intelligence technologies, it is imperative that we recognize the critical importance of trustworthiness in open-world, which has become ubiquitous in all aspects of daily life for everyone. However, several challenges may create a crisis of trust in current artificial intelligence systems that need to be bridged: 1) Insufficient explanation of predictive results; 2) Inadequate generalization for learning models; 3) Poor adaptability to uncertain environments. Consequently, we explore a neural program to bridge trustworthiness and open-world learning, extending from single-modal to multi-modal scenarios for readers. 1) To enhance design-level interpretability, we first customize trustworthy networks with specific physical meanings; 2) We then design environmental well-being task-interfaces via flexible learning regularizers for improving the generalization of trustworthy learning; 3) We propose to increase the robustness of trustworthy learning by integrating open-world recognition losses with agent mechanisms. Eventually, we enhance various trustworthy properties through the establishment of design-level explainability, environmental well-being task-interfaces and open-world recognition programs. These designed open-world protocols are applicable across a wide range of surroundings, under open-world multimedia recognition scenarios with significant performance improvements observed.

Submitted 18 October, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2308.01098 [pdf, other]

Towards Better Query Classification with Multi-Expert Knowledge Condensation in JD Ads Search

Authors: Kun-Peng Ning, Ming Pang, Zheng Fang, Xue Jiang, Xi-Wei Zhao, Chang-Ping Peng, Zhan-Gang Lin, Jing-He Hu, Jing-Ping Shao

Abstract: Search query classification, as an effective way to understand user intents, is of great importance in real-world online ads systems. To ensure a lower latency, a shallow model (e.g. FastText) is widely used for efficient online inference. However, the representation ability of the FastText model is insufficient, resulting in poor classification performance, especially on some low-frequency queries and tailed categories. Using a deeper and more complex model (e.g. BERT) is an effective solution, but it will cause a higher online inference latency and more expensive computing costs. Thus, how to juggle both inference efficiency and classification performance is obviously of great practical importance. To overcome this challenge, in this paper, we propose knowledge condensation (KC), a simple yet effective knowledge distillation framework to boost the classification performance of the online FastText model under strict low latency constraints. Specifically, we propose to train an offline BERT model to retrieve more potentially relevant data. Benefiting from its powerful semantic representation, more relevant labels not exposed in the historical data will be added into the training set for better FastText model training. Moreover, a novel distribution-diverse multi-expert learning strategy is proposed to further improve the mining ability of relevant data. By training multiple BERT models from different data distributions, it can respectively perform better at high, middle, and low-frequency search queries. The model ensemble from multi-distribution makes its retrieval ability more powerful. We have deployed two versions of this framework in JD search, and both offline experiments and online A/B testing from multiple datasets have validated the effectiveness of the proposed approach.

Submitted 19 November, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

arXiv:2307.16562 [pdf, other]

SAKSHI: Decentralized AI Platforms

Authors: Suma Bhat, Canhui Chen, Zerui Cheng, Zhixuan Fang, Ashwin Hebbar, Sreeram Kannan, Ranvir Rana, Peiyao Sheng, Himanshu Tyagi, Pramod Viswanath, Xuechao Wang

Abstract: Large AI models (e.g., Dall-E, GPT4) have electrified the scientific, technological and societal landscape through their superhuman capabilities. These services are offered largely in a traditional web2.0 format (e.g., OpenAI's GPT4 service). As more large AI models proliferate (personalizing and specializing to a variety of domains), there is a tremendous need to have a neutral trust-free platform that allows the hosting of AI models, clients receiving AI services efficiently, yet in a trust-free, incentive compatible, Byzantine behavior resistant manner. In this paper we propose SAKSHI, a trust-free decentralized platform specifically suited for AI services. The key design principles of SAKSHI are the separation of the data path (where AI query and service is managed) and the control path (where routers and compute and storage hosts are managed) from the transaction path (where the metering and billing of services are managed over a blockchain). This separation is enabled by a "proof of inference" layer which provides cryptographic resistance against a variety of misbehaviors, including poor AI service, nonpayment for service, copying of AI models. This is joint work between multiple universities (Princeton University, University of Illinois at Urbana-Champaign, Tsinghua University, HKUST) and two startup companies (Witness Chain and Eigen Layer).

Submitted 31 July, 2023; originally announced July 2023.

Comments: 23 pages, 9 figures

arXiv:2307.12103 [pdf]

Non-volatile Phase-only Transmissive Spatial Light Modulators

Authors: Zhuoran Fang, Rui Chen, Johannes E. Fröch, Quentin A. A. Tanguy, Asir Intisar Khan, Xiangjin Wu, Virat Tara, Arnab Manna, David Sharp, Christopher Munley, Forrest Miller, Yang Zhao, Sarah J. Geiger, Karl F. Böhringer, Matthew Reynolds, Eric Pop, Arka Majumdar

Abstract: Free-space modulation of light is crucial for many applications, from light detection and ranging to virtual or augmented reality. Traditional means of modulating free-space light involve spatial light modulators based on liquid crystals and microelectromechanical systems, which are bulky, have large pixel areas (~10 micron x 10 micron), and require high driving voltage. Recent progress in meta-optics has shown promise to circumvent some of the limitations. By integrating active materials with sub-wavelength pixels in a meta-optic, the power consumption can be dramatically reduced while achieving a faster speed. However, these reconfiguration methods are volatile and hence require constant application of control signals, leading to phase jitter and crosstalk. Additionally, to control a large number of pixels, it is essential to implement a memory within each pixel to have a tractable number of control signals. Here, we develop a device with nonvolatile, electrically programmable, phase-only modulation of free-space infrared radiation in transmission using the low-loss phase-change material (PCM) Sb2Se3. By coupling an ultra-thin PCM layer to a high quality (Q)-factor (Q~406) diatomic metasurface, we demonstrate a phase-only modulation of ~0.25pi (~0.2pi) in simulation (experiment), ten times larger than a bare PCM layer of the same thickness. The device shows excellent endurance over 1,000 switching cycles. We then advance the device geometry to enable independent control of 17 meta-molecules, achieving ten deterministic resonance levels with a 2pi phase shift. By independently controlling the phase delay of pixels, we further show tunable far-field beam shaping. Our work paves the way to realizing non-volatile transmissive phase-only spatial light modulators.

Submitted 22 July, 2023; originally announced July 2023.

arXiv:2307.11530 [pdf, other]

UWAT-GAN: Fundus Fluorescein Angiography Synthesis via Ultra-wide-angle Transformation Multi-scale GAN

Authors: Zhaojie Fang, Zhanghao Chen, Pengxue Wei, Wangting Li, Shaochong Zhang, Ahmed Elazab, Gangyong Jia, Ruiquan Ge, Changmiao Wang

Abstract: Fundus photography is an essential examination for clinical and differential diagnosis of fundus diseases. Recently, Ultra-Wide-angle Fundus (UWF) techniques, UWF Fluorescein Angiography (UWF-FA) and UWF Scanning Laser Ophthalmoscopy (UWF-SLO) have been gradually put into use. However, Fluorescein Angiography (FA) and UWF-FA require injecting sodium fluorescein which may have detrimental influences. To avoid negative impacts, cross-modality medical image generation algorithms have been proposed. Nevertheless, current methods in fundus imaging could not produce high-resolution images and are unable to capture tiny vascular lesion areas. This paper proposes a novel conditional generative adversarial network (UWAT-GAN) to synthesize UWF-FA from UWF-SLO. Using multi-scale generators and a fusion module patch to better extract global and local information, our model can generate high-resolution images. Moreover, an attention transmit module is proposed to help the decoder learn effectively. Besides, a supervised approach is used to train the network using multiple new weighted losses on different scales of data. Experiments on an in-house UWF image dataset demonstrate the superiority of the UWAT-GAN over the state-of-the-art methods. The source code is available at: https://github.com/Tinysqua/UWAT-GAN.

Submitted 21 July, 2023; originally announced July 2023.

Comments: 26th International Conference on Medical Image Computing and Computer Assisted Intervention

arXiv:2307.10371 [pdf, other]

Enumeration of spin-space groups: Towards a complete description of symmetries of magnetic orders

Authors: Yi Jiang, Ziyin Song, Tiannian Zhu, Zhong Fang, Hongming Weng, Zheng-Xin Liu, Jian Yang, Chen Fang

Abstract: Symmetries of three-dimensional periodic scalar fields are described by 230 space groups (SGs). Symmetries of three-dimensional periodic (pseudo-) vector fields, however, are described by the spin-space groups (SSGs), which were initially used to describe the symmetries of magnetic orders. In SSGs, the real-space and spin degrees of freedom are unlocked in the sense that an operation could have different spatial and spin rotations. SSGs give a complete symmetry description of magnetic structures, and have natural applications in the band theory of itinerant electrons in magnetically ordered systems with weak spin-orbit coupling. Altermagnetism, a concept raised recently that belongs to the symmetry-compensated collinear magnetic orders but has non-relativistic spin splitting, is well described by SSGs. Due to the vast number and complicated group structures, SSGs have not yet been systematically enumerated. In this work, we exhaust SSGs based on the invariant subgroups of SGs, with spin operations constructed from three-dimensional (3D) real representations of the quotient groups for the invariant subgroups. For collinear and coplanar magnetic orders, the spin operations can be reduced into lower dimensional real representations. As the number of SSGs is infinite, we only consider SSGs that describe magnetic unit cells up to 12 times crystal unit cells. We obtain 157,289 non-coplanar, 24,788 coplanar-non-collinear, and 1,421 collinear SSGs.
The enumerated SSGs are stored in an online database at https://cmpdc.iphy.ac.cn/ssg with a user-friendly interface. We also develop an algorithm to identify SSG for realistic materials and find SSGs for 1,626 magnetic materials. Our results serve as a solid starting point for further studies of symmetry and topology in magnetically ordered materials.

Submitted 19 July, 2023; originally announced July 2023.

Showing 101–150 of 758 results for author: Fang, Z