-
Co-benefits of Agricultural Diversification and Technology for Food and Nutrition Security in China
Authors:
Thomas Cherico Wanger,
Estelle Raveloaritiana,
Siyan Zeng,
Haixiu Gao,
Xueqing He,
Yiwen Shao,
Panlong Wu,
Kris A. G. Wyckhuys,
Wenwu Zhou,
Yi Zou,
Zengrong Zhu,
Ling Li,
Haiyan Cen,
Yunhui Liu,
Shenggen Fan
Abstract:
China is the leading crop producer and has successfully implemented sustainable development programs related to agriculture. Sustainable agriculture has been promoted to achieve national food security targets such as food self-sufficiency through the well-facilitated farmland construction (WFFC) approach. The WFFC is introduced in China's current national 10-year plan to consolidate farmlands into large and simplified production areas to maximise automation and improve soil fertility and productivity. However, research suggests that diversified and smaller farms facilitate ecosystem services, can improve yield resilience, defuse human health threats, and increase farm profitability. Currently, WFFC has not considered ecological farmland improvements, and it may miss long-term environmental benefits, including ecosystem service preservation conducive to yields. Moreover, the nutritional status in China has changed in recent decades, with undernutrition being dramatically reduced but the prevalence of overweight, obesity, and chronic diseases increasing. While a strategic choice and management of crop and livestock species can improve nutrition, the environmental and production benefits of agricultural diversification are currently not well interlinked with China's food and nutrition security discussions. Lastly, the role of agricultural technology for socioeconomic benefits and the link with diversified agricultural production may provide vast benefits for food security. Here, we focus on the opportunities and co-benefits of agricultural diversification and technology innovations to advance food and nutrition security in China through ecosystem service and yield benefits. Our applied five-point research agenda can provide evidence-based opportunities to support China in reaching its ambitious food security targets through agricultural diversification with global ramifications.
Submitted 1 July, 2024;
originally announced July 2024.
-
Topological winding guaranteed coherent orthogonal scattering
Authors:
Cheng Guo,
Shanhui Fan
Abstract:
Coherent control has enabled various novel phenomena in wave scattering. We introduce an effect called coherent orthogonal scattering, where the output wave becomes orthogonal to the reference output state without scatterers. This effect leads to a unity extinction coefficient and complete mode conversion. We examine the conditions for this effect and reveal its topological nature by relating it to the indivisibility between the dimension and the winding number of scattering submatrices. These findings deepen our understanding of topological scattering phenomena.
Submitted 26 June, 2024;
originally announced June 2024.
-
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Authors:
Yifan Yang,
Zheshu Song,
Jianheng Zhuo,
Mingyu Cui,
**peng Li,
Bo Yang,
Yexing Du,
Ziyang Ma,
Xunying Liu,
Ziyuan Wang,
Ke Li,
Shuai Fan,
Kai Yu,
Wei-Qiang Zhang,
Guoguo Chen,
Xie Chen
Abstract:
The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-resource languages and does not rely on paired speech and text data. GigaSpeech 2 comprises about 30,000 hours of automatically transcribed speech, including Thai, Indonesian, and Vietnamese, gathered from unlabeled YouTube videos. We also introduce an automated pipeline for data crawling, transcription, and label refinement. Specifically, this pipeline uses Whisper for initial transcription and TorchAudio for forced alignment, combined with multi-dimensional filtering for data quality assurance. A modified Noisy Student Training is developed to further refine flawed pseudo labels iteratively, thus enhancing model performance. Experimental results on our manually transcribed evaluation set and two public test sets from Common Voice and FLEURS confirm our corpus's high quality and broad applicability. Notably, ASR models trained on GigaSpeech 2 can reduce the word error rate for Thai, Indonesian, and Vietnamese on our challenging and realistic YouTube test set by 25% to 40% compared to the Whisper large-v3 model, with merely 10% of the model parameters. Furthermore, our ASR models trained on GigaSpeech 2 yield superior performance compared to commercial services. We believe that our newly introduced corpus and pipeline will open a new avenue for low-resource speech recognition and significantly facilitate research in this area.
Submitted 17 June, 2024;
originally announced June 2024.
-
The shifted prime-divisor function over shifted primes
Authors:
Steve Fan
Abstract:
Let $a,b\in\mathbb{Z}\setminus\{0\}$. For every $n\in\mathbb{N}$, we denote by $\omega_a^*(n)$ the number of shifted-prime divisors $p-a$ of $n$, where $p>a$ is prime. In this paper, we study the moments of $\omega_a^*$ over shifted primes $p-b$. Specifically, we prove an asymptotic formula for the first moment and upper and lower bounds of the correct order of magnitude for the second moment. These results suggest that the average behavior of $\omega_a^*$ on shifted primes is similar to its average behavior on natural numbers. We shall also prove upper bounds for the mean values of sub-multiplicative functions in a nice class over the least common multiples of the shifted primes $p-a$ and $q-b$. Such upper bounds are intimately related to the second moments of $\omega_a^*$ over natural numbers and over shifted primes. Finally, we propose a new conjecture on the second moment of $\omega_1^*$ over natural numbers and provide a heuristic argument in support of this conjecture.
Submitted 14 June, 2024; v1 submitted 7 June, 2024;
originally announced June 2024.
-
Semantic Similarity Score for Measuring Visual Similarity at Semantic Level
Authors:
Senran Fan,
Zhicheng Bao,
Chen Dong,
Haotai Liang,
Xiaodong Xu,
** Zhang
Abstract:
Semantic communication, as a revolutionary communication architecture, is considered a promising novel communication paradigm. Unlike traditional symbol-based error-free communication systems, semantic-based visual communication systems extract, compress, transmit, and reconstruct images at the semantic level. However, widely used image similarity evaluation metrics, whether pixel-based (MSE, PSNR) or structure-based (MS-SSIM), struggle to accurately measure the loss of semantic-level information of the source during system transmission. This presents challenges in evaluating the performance of visual semantic communication systems, especially when comparing them with traditional communication systems. To address this, we propose a semantic evaluation metric, SeSS (Semantic Similarity Score), based on Scene Graph Generation and graph matching, which converts the similarity scores between images into semantic-level graph matching scores. In addition, semantic similarity scores for tens of thousands of image pairs are manually annotated to fine-tune the hyperparameters of the graph matching algorithm, aligning the metric more closely with human semantic perception. The performance of SeSS is tested on different datasets, including (1) images transmitted by traditional and semantic communication systems at different compression rates, (2) images transmitted by traditional and semantic communication systems at different signal-to-noise ratios, (3) images generated by a large-scale model with different noise levels introduced, and (4) images subjected to certain special transformations. The experiments demonstrate the effectiveness of SeSS, indicating that the metric can measure semantic-level differences between images and can be used for evaluation in visual semantic communication systems.
Submitted 6 June, 2024;
originally announced June 2024.
-
SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms
Authors:
Xingrun Xing,
Zheng Zhang,
Ziyi Ni,
Shitao Xiao,
Yiming Ju,
Siqi Fan,
Yequan Wang,
Jiajun Zhang,
Guoqi Li
Abstract:
Towards energy-efficient artificial intelligence similar to the human brain, bio-inspired spiking neural networks (SNNs) have the advantages of biological plausibility, event-driven sparsity, and binary activation. Recently, large-scale language models have exhibited promising generalization capability, making it valuable to explore more general spike-driven models. However, the binary spikes in existing SNNs fail to encode adequate semantic information, posing technical challenges for generalization. This work proposes the first fully spiking mechanism for general language tasks, including both discriminative and generative ones. Different from previous spikes with {0,1} levels, we propose a more general spike formulation with bi-directional, elastic amplitude, and elastic frequency encoding, while still maintaining the addition nature of SNNs. In a single time step, the spike is enhanced by direction and amplitude information; in spike frequency, a strategy to control the spike firing rate is designed. We apply this elastic bi-spiking mechanism to language modeling and name the resulting model SpikeLM. This is the first time that general language tasks are handled with fully spike-driven models, which achieve much higher accuracy than previously possible. SpikeLM also greatly narrows the performance gap between SNNs and ANNs in language modeling. Our code is available at https://github.com/Xingrun-Xing/SpikeLM.
Submitted 5 June, 2024;
originally announced June 2024.
-
On the Recoverability of Causal Relations from Temporally Aggregated I.I.D. Data
Authors:
Shunxing Fan,
Mingming Gong,
Kun Zhang
Abstract:
We consider the effect of temporal aggregation on instantaneous (non-temporal) causal discovery in a general setting. This is motivated by the observation that the true causal time lag is often considerably shorter than the observational interval. This discrepancy leads to high aggregation, causing time-delay causality to vanish and instantaneous dependence to manifest. Although we expect such instantaneous dependence to be consistent with the true causal relation in a certain sense, so that the discovery results are meaningful, it remains unclear what type of consistency we need and when such consistency will be satisfied. We formally propose functional consistency and conditional independence consistency, which correspond to functional causal model-based methods and conditional independence-based methods respectively, and provide the conditions under which these consistencies hold. We show theoretically and experimentally that causal discovery results may be seriously distorted by aggregation, especially in the completely nonlinear case, and we also find that causal relationships are still recoverable from aggregated data if we have partial linearity or an appropriate prior. Our findings suggest that the community should take a cautious and meticulous approach when interpreting causal discovery results from such data, and they show why and when aggregation will distort the performance of causal discovery methods.
Submitted 11 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue
Authors:
Shixuan Fan,
Wei Wei,
Wendi Li,
Xian-Ling Mao,
Wenfeng Xie,
Dangyang Chen
Abstract:
The core of a dialogue system is to generate relevant, informative, and human-like responses based on extensive dialogue history. Recently, the dialogue generation domain has seen mainstream adoption of large language models (LLMs), due to their powerful capability in generating utterances. However, such models have a natural deficiency, namely inherent position bias, which may lead them to pay more attention to nearby utterances instead of causally relevant ones, resulting in irrelevant and generic responses in long-term dialogue. To alleviate this problem, we propose a novel method, named Causal Perception long-term Dialogue framework (CPD), which employs a perturbation-based causal variable discovery method to extract causally relevant utterances from the dialogue history and enhances the model's causal perception during fine-tuning. Specifically, a local-position awareness method is proposed in CPD for inter-sentence position correlation elimination, which helps models extract causally relevant utterances based on perturbations. Then, a causal-perception fine-tuning strategy is also proposed to enhance the capability of discovering the causal invariant factors, by differently perturbing causally relevant and non-causally relevant utterances for response generation. Experimental results on two datasets show that our proposed method can effectively alleviate the position bias for multiple LLMs and achieve significant progress compared with existing baselines.
Submitted 4 June, 2024;
originally announced June 2024.
-
Personalized Topic Selection Model for Topic-Grounded Dialogue
Authors:
Shixuan Fan,
Wei Wei,
Xiaofei Wen,
Xianling Mao,
Jixiong Chen,
Dangyang Chen
Abstract:
Recently, the topic-grounded dialogue (TGD) system has become increasingly popular owing to its powerful capability to actively guide users to accomplish specific tasks through topic-guided conversations. Most existing works utilize side information (e.g., topics or personas) in isolation to enhance the topic selection ability. However, because they disregard the noise within these auxiliary information sources and their mutual influence, current models tend to predict topics that are uninteresting to users and contextually irrelevant. To build a user-engaging and coherent dialogue agent, we propose a Personalized topic sElection model for Topic-grounded Dialogue, named PETD, which takes into account the interaction of side information to selectively aggregate such information for more accurately predicting subsequent topics. Specifically, we evaluate the correlation between global topics and personas and selectively incorporate the global topics aligned with user personas. Furthermore, we propose a contrastive learning based persona selector to filter out irrelevant personas under the constraint of lacking pertinent persona annotations. Throughout the selection and generation, diverse relevant side information is considered. Extensive experiments demonstrate that our proposed method can generate engaging and diverse responses, outperforming state-of-the-art baselines across various evaluation metrics.
Submitted 4 June, 2024;
originally announced June 2024.
-
Sparsity-Accelerated Training for Large Language Models
Authors:
Da Ma,
Lu Chen,
Pengyu Wang,
Hongshen Xu,
Hanqi Li,
Liangtai Sun,
Su Zhu,
Shuai Fan,
Kai Yu
Abstract:
Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the costs associated with this, primarily due to their large parameter count, remain high. This paper proposes leveraging sparsity in pre-trained LLMs to expedite this training process. By observing sparsity in activated neurons during forward iterations, we identify the potential for computational speed-ups by excluding inactive neurons. We address associated challenges by extending existing neuron importance evaluation metrics and introducing a ladder omission rate scheduler. Our experiments on Llama-2 demonstrate that Sparsity-Accelerated Training (SAT) achieves comparable or superior performance to standard training while significantly accelerating the process. Specifically, SAT achieves a 45% throughput improvement in continual pre-training and saves 38% training time in supervised fine-tuning in practice. It offers a simple, hardware-agnostic, and easily deployable framework for additional LLM training. Our code is available at https://github.com/OpenDFM/SAT.
Submitted 6 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Optimizing Age of Information in Random Access Networks: A Second-Order Approach for Active/Passive Users
Authors:
Siqi Fan,
Yuxin Zhong,
I-Hong Hou,
Clement K Kam
Abstract:
In this paper, we study the moments of the Age of Information (AoI) for both active and passive users in a random access network. In this network, active users broadcast sensing data, while passive users detect in-band radio activities from out-of-network devices, such as jammers. Collisions occur when multiple active users transmit simultaneously. Passive users can detect radio activities only when no active user transmits. Each active user's transmission behavior follows a Markov process. We aim to minimize the weighted sum of any moments of AoI for both user types. To achieve this, we employ a second-order analysis of system behavior. Specifically, we characterize an active user's transmission Markov process using its mean and temporal variance. We show that any moment of the AoI can be approximated by a function of these two parameters. This insight enables us to analyze and optimize the transmission Markov process for active users. We apply this strategy to two different random access models. Simulation results show that policies derived from this strategy outperform other baseline policies.
Submitted 1 June, 2024;
originally announced June 2024.
-
Non-Abelian lattice gauge fields in the photonic synthetic frequency dimension
Authors:
Dali Cheng,
Kai Wang,
Charles Roques-Carmes,
Eran Lustig,
Olivia Y. Long,
Heming Wang,
Shanhui Fan
Abstract:
Non-Abelian gauge fields provide a conceptual framework for the description of particles having spins. The theoretical importance of non-Abelian gauge fields motivates their experimental synthesis and explorations. Here, we demonstrate non-Abelian lattice gauge fields for photons. In the study of gauge fields, lattice models are essential for the understanding of their implications in extended systems. We utilize the platform of synthetic frequency dimensions, which enables the study of lattice physics in a scalable and programmable way. We observe Dirac cones at time-reversal-invariant momenta as well as the direction reversal of eigenstate trajectories associated with such Dirac cones. Both of them are unique signatures of non-Abelian gauge fields in our lattice system. Our results highlight the implications of non-Abelian gauge field in the study of topological physics and suggest opportunities for the control of photon spins and pseudospins.
Submitted 1 June, 2024;
originally announced June 2024.
-
Decoherence-free many-body Hamiltonians in nonlinear waveguide quantum electrodynamics
Authors:
Aviv Karnieli,
Offek Tziperman,
Charles Roques-Carmes,
Shanhui Fan
Abstract:
Enhancing interactions in many-body quantum systems, while protecting them from environmental decoherence, is at the heart of many quantum technologies. Waveguide quantum electrodynamics is a promising platform for achieving this, as it hosts infinite-range interactions and decoherence-free subspaces of quantum emitters. However, as coherent interactions between emitters are typically washed out in the wavelength-spacing regime hosting decoherence-free states, coherent control over the latter becomes limited, and many-body Hamiltonians in this important regime remain out of reach. Here we show that by incorporating emitter arrays with nonlinear waveguides hosting parametric gain, we obtain a unique class of many-body interaction Hamiltonians with coupling strengths that increase with emitter spacing, and persist even for wavelength-spaced arrays. We then propose to use these Hamiltonians to coherently generate decoherence-free states directly from the ground state, using only global squeezing drives, without the need for local addressing of individual emitters. Interestingly, we find that the dynamics approaches a unitary evolution in the limit of weak intra-waveguide squeezing, and discuss potential experimental realizations of this effect. Our results pave the way towards coherent control protocols in waveguide quantum electrodynamics, with applications including quantum computing, simulation, memory and nonclassical light generation.
Submitted 30 May, 2024;
originally announced May 2024.
-
Towards Unified Multi-granularity Text Detection with Interactive Attention
Authors:
Xingyu Wan,
Chengquan Zhang,
Pengyuan Lyu,
Sen Fan,
Zihan Ni,
Kun Yao,
Errui Ding,
**gdong Wang
Abstract:
Existing OCR engines or document image analysis systems typically rely on training separate models for text detection in varying scenarios and granularities, leading to significant computational complexity and resource demands. In this paper, we introduce "Detect Any Text" (DAT), an advanced paradigm that seamlessly unifies scene text detection, layout analysis, and document page detection into a cohesive, end-to-end model. This design enables DAT to efficiently manage text instances at different granularities, including *word*, *line*, *paragraph* and *page*. A pivotal innovation in DAT is the across-granularity interactive attention module, which significantly enhances the representation learning of text instances at varying granularities by correlating structural information across different text queries. As a result, it enables the model to achieve mutually beneficial detection performances across multiple text granularities. Additionally, a prompt-based segmentation module refines detection outcomes for texts of arbitrary curvature and complex layouts, thereby improving DAT's accuracy and expanding its real-world applicability. Experimental results demonstrate that DAT achieves state-of-the-art performances across a variety of text-related benchmarks, including multi-oriented/arbitrarily-shaped scene text detection, document layout analysis and page detection tasks.
Submitted 30 May, 2024;
originally announced May 2024.
-
A novel fault localization with data refinement for hydroelectric units
Authors:
Jialong Huang,
Junlin Song,
Penglong Lian,
Mengjie Gan,
Zhiheng Su,
Benhao Wang,
Wenji Zhu,
Xiaomin Pu,
Jianxiao Zou,
Shicai Fan
Abstract:
Due to the scarcity of fault samples and the complexity of the non-linear and non-smooth characteristics of data from hydroelectric units, most traditional hydroelectric unit fault localization methods struggle to localize faults accurately. To address these problems, a sparse autoencoder (SAE)-generative adversarial network (GAN)-wavelet noise reduction (WNR)-manifold-boosted deep learning (SG-WMBDL) based fault localization method for hydroelectric units is proposed. To overcome the data scarcity, an SAE is embedded into the GAN to generate more high-quality samples in the data generation module. Because the signals exhibit non-linear and non-smooth characteristics, an improved WNR, which combines soft and hard thresholding, and local linear embedding (LLE) are used in the data preprocessing module to reduce noise and effectively capture local features. In addition, to seek higher performance, a novel Adaptive Boost (AdaBoost) scheme combined with multiple deep learning models is proposed to achieve accurate fault localization. The experimental results show that SG-WMBDL can locate faults in hydroelectric units from a small number of fault samples with non-linear and non-smooth characteristics, with higher precision and accuracy than other state-of-the-art methods, which verifies the effectiveness and practicality of the proposed method.
Submitted 29 May, 2024;
originally announced May 2024.
-
Few-shot fault diagnosis based on multi-scale graph convolution filtering for industry
Authors:
Mengjie Gan,
Penglong Lian,
Zhiheng Su,
Jiyang Zhang,
Jialong Huang,
Benhao Wang,
Jianxiao Zou,
Shicai Fan
Abstract:
Industrial equipment fault diagnosis often encounters challenges such as the scarcity of fault data, complex operating conditions, and varied types of failures. Signal analysis, data statistical learning, and conventional deep learning techniques face constraints under these conditions due to their substantial data requirements and the necessity for transfer learning to accommodate new failure modes. To effectively leverage information and extract the intrinsic characteristics of faults across different domains under limited sample conditions, this paper introduces a fault diagnosis approach employing Multi-Scale Graph Convolution Filtering (MSGCF). MSGCF enhances the traditional Graph Neural Network (GNN) framework by integrating both local and global information fusion modules within the graph convolution filter block. This advancement effectively mitigates the over-smoothing issue associated with stacking too many graph convolutional layers while preserving a broad receptive field. It also reduces the risk of overfitting in few-shot diagnosis, thereby augmenting the model's representational capacity. Experiments on the University of Paderborn bearing dataset (PU) demonstrate that the proposed MSGCF method surpasses alternative approaches in accuracy, thereby offering valuable insights for industrial fault diagnosis in few-shot learning scenarios.
Submitted 29 May, 2024;
originally announced May 2024.
-
Deep Grokking: Would Deep Neural Networks Generalize Better?
Authors:
Simin Fan,
Razvan Pascanu,
Martin Jaggi
Abstract:
Recent research on the grokking phenomenon has illuminated the intricacies of neural networks' training dynamics and their generalization behaviors. Grokking refers to a sharp rise of the network's generalization accuracy on the test set, which occurs long after an extended overfitting phase, during which the network perfectly fits the training set. While the existing research primarily focuses on shallow networks such as 2-layer MLP and 1-layer Transformer, we explore grokking on deep networks (e.g. 12-layer MLP). We empirically replicate the phenomenon and find that deep neural networks can be more susceptible to grokking than their shallower counterparts. Meanwhile, we observe an intriguing multi-stage generalization phenomenon when increasing the depth of the MLP model, where the test accuracy exhibits a secondary surge, which is scarcely seen on shallow models. We further uncover compelling correspondences between the decrease in feature ranks and the phase transition from overfitting to the generalization stage during grokking. Additionally, we find that the multi-stage generalization phenomenon often aligns with a double-descent pattern in feature ranks. These observations suggest that internal feature rank could serve as a more promising indicator of the model's generalization behavior compared to the weight-norm. We believe our work is the first one to dive into grokking in deep neural networks, and investigate the relationship of feature rank and generalization performance.
Submitted 29 May, 2024;
originally announced May 2024.
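The Deep Grokking abstract above ties generalization to the rank of internal features. The sketch below shows one way such a rank could be tracked, assuming "feature rank" means the numerical rank of a samples-by-features activation matrix. The abstract does not give the paper's definition or threshold, so the function name `numerical_rank` and the tolerance `tol` are hypothetical.

```python
def numerical_rank(features, tol=1e-8):
    """Numerical rank of a matrix (list of rows) via Gaussian elimination.

    Assumption: this stands in for the "feature rank" of the abstract;
    the paper may use a different, e.g. singular-value-based, definition.
    """
    m = [[float(x) for x in row] for row in features]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Partial pivoting: pick the largest remaining entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # column is (numerically) dependent on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank
```

Logging this value for a fixed batch of hidden activations at each checkpoint would give a rank-versus-step curve that can be compared against test accuracy.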
-
Highly Tunable Ru-dimer Molecular Orbital State in 6H-perovskite Ba$_3$MRu$_2$O$_9$
Authors:
Bo Yuan,
Beom Hyun Kim,
Qiang Chen,
Daniel Dobrowolski,
Monika Azmanska,
G. M. Luke,
Shiyu Fan,
Valentina Bisogni,
Jonathan Pelliciari,
J. P. Clancy
Abstract:
Molecular orbital (MO) systems with clusters of heavy transition metal (TM) ions are one of the most important classes of model materials for studying the interplay between local physics and effects of itinerancy. Despite a large number of candidates identified in the family of 4d TM materials, an understanding of their physics from competing \textit{microscopic} energy scales is still missing. We bridge this gap by reporting the first resonant inelastic X-ray scattering (RIXS) measurement on a well-known series of Ru dimer systems with a 6H-perovskite structure, Ba$_3$MRu$_2$O$_9$ (M$^{3+}$=In$^{3+}$, Y$^{3+}$, La$^{3+}$). Our RIXS measurements reveal an extremely fragile MO state in these Ru dimer compounds, evidenced by an abrupt change in the RIXS spectrum accompanying a tiny change in the local structure tuned by the M-site ion. By modelling the RIXS spectra, we attribute the enhanced electronic instability in Ba$_3$MRu$_2$O$_9$ to the combined effect of a large hopping and a small spin-orbit coupling in the Ru dimers. The unique combination of energy scales uncovered in the present study makes Ru MO systems ideal model systems for studying quantum phase transitions with molecular orbitals.
Submitted 15 May, 2024;
originally announced May 2024.
-
Learning Social Graph for Inactive User Recommendation
Authors:
Nian Liu,
Shen Fan,
Ting Bai,
Peng Wang,
Mingwei Sun,
Yanhu Mo,
Xiaoxiao Xu,
Hong Liu,
Chuan Shi
Abstract:
Social relations have been widely incorporated into recommender systems to alleviate the data sparsity problem. However, raw social relations don't always benefit recommendation due to their inferior quality and insufficient quantity, especially for inactive users, whose interacted items are limited. In this paper, we propose a novel social recommendation method called LSIR (Learning Social Graph for Inactive User Recommendation) that learns an optimal social graph structure for social recommendation, especially for inactive users. LSIR recursively aggregates user and item embeddings to collaboratively encode item and user features. Then, graph structure learning (GSL) is employed to refine the raw user-user social graph, by removing noisy edges and adding new edges based on the enhanced embeddings. Meanwhile, mimic learning is implemented to guide active users in mimicking inactive users during model training, which improves the construction of new edges for inactive users. Extensive experiments on real-world datasets demonstrate that LSIR achieves significant improvements of up to 129.58% on NDCG in inactive user recommendation. Our code is available at https://github.com/liun-online/LSIR.
Submitted 22 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
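The LSIR abstract above describes refining a user-user social graph by removing noisy edges and adding new ones based on embeddings. The sketch below illustrates that general idea only, assuming cosine similarity with fixed thresholds. The function `refine_social_graph` and the parameters `drop_below` and `add_above` are hypothetical and are not taken from the paper or its repository, where the graph structure is learned rather than thresholded.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def refine_social_graph(edges, embeddings, drop_below=0.2, add_above=0.9):
    """Return a refined undirected edge set.

    Assumption: an existing edge is "noisy" when its endpoints' embeddings
    have similarity below `drop_below`; a missing edge is added when the
    similarity reaches `add_above`. Both thresholds are illustrative.
    """
    refined = {
        frozenset((u, v))
        for u, v in edges
        if cosine(embeddings[u], embeddings[v]) >= drop_below
    }
    users = sorted(embeddings)
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            if cosine(embeddings[u], embeddings[v]) >= add_above:
                refined.add(frozenset((u, v)))
    return refined
```

The all-pairs loop is quadratic in the number of users, so a real system would need an approximate nearest-neighbour search in its place.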
-
Circularly polarized light irradiated ferromagnetic MnBi$_2$Te$_4$: the long-sought ideal Weyl semimetal
Authors:
Shuai Fan,
Shengpu Huang,
Zhuo Chen,
Fangyang Zhan,
Xian-Yong Ding,
Da-Shuai Ma,
Rui Wang
Abstract:
The interaction between light and non-trivial energy band topology allows for the precise manipulation of topological quantum states, which has attracted intensive interest in condensed matter physics. In this work, using first-principles calculations, we studied the topological transition of ferromagnetic (FM) MnBi$_2$Te$_4$ upon irradiation with circularly polarized light (CPL). We revealed that MnBi$_2$Te$_4$ can be driven from an FM insulator to a Weyl semimetal with a minimum number of Weyl points (WPs), i.e., two WPs in systems without time-reversal symmetry. More importantly, in FM MnBi$_2$Te$_4$ with an out-of-plane easy magnetization axis, we found that the band dispersion of the WP evolves from Type-II to Type-III and finally to Type-I when the light intensity increases. Moreover, we show that the profile of the characteristic Fermi arc of the Weyl semimetal phase is sensitive to changes in light intensity, which enables efficient manipulation of the Fermi arc length of FM MnBi$_2$Te$_4$ in experiments. In addition, for FM MnBi$_2$Te$_4$ with an in-plane easy magnetization axis, the system becomes a Type-I Weyl semimetal under CPL irradiation. With controllable band dispersion, length of Fermi arc, and minimum number of WPs, our results indicate that CPL-irradiated FM MnBi$_2$Te$_4$ is an ideal platform to study novel transport phenomena in Weyl semimetals with distinct band dispersion.
Submitted 7 May, 2024;
originally announced May 2024.
-
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding
Authors:
Tao Liu,
Feilong Chen,
Shuai Fan,
Chenpeng Du,
Qi Chen,
Xie Chen,
Kai Yu
Abstract:
The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait. Unlike existing models that primarily focus on verbal cues such as lip synchronization and fail to capture the complex dynamics of facial expressions and nonverbal cues, AniTalker employs a universal motion representation. This innovative representation effectively captures a wide range of facial dynamics, including subtle expressions and head movements. AniTalker enhances motion depiction through two self-supervised learning strategies: the first involves reconstructing target video frames from source frames within the same identity to learn subtle motion representations, and the second develops an identity encoder using metric learning while actively minimizing mutual information between the identity and motion encoders. This approach ensures that the motion representation is dynamic and devoid of identity-specific details, significantly reducing the need for labeled data. Additionally, the integration of a diffusion model with a variance adapter allows for the generation of diverse and controllable facial animations. This method not only demonstrates AniTalker's capability to create detailed and realistic facial movements but also underscores its potential in crafting dynamic avatars for real-world applications. Synthetic results can be viewed at https://github.com/X-LANCE/AniTalker.
Submitted 5 May, 2024;
originally announced May 2024.
-
Unsupervised Anomaly Detection via Masked Diffusion Posterior Sampling
Authors:
Di Wu,
Shicai Fan,
Xue Zhou,
Li Yu,
Yuzhong Deng,
Jianxiao Zou,
Baihong Lin
Abstract:
Reconstruction-based methods have been commonly used for unsupervised anomaly detection, in which a normal image is reconstructed and compared with the given test image to detect and locate anomalies. Recently, diffusion models have shown promising applications for anomaly detection due to their powerful generative ability. However, these models lack strict mathematical support for normal image reconstruction and unexpectedly suffer from low reconstruction quality. To address these issues, this paper proposes a novel and highly interpretable method named Masked Diffusion Posterior Sampling (MDPS). In MDPS, the problem of normal image reconstruction is mathematically modeled as multiple diffusion posterior sampling for normal images based on the devised masked noisy observation model and the diffusion-based normal image prior under a Bayesian framework. Using a metric designed from pixel-level and perceptual-level perspectives, MDPS can effectively compute the difference map between each normal posterior sample and the given test image. Anomaly scores are obtained by averaging all difference maps for multiple posterior samples. Exhaustive experiments on the MVTec and BTAD datasets demonstrate that MDPS can achieve state-of-the-art performance in normal image reconstruction quality as well as anomaly detection and localization.
Submitted 27 April, 2024;
originally announced April 2024.
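The MDPS abstract above states that anomaly scores come from averaging difference maps over multiple posterior samples. The sketch below illustrates only that averaging step on small grayscale images stored as nested lists. The per-pixel absolute difference is an assumption standing in for the paper's combined pixel-level and perceptual-level metric, and the function names are hypothetical.

```python
def difference_map(sample, test_image):
    """Per-pixel absolute difference between one reconstruction and the test image.

    Assumption: a plain absolute difference replaces the paper's metric,
    which also includes a perceptual-level term not reproduced here.
    """
    return [
        [abs(s - t) for s, t in zip(sample_row, test_row)]
        for sample_row, test_row in zip(sample, test_image)
    ]


def anomaly_score_map(posterior_samples, test_image):
    """Average the difference maps of all posterior samples, pixel by pixel."""
    maps = [difference_map(sample, test_image) for sample in posterior_samples]
    n = len(maps)
    rows, cols = len(test_image), len(test_image[0])
    return [
        [sum(m[r][c] for m in maps) / n for c in range(cols)]
        for r in range(rows)
    ]
```

Averaging over several samples damps the variation of any single reconstruction, so pixels that differ from every normal sample keep a high score.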
-
Efficient EndoNeRF Reconstruction and Its Application for Data-driven Surgical Simulation
Authors:
Yuehao Wang,
Bingchen Gong,
Yonghao Long,
Siu Hin Fan,
Qi Dou
Abstract:
The healthcare industry has a growing need for realistic modeling and efficient simulation of surgical scenes. With effective models of deformable surgical scenes, clinicians are able to conduct surgical planning and surgery training on scenarios close to real-world cases. However, a significant challenge in achieving such a goal is the scarcity of high-quality soft tissue models with accurate shapes and textures. To address this gap, we present a data-driven framework that leverages emerging neural radiance field technology to enable high-quality surgical reconstruction and explore its application for surgical simulations. We first focus on developing a fast NeRF-based surgical scene 3D reconstruction approach that achieves state-of-the-art performance. This method can significantly outperform traditional 3D reconstruction methods, which have failed to capture large deformations and produce fine-grained shapes and textures. We then propose an automated creation pipeline of interactive surgical simulation environments through a closed mesh extraction algorithm. Our experiments have validated the superior performance and efficiency of our proposed approach in surgical scene 3D reconstruction. We further utilize our reconstructed soft tissues to conduct FEM and MPM simulations, showcasing the practical application of our method in data-driven surgical simulations.
Submitted 10 April, 2024;
originally announced April 2024.
-
Highly sensitive and efficient 1550 nm photodetector for room temperature operation
Authors:
Rituraj,
Zhi Gang Yu,
R. M. E. B. Kandegedara,
Shanhui Fan,
Srini Krishnamurthy
Abstract:
Photonic quantum technologies such as effective quantum communication require room temperature (RT) operating single- or few-photon sensors with high external quantum efficiency (EQE) at 1550 nm wavelength. The leading class of devices in this segment is avalanche photodetectors operating particularly in the Geiger mode. Often the requirements for RT operation and for a high EQE are in conflict, resulting in a compromised solution. We have developed a device which employs a two-dimensional (2D) semiconductor material on a co-optimized dielectric photonic crystal substrate to simultaneously decrease the dark current by three orders of magnitude at RT and maintain an EQE of >99%. The device is amenable to avalanching and forms a basis for single photon detection with ultra-low dark current and high photodetection efficiency. Harnessing the high carrier mobility of 2D materials, the device has ~ps jitter time and can be integrated into a large 2D array camera.
Submitted 12 May, 2024; v1 submitted 20 March, 2024;
originally announced April 2024.
-
Dual Representation of Unbounded Dynamic Concave Utilities
Authors:
Shengjun Fan,
Ying Hu,
Shanjian Tang
Abstract:
In several linear spaces of possibly unbounded endowments, we represent the dynamic concave utilities (hence the dynamic convex risk measures) as the solutions of backward stochastic differential equations (BSDEs) with unbounded terminal values, with the help of our recent existence and uniqueness results on unbounded solutions of scalar BSDEs whose generators have a linear, super-linear, sub-quadratic or quadratic growth. The Legendre-Fenchel transform (dual representation) of convex functions, the de la Vallée-Poussin theorem, and Young's and Gronwall's inequalities constitute the main ingredients of these representation results.
Submitted 22 April, 2024;
originally announced April 2024.
-
One-Shot Sequential Federated Learning for Non-IID Data by Enhancing Local Model Diversity
Authors:
Naibo Wang,
Yuchen Deng,
Wenjie Feng,
Shichen Fan,
Jianwei Yin,
See-Kiong Ng
Abstract:
Traditional federated learning mainly focuses on parallel settings (PFL), which can incur significant communication and computation costs. In contrast, one-shot and sequential federated learning (SFL) have emerged as innovative paradigms to alleviate these costs. However, the issue of non-IID (non-independent and identically distributed) data persists as a significant challenge in one-shot and SFL settings, exacerbated by the restricted communication between clients. In this paper, we improve one-shot sequential federated learning for non-IID data by proposing a local model diversity-enhancing strategy. Specifically, to leverage the potential of local model diversity for improving model performance, we introduce a local model pool for each client that comprises diverse models generated during local training, and propose two distance measurements to further enhance the model diversity and mitigate the effect of non-IID data. Consequently, our proposed framework can improve the global model performance while maintaining low communication costs. Extensive experiments demonstrate that our method exhibits superior performance to existing one-shot PFL methods and achieves better accuracy compared with state-of-the-art one-shot SFL methods on both label-skew and domain-shift tasks (e.g., 6%+ accuracy improvement on the CIFAR-10 dataset).
Submitted 18 April, 2024;
originally announced April 2024.
-
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge
Authors:
Yiwei Guo,
Chenrun Wang,
Yifan Yang,
Hankun Wang,
Ziyang Ma,
Chenpeng Du,
Shuai Wang,
Hanzheng Li,
Shuai Fan,
Hui Zhang,
Xie Chen,
Kai Yu
Abstract:
Discrete speech tokens have become increasingly popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice synthesis (SVS). In this paper, we describe the systems developed by the SJTU X-LANCE group for the TTS (acoustic + vocoder), SVS, and ASR tracks in the Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge. Notably, we achieved 1st rank on the leaderboard in the TTS track both with the whole training set and with only 1h of training data, with the highest UTMOS score and lowest bitrate among all submissions.
Submitted 9 April, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsy Narratives
Authors:
Shuxian Fan,
Adam Visokay,
Kentaro Hoffman,
Stephen Salerno,
Li Liu,
Jeffrey T. Leek,
Tyler H. McCormick
Abstract:
In settings where most deaths occur outside the healthcare system, verbal autopsies (VAs) are a common tool to monitor trends in causes of death (COD). VAs are interviews with a surviving caregiver or relative that are used to predict the decedent's COD. Turning VAs into actionable insights for researchers and policymakers requires two steps: (i) predicting likely COD using the VA interview and (ii) performing inference with predicted CODs (e.g. modeling the breakdown of causes by demographic factors using a sample of deaths). In this paper, we develop a method for valid inference using outcomes (in our case COD) predicted from free-form text using state-of-the-art NLP techniques. This method, which we call multiPPI++, extends recent work in "prediction-powered inference" to multinomial classification. We leverage a suite of NLP techniques for COD prediction and, through empirical analysis of VA data, demonstrate the effectiveness of our approach in handling transportability issues. multiPPI++ recovers ground truth estimates, regardless of which NLP model produced predictions and regardless of whether they were produced by a more accurate predictor like GPT-4-32k or a less accurate predictor like KNN. Our findings demonstrate the practical importance of inference correction for public health decision-making and suggest that if inference tasks are the end goal, having a small amount of contextually relevant, high-quality labeled data is essential regardless of the NLP algorithm.
Submitted 2 April, 2024;
originally announced April 2024.
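The verbal autopsy abstract above builds on prediction-powered inference (PPI). The sketch below shows only the basic PPI point estimate applied to class proportions: the proportions predicted on a large unlabeled set are corrected by the prediction bias measured on a small labeled set. It is not multiPPI++ itself, which the abstract describes as an extension to multinomial classification, and it produces no confidence intervals. All function and argument names are hypothetical.

```python
from collections import Counter


def proportions(labels, classes):
    """Share of each class among `labels` (classes never seen get 0.0)."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in classes}


def ppi_class_proportions(unlabeled_preds, labeled_preds, labeled_truth, classes):
    """Basic PPI point estimate of class proportions.

    estimate[c] = predicted share of c on unlabeled data
                  + (true share of c - predicted share of c) on labeled data
    Assumption: this is the plain PPI mean estimator on one-hot indicators,
    used here only to illustrate the correction idea.
    """
    p_unlabeled = proportions(unlabeled_preds, classes)
    p_labeled_pred = proportions(labeled_preds, classes)
    p_labeled_true = proportions(labeled_truth, classes)
    return {
        c: p_unlabeled[c] + (p_labeled_true[c] - p_labeled_pred[c])
        for c in classes
    }
```

If the predictor over-reports a cause on the labeled deaths, the same amount is subtracted from its share on the unlabeled deaths, which is why a small set of high-quality labels matters whichever predictor is used.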
-
End-to-End Autonomous Driving through V2X Cooperation
Authors:
Haibao Yu,
Wenxian Yang,
Jiaru Zhong,
Zhenwei Yang,
Siqi Fan,
Ping Luo,
Zaiqing Nie
Abstract:
Cooperatively utilizing both ego-vehicle and infrastructure sensor data via V2X communication has emerged as a promising approach for advanced autonomous driving. However, current research mainly focuses on improving individual modules, rather than taking end-to-end learning to optimize final planning performance, resulting in underutilized data potential. In this paper, we introduce UniV2X, a pioneering cooperative autonomous driving framework that seamlessly integrates all key driving modules across diverse views into a unified network. We propose a sparse-dense hybrid data transmission and fusion mechanism for effective vehicle-infrastructure cooperation, offering three advantages: 1) Effective for simultaneously enhancing agent perception, online mapping, and occupancy prediction, ultimately improving planning performance. 2) Transmission-friendly for practical and limited communication conditions. 3) Reliable data fusion with interpretability of this hybrid data. We implement UniV2X, and reproduce several benchmark methods, on the challenging DAIR-V2X, a real-world cooperative driving dataset. Experimental results demonstrate the effectiveness of UniV2X in significantly enhancing planning performance, as well as all intermediate output performance. Code is at https://github.com/AIR-THU/UniV2X.
Submitted 19 April, 2024; v1 submitted 31 March, 2024;
originally announced April 2024.
-
RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
Authors:
Ming Yan,
Yan Zhang,
Shuqiang Cai,
Shuqi Fan,
Xincheng Lin,
Yudi Dai,
Siqi Shen,
Chenglu Wen,
Lan Xu,
Yuexin Ma,
Cheng Wang
Abstract:
Comprehensive capturing of human motions requires both accurate captures of complex poses and precise localization of the human within scenes. Most human pose estimation (HPE) datasets and methods primarily rely on RGB, LiDAR, or IMU data. However, solely using these modalities or a combination of them may not be adequate for HPE, particularly for complex and fast movements. For holistic human motion understanding, we present RELI11D, a high-quality multimodal human motion dataset involving LiDAR, an IMU system, an RGB camera, and an Event camera. It records the motions of 10 actors performing 5 sports in 7 scenes, including 3.32 hours of synchronized LiDAR point clouds, IMU measurement data, RGB videos and Event streams. Through extensive experiments, we demonstrate that RELI11D presents considerable challenges and opportunities as it contains many rapid and complex motions that require precise location. To address the challenge of integrating different modalities, we propose LEIR, a multimodal baseline that effectively utilizes LiDAR Point Cloud, Event stream, and RGB through our cross-attention fusion strategy. We show that LEIR exhibits promising results for rapid motions and daily motions and that utilizing the characteristics of multiple modalities can indeed improve HPE performance. Both the dataset and source code will be released publicly to the research community, fostering collaboration and enabling further exploration in this field.
Submitted 28 March, 2024;
originally announced March 2024.
-
Deep CSI Compression for Dual-Polarized Massive MIMO Channels with Disentangled Representation Learning
Authors:
Suhang Fan,
Wei Xu,
Renjie Xie,
Shi Jin,
Derrick Wing Kwan Ng,
Naofal Al-Dhahir
Abstract:
Channel state information (CSI) feedback is critical for achieving the promised advantages of enhancing spectral and energy efficiencies in massive multiple-input multiple-output (MIMO) wireless communication systems. Deep learning (DL)-based methods have been proven effective in reducing the required signaling overhead for CSI feedback. In practical dual-polarized MIMO scenarios, channels in the vertical and horizontal polarization directions tend to exhibit high polarization correlation. To fully exploit the inherent propagation similarity within dual-polarized channels, we propose a disentangled representation neural network (NN) for CSI feedback, referred to as DiReNet. The proposed DiReNet disentangles dual-polarized CSI into three components: polarization-shared information, vertical polarization-specific information, and horizontal polarization-specific information. This disentanglement of dual-polarized CSI enables the minimization of information redundancy caused by the polarization correlation and improves the performance of CSI compression and recovery. Additionally, flexible quantization and network extension schemes are designed. Consequently, our method provides a pragmatic solution for CSI feedback to harness the physical MIMO polarization as a priori information. Our experimental results show that the performance of our proposed DiReNet surpasses that of existing DL-based networks, while also effectively reducing the number of network parameters by nearly one third.
Submitted 28 March, 2024;
originally announced March 2024.
-
Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback
Authors:
Hongshen Xu,
Zichen Zhu,
Situo Zhang,
Da Ma,
Shuai Fan,
Lu Chen,
Kai Yu
Abstract:
Large Language Models (LLMs) often generate erroneous outputs, known as hallucinations, due to their limitations in discerning questions beyond their knowledge scope. While addressing hallucination has been a focal point in research, previous efforts primarily concentrate on enhancing correctness without giving due consideration to the significance of rejection mechanisms. In this paper, we conduct a comprehensive examination of the role of rejection, introducing the notion of model reliability along with corresponding metrics. These metrics measure the model's ability to provide accurate responses while adeptly rejecting questions exceeding its knowledge boundaries, thereby minimizing hallucinations. To improve the inherent reliability of LLMs, we present a novel alignment framework called Reinforcement Learning from Knowledge Feedback (RLKF). RLKF leverages knowledge feedback to dynamically determine the model's knowledge boundary and trains a reliable reward model to encourage the refusal of out-of-knowledge questions. Experimental results on mathematical questions affirm the substantial efficacy of RLKF in significantly enhancing LLM reliability.
Submitted 7 April, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Strong coupling and single-photon nonlinearity in free-electron quantum optics
Authors:
Aviv Karnieli,
Charles Roques-Carmes,
Nicholas Rivera,
Shanhui Fan
Abstract:
The observation that free electrons can interact coherently with quantized electromagnetic fields and matter systems has led to a plethora of proposals leveraging the unique quantum properties of free electrons. At the heart of these proposals lies the assumption of a strong quantum interaction between a flying free electron and a photonic mode. However, existing schemes are intrinsically limited by electron diffraction, which puts an upper bound on the interaction length and therefore the quantum coupling strength. Here, we propose the use of "free-electron fibers": effectively one-dimensional photonic systems where free electrons co-propagate with two guided modes. The first mode applies a ponderomotive trap to the free electron, effectively lifting the limitations due to electron diffraction. The second mode strongly couples to the guided free electron, with an enhanced coupling that is orders of magnitude larger than previous designs. Moreover, the extended interaction lengths enabled by our scheme allow for strong single-photon nonlinearities mediated by free electrons. We predict a few interesting observable quantum effects in our system, such as deterministic single-photon emission and complex, nonlinear multimode dynamics. Our proposal paves the way towards the realization of many anticipated effects in free-electron quantum optics, such as non-Gaussian light generation, deterministic single photon emission, and quantum gates controlled by free-electron-photon interactions.
Submitted 1 May, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Taiyi: A high-performance CKKS accelerator for Practical Fully Homomorphic Encryption
Authors:
Shengyu Fan,
Xianglong Deng,
Zhuoyu Tian,
Zhicheng Hu,
Liang Chang,
Rui Hou,
Dan Meng,
Mingzhe Zhang
Abstract:
Fully Homomorphic Encryption (FHE), a novel cryptographic theory enabling computation directly on ciphertext data, offers significant security benefits but is hampered by substantial performance overhead. In recent years, a series of accelerator designs have significantly enhanced the performance of FHE applications, bringing them closer to real-world applicability. However, these accelerators face challenges related to large on-chip memory and area. Additionally, FHE algorithms undergo rapid development, rendering the previous accelerator designs less perfectly adapted to the evolving landscape of optimized FHE applications. In this paper, we conducted a detailed analysis of existing applications with the new FHE method, making two key observations: 1) the bottleneck of FHE applications shifts from NTT to the inner-product operation, and 2) the optimal α of KeySwitch changes with the decrease in multiplicative level. Based on these observations, we designed an accelerator named Taiyi, which includes specific hardware for the inner-product operation and optimizes the NTT and BConv operations through algorithmic derivation. A comparative evaluation of Taiyi against previous state-of-the-art designs reveals an average performance improvement of 1.5x and reduces the area overhead by 15.7%.
Submitted 15 March, 2024;
originally announced March 2024.
-
RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception
Authors:
Ruiyang Hao,
Siqi Fan,
Yingru Dai,
Zhenlin Zhang,
Chenxi Li,
Yuntian Wang,
Haibao Yu,
Wenxian Yang,
Jirui Yuan,
Zaiqing Nie
Abstract:
The value of roadside perception, which could extend the boundaries of autonomous driving and traffic management, has gradually become more prominent and acknowledged in recent years. However, existing roadside perception approaches only focus on the single-infrastructure sensor system, which cannot realize a comprehensive understanding of a traffic area because of the limited sensing range and blind spots. Toward high-quality roadside perception, we need Roadside Cooperative Perception (RCooper) to achieve practical area-coverage roadside perception for restricted traffic areas. RCooper has its own domain-specific challenges, but further exploration is hindered by the lack of datasets. We hence release the first real-world, large-scale RCooper dataset to advance research on practical roadside cooperative perception, including detection and tracking. The manually annotated dataset comprises 50k images and 30k point clouds, including two representative traffic scenes (i.e., intersection and corridor). The constructed benchmarks prove the effectiveness of roadside cooperative perception and demonstrate the direction of further research. Codes and dataset can be accessed at: https://github.com/AIR-THU/DAIR-RCooper.
Submitted 31 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
Not all Layers of LLMs are Necessary during Inference
Authors:
Siqi Fan,
Xin Jiang,
Xiang Li,
Xuying Meng,
Peng Han,
Shuo Shang,
Aixin Sun,
Yequan Wang,
Zhongyuan Wang
Abstract:
The inference phase of Large Language Models (LLMs) is very expensive. An ideal inference stage of LLMs could utilize fewer computational resources while still maintaining its capabilities (e.g., generalization and in-context learning ability). In this paper, we try to answer the question, "During LLM inference, can we use shallow layers for easy instances; and deep layers for hard ones?" To answer this question, we first indicate that Not all Layers are Necessary during Inference by statistically analyzing the activated layers across tasks. Then, we propose a simple algorithm named AdaInfer to determine the inference termination moment based on the input instance adaptively. More importantly, AdaInfer does not alter LLM parameters and maintains generalizability across tasks. Experiments on well-known LLMs (i.e., Llama2 series and OPT) show that AdaInfer saves an average of 14.8% of computational resources, even up to 50% on sentiment tasks, while maintaining comparable performance. Additionally, this method is orthogonal to other model acceleration techniques, potentially boosting inference efficiency further.
Submitted 14 April, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
EMIFF: Enhanced Multi-scale Image Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection
Authors:
Zhe Wang,
Siqi Fan,
Xiaoliang Huo,
Tongda Xu,
Yan Wang,
Jingjing Liu,
Yilun Chen,
Ya-Qin Zhang
Abstract:
In autonomous driving, cooperative perception makes use of multi-view cameras from both vehicles and infrastructure, providing a global vantage point with rich semantic context of road conditions beyond a single vehicle viewpoint. Currently, two major challenges persist in vehicle-infrastructure cooperative 3D (VIC3D) object detection: 1) inherent pose errors when fusing multi-view images, caused by time asynchrony across cameras; 2) information loss in the transmission process resulting from limited communication bandwidth. To address these issues, we propose a novel camera-based 3D detection framework for the VIC3D task, Enhanced Multi-scale Image Feature Fusion (EMIFF). To fully exploit holistic perspectives from both vehicles and infrastructure, we propose Multi-scale Cross Attention (MCA) and Camera-aware Channel Masking (CCM) modules to enhance infrastructure and vehicle features at scale, spatial, and channel levels to correct the pose error introduced by camera asynchrony. We also introduce a Feature Compression (FC) module with channel and spatial compression blocks for transmission efficiency. Experiments show that EMIFF achieves SOTA on DAIR-V2X-C datasets, significantly outperforming previous early-fusion and late-fusion methods with comparable transmission costs.
Submitted 23 February, 2024;
originally announced February 2024.
-
Random time horizon BSDEs with stochastic monotonicity and general growth generators and related PDEs
Authors:
Xinying Li,
Yaqi Zhang,
Shengjun Fan
Abstract:
This paper is devoted to solving a multidimensional backward stochastic differential equation (BSDE) with a general random terminal time, which may take values in [0,+infinity]. The generator g satisfies a stochastic monotonicity condition in the first unknown variable y and a stochastic Lipschitz continuity condition in the second unknown variable z, and it can have a more general growth with respect to y than the classical one stated in (H5) of Briand et al. [2003]. Without imposing any restriction of finite moment on the stochastic coefficients, we establish a general existence and uniqueness result for the adapted solution of the previous BSDE in a proper weighted L2-space. This result is proved via some innovative ideas and delicate analytical techniques, and it unifies and strengthens many existing works on BSDEs with stochastic monotonicity generators, BSDEs with stochastic Lipschitz generators, and BSDEs with deterministic Lipschitz/monotonicity generators. Then, a continuous dependence property and a stability theorem for the weighted L2-solutions are given. We also derive the nonlinear Feynman-Kac formula for both parabolic and elliptic PDEs in this context.
Submitted 22 February, 2024;
originally announced February 2024.
-
Elementary excitations of single-photon emitters in hexagonal Boron Nitride
Authors:
Jonathan Pelliciari,
Enrique Mejia,
John M. Woods,
Yanhong Gu,
Jiemin Li,
Saroj B. Chand,
Shiyu Fan,
Kenji Watanabe,
Takashi Taniguchi,
Valentina Bisogni,
Gabriele Grosso
Abstract:
Single-photon emitters serve as building blocks for many emerging concepts in quantum photonics. The recent identification of bright, tunable, and stable emitters in hexagonal boron nitride (hBN) has opened the door to quantum platforms operating across the infrared to ultraviolet spectrum. While it is widely acknowledged that defects are responsible for single-photon emitters in hBN, crucial details regarding their origin, electronic levels, and orbital involvement remain unknown. Here, we employ a combination of resonant inelastic X-ray scattering and photoluminescence spectroscopy in defective hBN unveiling an elementary excitation at 285 meV that gives rise to a plethora of harmonics correlated with single-photon emitters. We discuss the importance of N $π^*$ antibonding orbitals in shaping the electronic states of the emitters. The discovery of the elementary excitations of hBN provides new fundamental insights into quantum emission in low-dimensional materials, paving the way for future investigations in other platforms.
Submitted 14 February, 2024;
originally announced February 2024.
-
GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks
Authors:
Mengmei Zhang,
Mingwei Sun,
Peng Wang,
Shen Fan,
Yanhu Mo,
Xiaoxiao Xu,
Hong Liu,
Cheng Yang,
Chuan Shi
Abstract:
Large language models (LLMs) like ChatGPT, which exhibit powerful zero-shot and instruction-following capabilities, have catalyzed a revolutionary transformation across diverse fields, especially for open-ended tasks. This idea is less explored in the graph domain: despite the availability of numerous powerful graph models (GMs), they are restricted to tasks in a pre-defined form. Although several methods applying LLMs to graphs have been proposed, they fail to simultaneously handle the pre-defined and open-ended tasks, with LLM as a node feature enhancer or as a standalone predictor. To break this dilemma, we propose to bridge the pretrained GM and LLM by a Translator, named GraphTranslator, aiming to leverage GM to handle the pre-defined tasks effectively and utilize the extended interface of LLMs to offer various open-ended tasks for GM. To train such a Translator, we propose a Producer capable of constructing the graph-text alignment data along node information, neighbor information and model information. By translating node representation into tokens, GraphTranslator empowers an LLM to make predictions based on language instructions, providing a unified perspective for both pre-defined and open-ended tasks. Extensive results demonstrate the effectiveness of our proposed GraphTranslator on zero-shot node classification. The graph question answering experiments reveal GraphTranslator's potential across a broad spectrum of open-ended tasks through language instructions. Our code is available at: https://github.com/alibaba/GraphTranslator.
Submitted 27 February, 2024; v1 submitted 11 February, 2024;
originally announced February 2024.
-
CTGAN: Semantic-guided Conditional Texture Generator for 3D Shapes
Authors:
Yi-Ting Pan,
Chai-Rong Lee,
Shu-Ho Fan,
Jheng-Wei Su,
Jia-Bin Huang,
Yung-Yu Chuang,
Hung-Kuo Chu
Abstract:
The entertainment industry relies on 3D visual content to create immersive experiences, but traditional methods for creating textured 3D models can be time-consuming and subjective. Generative networks such as StyleGAN have advanced image synthesis, but generating 3D objects with high-fidelity textures is still not well explored, and existing methods have limitations. We propose the Semantic-guided Conditional Texture Generator (CTGAN), producing high-quality textures for 3D shapes that are consistent with the viewing angle while respecting shape semantics. CTGAN utilizes the disentangled nature of StyleGAN to finely manipulate the input latent codes, enabling explicit control over both the style and structure of the generated textures. A coarse-to-fine encoder architecture is introduced to enhance control over the structure of the resulting textures via input segmentation. Experimental results show that CTGAN outperforms existing methods on multiple quality metrics and achieves state-of-the-art performance on texture generation in both conditional and unconditional settings.
Submitted 8 February, 2024;
originally announced February 2024.
-
Measuring, processing, and generating partially coherent light with self-configuring optics
Authors:
Charles Roques-Carmes,
Shanhui Fan,
David Miller
Abstract:
Optical phenomena always display some degree of partial coherence between their respective degrees of freedom. Partial coherence is of particular interest in multimodal systems, where classical and quantum correlations between spatial, polarization, and spectral degrees of freedom can lead to fascinating phenomena (e.g., entanglement) and be leveraged for advanced imaging and sensing modalities (e.g., in hyperspectral, polarization, and ghost imaging). Here, we present a universal method to analyze, process, and generate spatially partially coherent light in multimode systems by using self-configuring optical networks. Our method relies on cascaded self-configuring layers whose average power outputs are sequentially optimized. Once optimized, the network separates the input light into its mutually incoherent components, which is formally equivalent to a diagonalization of the input density matrix. We illustrate our method with arrays of Mach-Zehnder interferometers and show how this method can be used to perform partially coherent environmental light sensing, generation of multimode partially coherent light with arbitrary coherency matrices, and unscrambling of quantum optical mixtures. We provide guidelines for the experimental realization of this method, paving the way for self-configuring photonic devices that can automatically learn optimal modal representations of partially coherent light fields.
Submitted 1 February, 2024;
originally announced February 2024.
-
Deciphering Pluto's Haze: How a Solar-Powered Vapor-Pressure Plume Shapes Its Bimodal Particle Size Distribution
Authors:
Sihe Chen,
Danica Adams,
Siteng Fan,
Peter Gao,
Eliot Young,
Yuk Yung
Abstract:
Combining findings from New Horizons' suite of instruments reveals a bimodal haze particle distribution within Pluto's atmosphere, which haze models have not been able to reproduce. We employ the photochemical and microphysics KINAERO model to simulate seasonal cycles and their impact on the haze distribution. We find that the smaller spherical particle mode can be generated through photochemistry and coagulation, while the larger aggregate mode is formed by surface volatile deposits sublimating and subsequently lofting such particles upwards.
Submitted 1 February, 2024;
originally announced February 2024.
-
Existence, uniqueness and comparison theorem on unbounded solutions of general time interval BSDEs with sub-quadratic generators
Authors:
Chuang Gu,
Yan Wang,
Shengjun Fan
Abstract:
This paper is devoted to the existence, uniqueness and comparison theorem on unbounded solutions of one-dimensional backward stochastic differential equations (BSDEs) with sub-quadratic generators, where the terminal time is allowed to be finite or infinite. We first establish existence of the unbounded solutions for this kind of BSDEs with generator $g$ satisfying a time-varying one-sided linear growth in the first unknown variable $y$ and a time-varying sub-quadratic growth in the second unknown variable $z$. Then, the uniqueness and comparison theorem of the unbounded solutions for this kind of BSDEs are proved under a time-varying extended convexity assumption. These results generalize those obtained in \cite{12} to the general time interval BSDEs. Finally, several sufficient conditions ensuring that the uniqueness holds are put forward and verified via some innovative ideas, which are explored for the first time even for the case of finite time interval BSDEs.
Submitted 9 June, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
On the existence and uniqueness of unbounded solutions to quadratic BSDEs with monotonic-convex generators
Authors:
Yan Wang,
Xinying Li,
Chuang Gu,
Shengjun Fan
Abstract:
With the terminal value $ξ^-$ admitting a certain exponential moment and $ξ^+$ admitting every exponential moment or being bounded, we establish several existence and uniqueness results for unbounded solutions of backward stochastic differential equations (BSDEs) whose generator $g$ satisfies a monotonicity condition with general growth in the first unknown variable $y$ and a convexity condition with quadratic growth in the second unknown variable $z$. In particular, the generator $g$ may not be locally Lipschitz continuous in $y$. This generalizes some results reported in \cite{Delbaen 2011} by relaxing the continuity and growth of $g$ in $y$. We also give an explicit expression of the first process in the unique unbounded solution of a BSDE when the generator $g$ is jointly convex in $(y,z)$ and has a linear growth in $y$ and a quadratic growth in $z$. Finally, we put forward the corresponding comparison theorems for unbounded solutions of the preceding BSDEs. These results are proved using existing ideas together with some innovative ones.
Submitted 5 April, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Graph Fairness Learning under Distribution Shifts
Authors:
Yibo Li,
Xiao Wang,
Yujie Xing,
Shaohua Fan,
Ruijia Wang,
Yaoqi Liu,
Chuan Shi
Abstract:
Graph neural networks (GNNs) have achieved remarkable performance on graph-structured data. However, GNNs may inherit prejudice from the training data and make discriminatory predictions based on sensitive attributes, such as gender and race. Recently, there has been an increasing interest in ensuring fairness on GNNs, but all existing approaches assume that the training and testing data are under the same distribution, i.e., training data and testing data are from the same graph. Will graph fairness performance decrease under distribution shifts? How do distribution shifts affect graph fairness learning? All these open questions are largely unexplored from a theoretical perspective. To answer these questions, we first theoretically identify the factors that determine bias on a graph. Subsequently, we explore the factors influencing fairness on testing graphs, with a noteworthy factor being the representation distances of certain groups between the training and testing graph. Motivated by our theoretical analysis, we propose our framework FatraGNN. Specifically, to guarantee fairness performance on unknown testing graphs, we propose a graph generator to produce numerous graphs with significant bias and under different distributions. Then we minimize the representation distances for each certain group between the training graph and generated graphs. This empowers our model to achieve high classification and fairness performance even on generated graphs with significant bias, thereby effectively handling unknown testing graphs. Experiments on real-world and semi-synthetic datasets demonstrate the effectiveness of our model in terms of both accuracy and fairness.
Submitted 30 January, 2024;
originally announced January 2024.
-
ChemDFM: Dialogue Foundation Model for Chemistry
Authors:
Zihan Zhao,
Da Ma,
Lu Chen,
Liangtai Sun,
Zihao Li,
Hongshen Xu,
Zichen Zhu,
Su Zhu,
Shuai Fan,
Guodong Shen,
Xin Chen,
Kai Yu
Abstract:
Large language models (LLMs) have established great success in the general domain of natural language processing. Their emerging task generalization and free-form dialogue capabilities can greatly help to design Chemical General Intelligence (CGI) to assist real-world research in chemistry. However, the existence of specialized language and knowledge in the field of chemistry, such as the highly informative SMILES notation, hinders the performance of general-domain LLMs in chemistry. To this end, we develop ChemDFM, the first LLM towards CGI. ChemDFM-13B is trained on 34B tokens from chemical literature, textbooks, and instructions as well as various data from the general domain. Therefore, it can store, understand, and reason over chemical knowledge and languages while still possessing advanced free-form language comprehension capabilities. Extensive quantitative evaluation shows that ChemDFM can significantly outperform the representative open-sourced LLMs. Moreover, ChemDFM can also surpass GPT-4 on a great portion of chemical tasks, despite the significant size difference. Further qualitative evaluations demonstrate the efficiency and effectiveness of ChemDFM in real-world research scenarios. We will open-source the ChemDFM model soon.
Submitted 26 January, 2024;
originally announced January 2024.
-
Graph Contrastive Invariant Learning from the Causal Perspective
Authors:
Yanhu Mo,
Xiao Wang,
Shaohua Fan,
Chuan Shi
Abstract:
Graph contrastive learning (GCL), learning the node representation by contrasting two augmented graphs in a self-supervised way, has attracted considerable attention. GCL is usually believed to learn the invariant representation. However, does this understanding always hold in practice? In this paper, we first study GCL from the perspective of causality. By analyzing GCL with the structural causal model (SCM), we discover that traditional GCL may not learn the invariant representations well due to the non-causal information contained in the graph. How can we fix it and encourage the current GCL to learn better invariant representations? The SCM offers two requirements and motivates us to propose a novel GCL method. Particularly, we introduce the spectral graph augmentation to simulate the intervention upon non-causal factors. Then we design the invariance objective and independence objective to better capture the causal factors. Specifically, (i) the invariance objective encourages the encoder to capture the invariant information contained in causal variables, and (ii) the independence objective aims to reduce the influence of confounders on the causal variables. Experimental results demonstrate the effectiveness of our approach on node classification tasks.
Submitted 7 March, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
One-dimensional non-Hermitian band structures as Riemann surfaces
Authors:
Heming Wang,
Lingling Fan,
Shanhui Fan
Abstract:
We present the viewpoint of treating one-dimensional band structures as Riemann surfaces, linking the unique properties of non-Hermiticity to the geometry and topology of the Riemann surface. Branch cuts and branch points play a significant role when this viewpoint is applied to both the open-boundary spectrum and the braiding structure. An open-boundary spectrum is interpreted as branch cuts connecting certain branch points, and its consistency with the monodromy representation severely limits its possible morphology. A braid word for the Brillouin zone can be read off from its intersections with branch cuts, and its crossing number is given by the winding number of the discriminant. These results open new avenues to generate important insights into the physical behaviors of non-Hermitian systems.
Submitted 21 January, 2024;
originally announced January 2024.
-
Shifted-prime divisors
Authors:
Steve Fan,
Carl Pomerance
Abstract:
Let $ω^*(n)$ denote the number of divisors of $n$ that are shifted primes, that is, the number of divisors of $n$ of the form $p-1$, with $p$ prime. Studied by Prachar in an influential paper from 70 years ago, the higher moments of $ω^*(n)$ are still somewhat a mystery. This paper addresses these higher moments and considers other related problems.
Submitted 19 March, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
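The definition of $ω^*(n)$ in the abstract above can be illustrated with a minimal brute-force sketch. The function names `is_prime`, `shifted_prime_divisors`, and `omega_star` are hypothetical and chosen only for this illustration; the code follows the stated definition directly by trial division and is not the authors' method for studying the moments.

```python
def is_prime(m: int) -> bool:
    """Trial-division primality test; adequate for small illustrative inputs."""
    if m < 2:
        return False
    if m < 4:
        return True  # 2 and 3 are prime
    if m % 2 == 0:
        return False
    i = 3
    while i * i <= m:
        if m % i == 0:
            return False
        i += 2
    return True


def shifted_prime_divisors(n: int) -> list[int]:
    """Return the divisors d of n for which d + 1 is prime, i.e. d = p - 1."""
    divisors = set()
    d = 1
    while d * d <= n:
        if n % d == 0:
            divisors.add(d)
            divisors.add(n // d)
        d += 1
    return sorted(d for d in divisors if is_prime(d + 1))


def omega_star(n: int) -> int:
    """Number of divisors of n that are shifted primes (hypothetical helper name)."""
    return len(shifted_prime_divisors(n))
```

For example, the divisors of 12 are 1, 2, 3, 4, 6, 12; adding 1 gives 2, 3, 4, 5, 7, 13, of which all but 4 are prime, so `shifted_prime_divisors(12)` returns `[1, 2, 4, 6, 12]` and `omega_star(12)` is 5.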