-
Knowledge Graph Driven Recommendation System Algorithm
Authors:
Chaoyang Zhang,
Yanan Li,
Shen Chen,
Siwei Fan,
Wei Li
Abstract:
In this paper, we propose a novel graph neural network-based recommendation model called KGLN, which leverages Knowledge Graph (KG) information to enhance the accuracy and effectiveness of personalized recommendations. We first use a single-layer neural network to merge individual node features in the graph, and then adjust the aggregation weights of neighboring entities by incorporating influence factors. The model evolves from a single layer to multiple layers through iteration, enabling entities to access extensive multi-order associated entity information. The final step involves integrating features of entities and users to produce a recommendation score. The model performance was evaluated by comparing the effects of various aggregation methods and influence factors. In tests on the MovieLens-1M and Book-Crossing datasets, KGLN improves the Area Under the ROC Curve (AUC) by 0.3% to 5.9% and 1.1% to 8.2%, respectively, over existing benchmark methods such as LibFM, DeepFM, Wide&Deep, and RippleNet.
Submitted 3 February, 2024; v1 submitted 1 December, 2023;
originally announced January 2024.
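The aggregation step described in the KGLN abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax normalization of influence factors, the additive merge, the ReLU activation, and all function names are assumptions made for illustration.

```python
import math


def softmax(values):
    """Normalize raw influence factors into weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def aggregate(entity, neighbors, influence, weight, bias):
    """One aggregation layer (assumed form, not taken from the paper).

    Neighbors are pooled with influence-based weights, added to the entity
    vector, and passed through a single dense layer with ReLU.
    """
    alphas = softmax(influence)
    pooled = [
        sum(a * vec[i] for a, vec in zip(alphas, neighbors))
        for i in range(len(entity))
    ]
    merged = [e + p for e, p in zip(entity, pooled)]
    return [max(0.0, dot(row, merged) + b) for row, b in zip(weight, bias)]


def recommendation_score(user, item):
    """Sigmoid of the user-item inner product (assumed scoring function)."""
    return 1.0 / (1.0 + math.exp(-dot(user, item)))


identity = [[1.0, 0.0], [0.0, 1.0]]
item_vec = aggregate([1.0, 0.0], [[0.0, 1.0], [1.0, 1.0]], [0.0, 0.0], identity, [0.0, 0.0])
```

Calling `aggregate` repeatedly on the outputs of a previous call would correspond to the multi-layer, multi-order case mentioned in the abstract.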
-
Uncertainty-aware No-Reference Point Cloud Quality Assessment
Authors:
Songlin Fan,
Zixuan Guo,
Wei Gao,
Ge Li
Abstract:
The evolution of compression and enhancement algorithms necessitates an accurate quality assessment for point clouds. Previous works consistently regard point cloud quality assessment (PCQA) as a MOS regression problem and devise a deterministic mapping, ignoring the stochasticity in generating MOS from subjective tests. Besides, the viewpoint switching of 3D point clouds in subjective tests reinforces the judging stochasticity of different subjects compared with traditional images. This work presents the first probabilistic architecture for no-reference PCQA, motivated by the labeling process of existing datasets. The proposed method can model the quality judging stochasticity of subjects through a tailored conditional variational autoencoder (CVAE) and produces multiple intermediate quality ratings. These intermediate ratings simulate the judgments from different subjects and are then integrated into an accurate quality prediction, mimicking the generation process of a ground truth MOS. Specifically, our method incorporates a Prior Module, a Posterior Module, and a Quality Rating Generator, where the former two modules are introduced to model the judging stochasticity in subjective tests, while the latter is developed to generate diverse quality ratings. Extensive experiments indicate that our approach outperforms previous cutting-edge methods by a large margin and exhibits gratifying cross-dataset robustness.
Submitted 16 January, 2024;
originally announced January 2024.
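The idea of producing several intermediate ratings and integrating them into one prediction can be sketched as follows. This is a toy illustration only: the scalar Gaussian latent, the `toy_generator` function, the 1 to 5 rating scale, and the plain mean used for integration are all assumptions, not details from the paper.

```python
import random
import statistics


def sample_ratings(prior_mean, prior_std, generator, n_samples, seed=0):
    """Draw latent codes from a Gaussian prior and map each one to a rating.

    Each sampled rating plays the role of one simulated subject.
    """
    rng = random.Random(seed)
    return [generator(rng.gauss(prior_mean, prior_std)) for _ in range(n_samples)]


def predict_mos(ratings):
    """Integrate the simulated ratings into one score (a plain mean here)."""
    return statistics.fmean(ratings)


def toy_generator(latent):
    """Hypothetical rating generator: shift by the latent, clamp to [1, 5]."""
    return min(5.0, max(1.0, 3.0 + latent))


ratings = sample_ratings(0.0, 0.5, toy_generator, n_samples=16)
mos = predict_mos(ratings)
```

In the paper the prior is conditioned on the point cloud; here it is fixed so that the sampling and averaging steps stay visible.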
-
Expanding property and statistical laws for $p$-adic subhyperbolic rational maps
Authors:
Shilei Fan,
Lingmin Liao,
Hongming Nie,
Yuefei Wang
Abstract:
Let $K$ be a finite extension of the field $\mathbb{Q}_p$ of $p$-adic numbers. A rational map $φ\in K(z)$ of degree at least $2$ is subhyperbolic if each critical point in the $\mathbb{C}_p$-Julia set of $φ$ is eventually periodic. We show that subhyperbolic maps in $K(z)$ exhibit an expanding property with respect to some (singular) metric. As an application, under a mild assumption, we establish several statistical laws for such maps in $K(z)$ with compact $\mathbb{C}_p$-Julia sets.
Submitted 11 January, 2024;
originally announced January 2024.
-
$p$-adic rational maps having empty Fatou set
Authors:
Aihua Fan,
Shilei Fan,
Yahia Mwanis,
Yuefei Wang
Abstract:
On any finite algebraic extension $K$ of the field $\mathbb{Q}_p$ of $p$-adic numbers, there exist rational maps $φ\in K(z)$ such that the dynamical system $(\mathbb{P}^{1}(K),φ)$ has empty Fatou set, i.e. the iteration family $\{φ^n: n\geq 0\}$ is nowhere equicontinuous.
Submitted 11 January, 2024;
originally announced January 2024.
-
Pre-insertion resistors temperature prediction based on improved WOA-SVR
Authors:
Honghe Dai,
Site Mo,
Haoxin Wang,
Nan Yin,
Songhai Fan,
Bixiong Li
Abstract:
The pre-insertion resistors (PIR) within high-voltage circuit breakers are critical components and warm up by generating Joule heat when an electric current flows through them. Elevated temperature can lead to temporary closure failure and, in severe cases, the rupture of PIR. To accurately predict the temperature of PIR, this study combines finite element simulation techniques with Support Vector Regression (SVR) optimized by an Improved Whale Optimization Algorithm (IWOA) approach. The IWOA includes Tent mapping, a convergence factor based on the sigmoid function, and the Ornstein-Uhlenbeck variation strategy. The IWOA-SVR model is compared with the SSA-SVR and WOA-SVR. The results reveal that the prediction accuracies of the IWOA-SVR model were 90.2% and 81.5% (above 100$^\circ$C) in the 3$^\circ$C temperature deviation range and 96.3% and 93.4% (above 100$^\circ$C) in the 4$^\circ$C temperature deviation range, surpassing the performance of the comparative models. This research demonstrates that the proposed method can realize online monitoring of the temperature of the PIR, which can effectively prevent thermal faults in the PIR and provide a basis for the opening and closing of the circuit breaker within a short period.
Submitted 7 January, 2024;
originally announced January 2024.
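The accuracy figures quoted for a temperature deviation range can be read as the share of predictions whose absolute error stays within a tolerance. The sketch below assumes that reading; the function name, the optional `min_temp_c` filter, and the sample values are hypothetical and not taken from the paper.

```python
def accuracy_within(predicted, measured, tolerance_c, min_temp_c=None):
    """Fraction of predictions within +/- tolerance_c of the reference value.

    If min_temp_c is given, only samples whose reference temperature is
    strictly above it are counted (for example, the "above 100 C" subset).
    """
    pairs = [
        (p, m)
        for p, m in zip(predicted, measured)
        if min_temp_c is None or m > min_temp_c
    ]
    if not pairs:
        raise ValueError("no samples left after filtering")
    hits = sum(1 for p, m in pairs if abs(p - m) <= tolerance_c)
    return hits / len(pairs)


# Made-up temperatures in degrees Celsius, for illustration only.
predicted = [50.0, 98.0, 105.0, 120.0]
measured = [52.0, 102.0, 101.0, 121.0]
overall_3c = accuracy_within(predicted, measured, tolerance_c=3.0)
hot_3c = accuracy_within(predicted, measured, tolerance_c=3.0, min_temp_c=100.0)
```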
-
CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation
Authors:
Quan Tu,
Shilong Fan,
Zihang Tian,
Rui Yan
Abstract:
Recently, the advent of large language models (LLMs) has revolutionized generative agents. Among them, Role-Playing Conversational Agents (RPCAs) attract considerable attention due to their ability to emotionally engage users. However, the absence of a comprehensive benchmark impedes progress in this field. To bridge this gap, we introduce CharacterEval, a Chinese benchmark for comprehensive RPCA assessment, complemented by a tailored high-quality dataset. The dataset comprises 1,785 multi-turn role-playing dialogues, encompassing 23,020 examples and featuring 77 characters derived from Chinese novels and scripts. It was carefully constructed, beginning with initial dialogue extraction via GPT-4, followed by rigorous human-led quality control, and enhanced with in-depth character profiles sourced from Baidu Baike. CharacterEval employs a multifaceted evaluation approach, encompassing thirteen targeted metrics on four dimensions. Comprehensive experiments on CharacterEval demonstrate that Chinese LLMs exhibit more promising capabilities than GPT-4 in Chinese role-playing conversation. Source code, data source and reward model will be publicly accessible at https://github.com/morecry/CharacterEval.
Submitted 9 January, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
-
On-Chip Multidimensional Dynamic Control of Twisted Moiré Photonic Crystal for Smart Sensing and Imaging
Authors:
Haoning Tang,
Beicheng Lou,
Fan Du,
Guangqi Gao,
Mingjie Zhang,
Xueqi Ni,
Evelyn Hu,
Amir Yacoby,
Yuan Cao,
Shanhui Fan,
Eric Mazur
Abstract:
Reconfigurable optics, optical systems that have a dynamically tunable configuration, are emerging as a new frontier in photonics research. Recently, twisted moiré photonic crystal has become a competitive candidate for implementing reconfigurable optics because of its high degree of tunability. However, despite its great potential as versatile optics components, simultaneous and dynamic modulation of multiple degrees of freedom in twisted moiré photonic crystal has remained out of reach, severely limiting its area of application. In this paper, we present a MEMS-integrated twisted moiré photonic crystal sensor that offers precise control over the interlayer gap and twist angle between two photonic crystal layers, and demonstrate an active twisted moiré photonic crystal-based optical sensor that can simultaneously resolve wavelength and polarization. Leveraging twist- and gap-tuned resonance modes, we achieve high-accuracy spectropolarimetric reconstruction of light using an adaptive sensing algorithm over a broad operational bandwidth in the telecom range and full Poincaré sphere. Our research showcases the remarkable capabilities of multidimensional control over emergent degrees of freedom in reconfigurable nanophotonics platforms and establishes a scalable pathway towards creating comprehensive flat-optics devices suitable for versatile light manipulation and information processing tasks.
Submitted 14 December, 2023;
originally announced December 2023.
-
Dance of Channel and Sequence: An Efficient Attention-Based Approach for Multivariate Time Series Forecasting
Authors:
Haoxin Wang,
Yipeng Mo,
Nan Yin,
Honghe Dai,
Bixiong Li,
Songhai Fan,
Site Mo
Abstract:
In recent developments, predictive models for multivariate time series analysis have exhibited commendable performance through the adoption of the prevalent principle of channel independence. Nevertheless, it is imperative to acknowledge the intricate interplay among channels, which fundamentally influences the outcomes of multivariate predictions. Consequently, the notion of channel independence, while offering utility to a certain extent, becomes increasingly impractical, leading to information degradation. In response to this pressing concern, we present CSformer, an innovative framework characterized by a meticulously engineered two-stage self-attention mechanism. This mechanism is purposefully designed to enable the segregated extraction of sequence-specific and channel-specific information, while sharing parameters to promote synergy and mutual reinforcement between sequences and channels. Simultaneously, we introduce sequence adapters and channel adapters, ensuring the model's ability to discern salient features across various dimensions. Rigorous experimentation, spanning multiple real-world datasets, underscores the robustness of our approach, consistently establishing its position at the forefront of predictive performance across all datasets. This augmentation substantially enhances the capacity for feature extraction inherent to multivariate time series data, facilitating a more comprehensive exploitation of the available information.
Submitted 11 December, 2023;
originally announced December 2023.
-
Dig-CSI: A Distributed and Generative Model Assisted CSI Feedback Training Framework
Authors:
Zhilin Du,
Haozhen Li,
Zhenyu Liu,
Shilong Fan,
Xinyu Gu,
Lin Zhang
Abstract:
The advent of deep learning (DL)-based models has significantly advanced Channel State Information (CSI) feedback mechanisms in wireless communication systems. However, traditional approaches often suffer from high communication overhead and potential privacy risks due to the centralized nature of CSI data processing. To address these challenges, we design a CSI feedback training framework called Dig-CSI, in which the dataset for training the CSI feedback model is produced by the distributed generators uploaded by each user equipment (UE), rather than through local data upload. Each UE trains an autoencoder, where the decoder is considered as the distributed generator, with local data to gain reconstruction accuracy and the ability to generate. Experimental results show that Dig-CSI can train a global CSI feedback model with performance comparable to that of a model trained with classical centralized learning, at a much lighter communication overhead.
Submitted 10 December, 2023;
originally announced December 2023.
-
FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability
Authors:
Linze Li,
Sunqi Fan,
Hengjun Pu,
Zhaodong Bing,
Yao Tang,
Tianzhu Ye,
Tong Yang,
Liangyu Chen,
Jiajun Liang
Abstract:
Over recent years, diffusion models have facilitated significant advancements in video generation. Yet, the creation of face-related videos still confronts issues such as low facial fidelity, lack of frame consistency, limited editability and uncontrollable human poses. To address these challenges, we introduce a facial animation generation method that enhances both face identity fidelity and editing capabilities while ensuring frame consistency. This approach incorporates the concept of an anchor frame to counteract the degradation of generative ability in original text-to-image models when incorporating a motion module. We propose two strategies towards this objective: training-free and training-based anchor frame methods. Our method's efficacy has been validated on multiple representative DreamBooth and LoRA models, delivering substantial improvements over the original outcomes in terms of facial fidelity, text-to-image editability, and video motion. Moreover, we introduce conditional control using a 3D parametric face model to capture accurate facial movements and expressions. This solution augments the creative possibilities for facial animation generation through the integration of multiple control signals. For additional samples, please visit https://paper-faac.github.io/.
Submitted 20 December, 2023; v1 submitted 5 December, 2023;
originally announced December 2023.
-
Estimating Coronal Mass Ejection Mass and Kinetic Energy by Fusion of Multiple Deep-learning Models
Authors:
Khalid A. Alobaid,
Yasser Abduallah,
Jason T. L. Wang,
Haimin Wang,
Shen Fan,
Jialiang Li,
Huseyin Cavus,
Vasyl Yurchyshyn
Abstract:
Coronal mass ejections (CMEs) are massive solar eruptions, which have a significant impact on Earth. In this paper, we propose a new method, called DeepCME, to estimate two properties of CMEs, namely, CME mass and kinetic energy. Being able to estimate these properties helps better understand CME dynamics. Our study is based on the CME catalog maintained at the Coordinated Data Analysis Workshops (CDAW) Data Center, which contains all CMEs manually identified since 1996 using the Large Angle and Spectrometric Coronagraph (LASCO) on board the Solar and Heliospheric Observatory (SOHO). We use LASCO C2 data in the period between January 1996 and December 2020 to train, validate and test DeepCME through 10-fold cross validation. The DeepCME method is a fusion of three deep learning models, including ResNet, InceptionNet, and InceptionResNet. Our fusion model extracts features from LASCO C2 images, effectively combining the learning capabilities of the three component models to jointly estimate the mass and kinetic energy of CMEs. Experimental results show that the fusion model yields a mean relative error (MRE) of 0.013 (0.009, respectively) compared to the MRE of 0.019 (0.017, respectively) of the best component model InceptionResNet (InceptionNet, respectively) in estimating the CME mass (kinetic energy, respectively). To our knowledge, this is the first time that deep learning has been used for CME mass and kinetic energy estimations.
Submitted 4 December, 2023;
originally announced December 2023.
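The mean relative error (MRE) reported for DeepCME is commonly computed as the average of |prediction - truth| / |truth|. The sketch below assumes that standard definition; the paper may apply it to transformed values (for example, logarithms of mass), and the sample numbers are invented.

```python
def mean_relative_error(predicted, actual):
    """Average of |predicted - actual| / |actual| over all samples."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("relative error is undefined for a zero reference value")
    return sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)


# Invented values: each prediction is off by 10% of the reference.
mre = mean_relative_error([1.1, 1.8], [1.0, 2.0])
```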
-
Towards Generalizable Referring Image Segmentation via Target Prompt and Visual Coherence
Authors:
Yajie Liu,
Pu Ge,
Haoxiang Ma,
Shichao Fan,
Qingjie Liu,
Di Huang,
Yunhong Wang
Abstract:
Referring image segmentation (RIS) aims to segment objects in an image conditioned on free-form text descriptions. Despite the overwhelming progress, it still remains challenging for current approaches to perform well on cases with various text expressions or with unseen visual entities, limiting its further application. In this paper, we present a novel RIS approach, which substantially improves the generalization ability by addressing the two dilemmas mentioned above. Specifically, to deal with unconstrained texts, we propose to boost a given expression with an explicit and crucial prompt, which complements the expression in a unified context, facilitating target capturing in the presence of linguistic style changes. Furthermore, we introduce a multi-modal fusion aggregation module with visual guidance from a powerful pretrained model to leverage spatial relations and pixel coherences to handle the incomplete target masks and false positive irregular clumps which often appear on unseen visual entities. Extensive experiments are conducted in the zero-shot cross-dataset settings and the proposed approach achieves consistent gains compared to the state-of-the-art, e.g., 4.15%, 5.45%, and 4.64% mIoU increase on RefCOCO, RefCOCO+ and ReferIt respectively, demonstrating its effectiveness. Additionally, the results on GraspNet-RIS show that our approach also generalizes well to new scenarios with large domain shifts.
Submitted 1 December, 2023;
originally announced December 2023.
-
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models
Authors:
Zeming Chen,
Alejandro Hernández Cano,
Angelika Romanou,
Antoine Bonnet,
Kyle Matoba,
Francesco Salvi,
Matteo Pagliardini,
Simin Fan,
Andreas Köpf,
Amirkeivan Mohtashami,
Alexandre Sallinen,
Alireza Sakhaeirad,
Vinitra Swamy,
Igor Krawczuk,
Deniz Bayazit,
Axel Marmet,
Syrielle Montariol,
Mary-Anne Hartley,
Martin Jaggi,
Antoine Bosselut
Abstract:
Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer), and extends pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and internationally-recognized medical guidelines. Evaluations using four major medical benchmarks show significant performance gains over several state-of-the-art baselines before and after task-specific finetuning. Overall, MEDITRON achieves a 6% absolute performance gain over the best public baseline in its parameter class and 3% over the strongest baseline we finetuned from Llama-2. Compared to closed-source LLMs, MEDITRON-70B outperforms GPT-3.5 and Med-PaLM and is within 5% of GPT-4 and 10% of Med-PaLM-2. We release our code for curating the medical pretraining corpus and the MEDITRON model weights to drive open-source development of more capable medical LLMs.
Submitted 27 November, 2023;
originally announced November 2023.
-
Trace-enabled Timing Model Synthesis for ROS2-based Autonomous Applications
Authors:
Hazem Abaza,
Debayan Roy,
Shiqing Fan,
Selma Saidi,
Antonios Motakis
Abstract:
Autonomous applications are typically developed over Robot Operating System 2.0 (ROS2) even in time-critical systems like automotive. Recent years have seen increased interest in developing model-based timing analysis and schedule optimization approaches for ROS2-based applications. To complement these approaches, we propose a tracing and measurement framework to obtain timing models of ROS2-based applications. It offers a tracer based on extended Berkeley Packet Filter (eBPF) that probes different functions in ROS2 middleware and reads their arguments or return values to reason about the data flow in applications. It combines event traces from ROS2 and the operating system to generate a directed acyclic graph showing ROS2 callbacks, precedence relations between them, and their timing attributes. While being compatible with existing analyses, we also show how to model (i) message synchronization, e.g., in sensor fusion, and (ii) service requests from multiple clients, e.g., in motion planning. Considering that, in real-world scenarios, the application code might be confidential and formal models are unavailable, our framework still enables the application of existing analysis and optimization techniques.
Submitted 23 November, 2023; v1 submitted 22 November, 2023;
originally announced November 2023.
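Turning event traces into a graph of callbacks with precedence relations and timing attributes can be sketched roughly as below. The event record layout (`callback`, `start`, `end`, `subscribes`, `publishes`) is a hypothetical simplification and not the framework's actual trace format. Edges are inferred from shared topic names only, and the sketch does not verify that the result is acyclic.

```python
from collections import defaultdict


def build_timing_graph(events):
    """Derive per-callback timing and precedence edges from trace events.

    Returns (max_exec_time, edges): the largest observed execution time per
    callback, and sorted (publisher, subscriber) pairs linked by a topic.
    """
    exec_times = defaultdict(list)
    publishers = defaultdict(set)
    subscribers = defaultdict(set)
    for event in events:
        name = event["callback"]
        exec_times[name].append(event["end"] - event["start"])
        for topic in event["publishes"]:
            publishers[topic].add(name)
        for topic in event["subscribes"]:
            subscribers[topic].add(name)

    edges = set()
    for topic, sources in publishers.items():
        for src in sources:
            for dst in subscribers.get(topic, ()):
                if src != dst:
                    edges.add((src, dst))

    max_exec_time = {name: max(times) for name, times in exec_times.items()}
    return max_exec_time, sorted(edges)


# Hypothetical sensor-fusion trace; times are in milliseconds.
trace = [
    {"callback": "camera", "start": 0, "end": 2, "subscribes": [], "publishes": ["image"]},
    {"callback": "lidar", "start": 0, "end": 3, "subscribes": [], "publishes": ["points"]},
    {"callback": "fusion", "start": 3, "end": 8, "subscribes": ["image", "points"], "publishes": ["objects"]},
    {"callback": "camera", "start": 10, "end": 13, "subscribes": [], "publishes": ["image"]},
]
timing, precedence = build_timing_graph(trace)
```

Here `fusion` has two incoming edges, which is the shape that message synchronization in sensor fusion produces.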
-
On-chip multi-degree-of-freedom control of two-dimensional quantum and nonlinear materials
Authors:
Haoning Tang,
Yiting Wang,
Xueqi Ni,
Kenji Watanabe,
Takashi Taniguchi,
Pablo Jarillo-Herrero,
Shanhui Fan,
Eric Mazur,
Amir Yacoby,
Yuan Cao
Abstract:
Two-dimensional materials (2DM) and their derived heterostructures have electrical and optical properties that are widely tunable via several approaches, most notably electrostatic gating and interfacial engineering such as twisting. While electrostatic gating is simple and has been ubiquitously employed on 2DM, being able to tailor the interfacial properties in a similar real-time manner represents the next leap in our ability to modulate the underlying physics and build exotic devices with 2DM. However, all existing approaches rely on external machinery such as scanning microscopes, which often limit their scope of applications, and there is currently no means of tuning a 2D interface that has the same accessibility and scalability as electrostatic gating. Here, we demonstrate the first on-chip platform designed for 2D materials with in situ tunable interfacial properties, utilizing a microelectromechanical system (MEMS). Each compact, cost-effective, and versatile device is a standalone micromachine that allows voltage-controlled approaching, twisting, and pressurizing of 2DM with high accuracy. As a demonstration, we engineer synthetic topological singularities, known as merons, in the nonlinear optical susceptibility of twisted hexagonal boron nitride (h-BN), via simultaneous control of twist angle and interlayer separation. The chirality of the resulting moire pattern further induces a strong circular dichroism in the second-harmonic generation. A potential application of this topological nonlinear susceptibility is to create integrated classical and quantum light sources that have widely and real-time tunable polarization. Our invention pushes the boundary of available technologies for manipulating low-dimensional quantum materials, which in turn opens up the gateway for designing future hybrid 2D-3D devices for condensed-matter physics, quantum optics, and beyond.
Submitted 14 June, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
-
Three Dimensional Reconfigurable Optical Singularities in Bilayer Photonic Crystals
Authors:
Xueqi Ni,
Yuan Liu,
Beicheng Lou,
Mingjie Zhang,
Evelyn L. Hu,
Shanhui Fan,
Eric Mazur,
Haoning Tang
Abstract:
Metasurfaces and photonic crystals have revolutionized classical and quantum manipulation of light, and opened the door to studying various optical singularities related to phases and polarization states. However, traditional nanophotonic devices lack reconfigurability, hindering the dynamic switching and optimization of optical singularities. This paper delves into the underexplored concept of tunable bilayer photonic crystals (BPhCs), which offer rich interlayer coupling effects. Utilizing silicon nitride-based BPhCs, we demonstrate tunable bidirectional and unidirectional polarization singularities, along with spatiotemporal phase singularities. Leveraging these tunable singularities, we achieve dynamic modulation of bound-state-in-continuum states, unidirectional guided resonances, and both longitudinal and transverse orbital angular momentum. Our work paves the way for multidimensional control over polarization and phase, inspiring new directions in ultrafast optics, optoelectronics, and quantum optics.
Submitted 20 November, 2023;
originally announced November 2023.
-
TimeSQL: Improving Multivariate Time Series Forecasting with Multi-Scale Patching and Smooth Quadratic Loss
Authors:
Site Mo,
Haoxin Wang,
Bixiong Li,
Songhai Fan,
Yuankai Wu,
Xianggen Liu
Abstract:
Time series is a special type of sequence data, a sequence of real-valued random variables collected at even intervals of time. The real-world multivariate time series comes with noises and contains complicated local and global temporal dynamics, making it difficult to forecast the future time series given the historical observations. This work proposes a simple and effective framework, coined as TimeSQL, which leverages multi-scale patching and smooth quadratic loss (SQL) to tackle the above challenges. The multi-scale patching transforms the time series into two-dimensional patches with different length scales, facilitating the perception of both locality and long-term correlations in time series. SQL is derived from the rational quadratic kernel and can dynamically adjust the gradients to avoid overfitting to the noises and outliers. Theoretical analysis demonstrates that, under mild conditions, the effect of the noises on the model with SQL is always smaller than that with MSE. Based on the two modules, TimeSQL achieves new state-of-the-art performance on the eight real-world benchmark datasets. Further ablation studies indicate that the key modules in TimeSQL could also enhance the results of other models for multivariate time series forecasting, standing as plug-and-play techniques.
Submitted 19 November, 2023;
originally announced November 2023.
-
DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder
Authors:
Tao Liu,
Chenpeng Du,
Shuai Fan,
Feilong Chen,
Kai Yu
Abstract:
Generating high-quality and person-generic visual dubbing remains a challenge. Recent innovation has seen the advent of a two-stage paradigm, decoupling the rendering and lip synchronization process facilitated by intermediate representation as a conduit. Still, previous methodologies rely on rough landmarks or are confined to a single speaker, thus limiting their performance. In this paper, we propose DiffDub: Diffusion-based dubbing. We first craft the Diffusion auto-encoder by an inpainting renderer incorporating a mask to delineate editable zones and unaltered regions. This allows for seamless filling of the lower-face region while preserving the remaining parts. Throughout our experiments, we encountered several challenges. Primarily, the semantic encoder lacks robustness, constricting its ability to capture high-level features. Besides, the modeling ignored facial positioning, causing mouth or nose jitters across frames. To tackle these issues, we employ versatile strategies, including data augmentation and supplementary eye guidance. Moreover, we encapsulated a conformer-based reference encoder and motion generator fortified by a cross-attention mechanism. This enables our model to learn person-specific textures with varying references and reduces reliance on paired audio-visual data. Our rigorous experiments comprehensively highlight that our ground-breaking approach outpaces existing methods with considerable margins and delivers seamless, intelligible videos in person-generic and multilingual scenarios.
Submitted 12 January, 2024; v1 submitted 3 November, 2023;
originally announced November 2023.
-
Winding number criterion for the origin to belong to the numerical range of a matrix on a loop of matrices
Authors:
Cheng Guo,
Shanhui Fan
Abstract:
Let $A:[0,1]\to GL(n,\mathbb{C})$ be continuous with $A(0)=A(1)$, thus the winding number of $\det A$ is well-defined. If the winding number is not divisible by $n$, then the origin belongs to the numerical range of $A(φ)$ for some $φ\in [0,1]$.
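A small numerical check can make the statement concrete. The sketch below uses a hand-picked loop $A(φ)=\mathrm{diag}(e^{2πiφ},1)$ in $GL(2,\mathbb{C})$, which is an example chosen here and not taken from the paper. Its determinant winds once around the origin, and 1 is not divisible by $n=2$, so the criterion predicts some $φ$ at which the origin lies in the numerical range. For a diagonal (hence normal) matrix the numerical range is the convex hull of the diagonal entries, so for $n=2$ it is a line segment and the test is elementary.

```python
import cmath
import math


def loop(phi):
    """Diagonal entries of the example loop A(phi) = diag(exp(2*pi*i*phi), 1)."""
    return [cmath.exp(2j * math.pi * phi), 1 + 0j]


def winding_number(values):
    """Winding number about 0 of a closed curve sampled at `values`."""
    total = 0.0
    for a, b in zip(values, values[1:] + values[:1]):
        total += cmath.phase(b / a)  # phase increment between neighbours
    return round(total / (2 * math.pi))


def origin_in_segment(a, b, tol=1e-9):
    """True if 0 lies on the segment [a, b], i.e. a and b are anti-parallel."""
    cross = a.real * b.imag - a.imag * b.real
    dot = a.real * b.real + a.imag * b.imag
    return abs(cross) <= tol and dot <= tol


steps = 400
phis = [k / steps for k in range(steps)]
dets = [loop(p)[0] * loop(p)[1] for p in phis]  # det of a diagonal matrix
w = winding_number(dets)
hits = [p for p in phis if origin_in_segment(*loop(p))]
print(w, hits)
```

Here the only sampled parameter that qualifies is $φ=1/2$, where $A(φ)=\mathrm{diag}(-1,1)$ and the numerical range is the segment $[-1,1]$.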
Submitted 1 November, 2023;
originally announced November 2023.
-
Learning Cooperative Trajectory Representations for Motion Forecasting
Authors:
Hongzhi Ruan,
Haibao Yu,
Wenxian Yang,
Siqi Fan,
Yingjuan Tang,
Zaiqing Nie
Abstract:
Motion forecasting is an essential task for autonomous driving, and the effective utilization of information from infrastructure and other vehicles can enhance motion forecasting capabilities. Existing research has primarily focused on leveraging single-frame cooperative information to enhance the limited perception capability of the ego vehicle, while underutilizing the motion and interaction information of traffic participants observed from cooperative devices. In this paper, we first propose the cooperative trajectory representations learning paradigm. Specifically, we present V2X-Graph, the first interpretable and end-to-end learning framework for cooperative motion forecasting. V2X-Graph employs an interpretable graph to fully leverage the cooperative motion and interaction contexts. Experimental results on the vehicle-to-infrastructure (V2I) motion forecasting dataset, V2X-Seq, demonstrate the effectiveness of V2X-Graph. To further evaluate the method in the V2X scenario, we construct the first real-world vehicle-to-everything (V2X) motion forecasting dataset, V2X-Traj, and the performance shows the advantage of our method. We hope both V2X-Graph and V2X-Traj can facilitate the further development of cooperative motion forecasting. Find project at https://github.com/AIR-THU/V2X-Graph, find data at https://github.com/AIR-THU/DAIR-V2X-Seq.
Submitted 1 November, 2023;
originally announced November 2023.
-
Stochastic modeling of superconducting qudits in the dispersive regime
Authors:
Kangdi Yu,
Murat C. Sarihan,
** Ho Kang,
Madeline Taylor,
Cody S. Fan,
Ananyo Banerjee,
Jonathan L. DuBois,
Yaniv J. Rosen,
Chee Wei Wong
Abstract:
The field of superconducting quantum computing, based on Josephson junctions, has recently seen remarkable strides in scaling the number of logical qubits. In particular, the fidelities of one- and two-qubit gates have reached the breakeven point with the novel error mitigation and correction methods. Parallel to these advances is the effort to expand the Hilbert space within a single junction or device by employing high-dimensional qubits, otherwise known as qudits. Research has demonstrated the possibility of driving higher-order transitions in a transmon or designing innovative multimode superconducting circuits, termed multimons. These advances can significantly expand the computational basis while simplifying the interconnects in a large-scale quantum processor. In this work we extend the measurement theory of a conventional superconducting qubit to that of a qudit, focusing on modeling the dispersive quadrature measurement in an open quantum system. Under the Markov assumption, the qudit Lindblad and stochastic master equations are formulated and analyzed; in addition, both the ensemble-averaged and the quantum-jump approach of decoherence analysis are detailed with analytical and numerical comparisons. We verify our stochastic model with a series of experimental results on a transmon-type qutrit, verifying the validity of our high-dimensional formalism.
Submitted 28 October, 2023;
originally announced October 2023.
-
DoGE: Domain Reweighting with Generalization Estimation
Authors:
Simin Fan,
Matteo Pagliardini,
Martin Jaggi
Abstract:
The coverage and composition of the pretraining data significantly impact the generalization ability of Large Language Models (LLMs). Despite its importance, recent LLMs still rely on heuristics and trial and error to increase or reduce the influence of data-domains. We propose DOmain reweighting with Generalization Estimation (DoGE), which optimizes the probability of sampling from each domain (domain weights) in a principled way. Our approach is a two-stage process consisting of (i) training a proxy model to obtain domain weights using a bi-level optimization algorithm; (ii) training a larger base model by sampling training domains according to the learned domain weights. In our experiments, we extensively show how DoGE improves the generalization of the base model to any target data mixture. On the SlimPajama dataset, our base model gets better perplexity and few-shot reasoning accuracies across $6$ tasks compared to baseline methods. Moreover, when aiming to generalize to out-of-domain target tasks, which are unseen in the pretraining corpus (OOD domain), DoGE can effectively identify inter-domain dependencies, and consistently achieves better test perplexity on the target domain.
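Stage (ii) of the process above, sampling training domains according to domain weights, can be sketched in a few lines. The domain names and weight values below are hypothetical placeholders, not values from the paper, and the bi-level optimization of stage (i) that would produce the weights is not shown.

```python
import random
from collections import Counter


def sample_domains(domain_weights, num_samples, seed=0):
    """Draw domain names with probability proportional to their weights.

    `domain_weights` is a hypothetical mapping {domain_name: weight}; in the
    two-stage process it would come from the proxy model of stage (i).
    """
    rng = random.Random(seed)  # local generator for reproducible draws
    names = list(domain_weights)
    weights = [domain_weights[name] for name in names]
    return rng.choices(names, weights=weights, k=num_samples)


# Illustrative weights only (assumed, not learned).
domain_weights = {"web": 0.6, "code": 0.3, "books": 0.1}
draws = sample_domains(domain_weights, 10_000)
counts = Counter(draws)
print(counts.most_common())
```

Each draw would select the domain from which the next training batch or example is taken, so the empirical domain mixture tracks the weights over many steps.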
Submitted 5 February, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Irreducible Curriculum for Language Model Pretraining
Authors:
Simin Fan,
Martin Jaggi
Abstract:
Automatic data selection and curriculum design for training large language models is challenging, with only a few existing methods showing improvements over standard training. Furthermore, current schemes focus on domain-level selection, overlooking the more fine-grained contributions of each individual training point. It is difficult to apply traditional datapoint selection methods on large language models: most online batch selection methods perform two-times forward or backward passes, which introduces considerable extra costs with large-scale models. To mitigate these obstacles, we propose irreducible curriculum as a curriculum learning algorithm for language model pretraining, which prioritizes samples with higher learnability. Specifically, to avoid prohibitive extra computation overhead, we simulate the sample loss along the main model's training trajectory using a small-scale proxy model. Our experiments on the RedPajama-1B dataset demonstrate a consistent improvement on validation perplexity across all 7 domains compared to random uniform baseline and the anti-curriculum strategy. Our method also reduces the sharpness of the network and illustrates a better 5-shot accuracy on MMLU benchmarks.
Submitted 23 October, 2023;
originally announced October 2023.
-
Accelerate Microstructure Evolution Simulation Using Graph Neural Networks with Adaptive Spatiotemporal Resolution
Authors:
Shaoxun Fan,
Andrew L. Hitt,
Ming Tang,
Babak Sadigh,
Fei Zhou
Abstract:
Surrogate models driven by sizeable datasets and scientific machine-learning methods have emerged as an attractive microstructure simulation tool with the potential to deliver predictive microstructure evolution dynamics with huge savings in computational costs. Taking 2D and 3D grain growth simulations as an example, we present a completely overhauled computational framework based on graph neural networks with not only excellent agreement to both the ground truth phase-field methods and theoretical predictions, but enhanced accuracy and efficiency compared to previous works based on convolutional neural networks. These improvements can be attributed to the graph representation, both improved predictive power and a more flexible data structure amenable to adaptive mesh refinement. As the simulated microstructures coarsen, our method can adaptively adopt remeshed grids and larger timesteps to achieve further speedup. The data-to-model pipeline with training procedures together with the source codes are provided.
Submitted 19 January, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Time-modulated near-field radiative heat transfer
Authors:
Renwen Yu,
Shanhui Fan
Abstract:
We explore near-field radiative heat transfer between two bodies under time modulation by developing a rigorous fluctuational electrodynamics formalism. We demonstrate that time modulation can result in the enhancement, suppression, elimination, or reversal of radiative heat flow between the two bodies, and can be used to create a radiative thermal diode with infinite contrast ratio, as well as a near-field radiative heat engine that pumps heat from the cold body to the hot body. The formalism reveals a fundamental symmetry relation in the radiative heat transfer coefficients that underlies these effects. Our results indicate the significant capabilities of time modulation for managing nanoscale heat flow.
Submitted 12 October, 2023;
originally announced October 2023.
-
VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence
Authors:
Jianing Qiu,
Jian Wu,
Hao Wei,
Peilun Shi,
Minqing Zhang,
Yunyun Sun,
Lin Li,
Hanruo Liu,
Hongyi Liu,
Simeng Hou,
Yuyang Zhao,
Xuehui Shi,
Junfang Xian,
Xiaoxia Qu,
Sirui Zhu,
Lijie Pan,
Xiaoniao Chen,
Xiaojia Zhang,
Shuai Jiang,
Kebing Wang,
Chenlong Yang,
Mingqiang Chen,
Sujie Fan,
Jianhua Hu,
Aiguo Lv
, et al. (17 additional authors not shown)
Abstract:
We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassification of disease phenotype, and systemic biomarker and disease prediction, with each application enhanced with expert-level intelligence and accuracy. The generalist intelligence of VisionFM outperformed ophthalmologists with basic and intermediate levels in jointly diagnosing 12 common ophthalmic diseases. Evaluated on a new large-scale ophthalmic disease diagnosis benchmark database, as well as a new large-scale segmentation and detection benchmark database, VisionFM outperformed strong baseline deep neural networks. The ophthalmic image representations learned by VisionFM exhibited noteworthy explainability, and demonstrated strong generalizability to new ophthalmic modalities, disease spectrum, and imaging devices. As a foundation model, VisionFM has a large capacity to learn from diverse ophthalmic imaging data and disparate datasets. To be commensurate with this capacity, in addition to the real data used for pre-training, we also generated and leveraged synthetic ophthalmic imaging data. Experimental results revealed that synthetic data that passed visual Turing tests, can also enhance the representation learning capability of VisionFM, leading to substantial performance gains on downstream ophthalmic AI tasks. Beyond the ophthalmic AI applications developed, validated, and demonstrated in this work, substantial further applications can be achieved in an efficient and cost-effective manner using VisionFM as the foundation.
Submitted 7 October, 2023;
originally announced October 2023.
-
Mesoscopic non-Hermitian skin effect
Authors:
Alexander Poddubny,
Janet Zhong,
Shanhui Fan
Abstract:
We discuss a generalization of the non-Hermitian skin effect to finite-size photonic structures with
neither gain nor loss in the bulk and purely real energy spectrum under periodic boundary conditions (PBC).
We show that such systems can still have significant portions of eigenmodes concentrated at the edges and that this edge concentration can be linked to the non-trivial point-gap topology of the size-dependent regularized PBC spectrum, accounting for the radiative losses. As an example, we consider the chiral waveguide quantum electrodynamics platform with an array of atoms coupled to the waveguide. The proposed mesoscopic analogue of the non-Hermitian skin effect could be potentially applied to other seemingly lossless photonic structures, such as chiral resonant all-dielectric metamaterials.
Submitted 6 October, 2023;
originally announced October 2023.
-
Unconventional superconductivity in Sc$_2$Ir$_{4-x}$Si$_x$ by spin-orbit coupling driven flat band
Authors:
Zhengyan Zhu,
Yuxiang Wu,
Shengtai Fan,
Yiliang Fan,
Yiwen Li,
Yongze Ye,
Xiyu Zhu,
Haijun Zhang,
Hai-Hu Wen
Abstract:
The kagome lattice is very attractive as it can host many novel quantum states, such as the charge density wave, superconductivity, quantum spin liquid, etc. Meanwhile, iridates often exhibit a strong spin-orbit coupling (SOC) effect due to the large atomic mass of 5$d$ elements, which has important implications for both the energy bands and the pairing symmetry of superconductors. For the Laves phase superconductor Sc$_2$Ir$_4$ with a kagome lattice, by doping Si to the Ir sites, we observed a nonmonotonic and two-dome-like doping dependence of the superconducting transition temperature $T_{\rm c}$, which is typically found in many unconventional superconducting systems. Interestingly, for some samples, especially Sc$_2$Ir$_{3.5}$Si$_{0.5}$ with the optimal $T_{\rm c}$, after the suppression of superconductivity, the normal-state resistivity exhibits a semiconducting behavior; meanwhile, the specific heat coefficient shows an upturn which follows the relation $C/T\propto{\rm ln}(T_0/T)$ at low temperatures. Around the optimal doping, the resistance measurements exhibit strong superconducting fluctuations. And the superconductivity related specific heat can be fitted by the model of a $d$-wave gap after subtracting the normal-state background. These strongly suggest unconventional superconductivity and correlation effect in the samples, which is mainly induced by a flat band near the Fermi level when considering the SOC, as supported by the first-principles calculations. Our results reveal a new unconventional superconducting system Sc$_2$Ir$_{4-x}$Si$_x$ with strong correlation effects induced by the flat band in the kagome system with strong SOC.
Submitted 5 October, 2023;
originally announced October 2023.
-
Missing-modality Enabled Multi-modal Fusion Architecture for Medical Data
Authors:
Muyu Wang,
Shiyu Fan,
Yichen Li,
Hui Chen
Abstract:
Fusing multi-modal data can improve the performance of deep learning models. However, missing modalities are common for medical data due to patients' specificity, which is detrimental to the performance of multi-modal models in applications. Therefore, it is critical to adapt the models to missing modalities. This study aimed to develop an efficient multi-modal fusion architecture for medical data that was robust to missing modalities and further improved the performance on disease diagnosis. X-ray chest radiographs for the image modality, radiology reports for the text modality, and structured value data for the tabular data modality were fused in this study. Each modality pair was fused with a Transformer-based bi-modal fusion module, and the three bi-modal fusion modules were then combined into a tri-modal fusion framework. Additionally, multivariate loss functions were introduced into the training process to improve the model's robustness to missing modalities in the inference process. Finally, we designed comparison and ablation experiments for validating the effectiveness of the fusion, the robustness to missing modalities, and the enhancements from each key component. Experiments were conducted on MIMIC-IV and MIMIC-CXR with the 14-label disease diagnosis task. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were used to evaluate the models' performance. The experimental results demonstrated that our proposed multi-modal fusion architecture effectively fused three modalities and showed strong robustness to missing modalities. This method has the potential to be scaled to more modalities to enhance the clinical practicality of the model.
Submitted 27 September, 2023;
originally announced September 2023.
-
Diminishing Mott gap by doping electrons through depositing one monolayer thin film of Rb on Ca$_{2}$CuO$_{2}$Cl$_{2}$
Authors:
Han Li,
Zhaohui Wang,
Shengtai Fan,
Huazhou Li,
Huan Yang,
Hai-Hu Wen
Abstract:
Understanding the doping evolution from a Mott insulator to a superconductor probably holds the key for resolving the mystery of unconventional superconductivity in copper oxides. To elucidate the evolution of the electronic state starting from the Mott insulator, we dose the surface of the parent phase Ca$_{2}$CuO$_{2}$Cl$_{2}$ by depositing one monolayer thin film of Rb atoms which are supposed to donate electrons to the CuO$_{2}$ planes underneath. We successfully achieved the Rb thin films with periodic structures, and the scanning tunneling microscopy or spectroscopy (STM or STS) measurements on the surface show that the Fermi energy is pinned within the Mott gap but closer to the edge of the charge transfer band (CTB). However, the electron doping does not reduce the spectral weight of the upper Hubbard band (UHB) for the double occupancy as expected from the rigid model, but instead increases it; meanwhile, further doping will create new, widely spread in-gap states derived from the UHB, and the Mott gap will be significantly diminished. Our results provide new clues to understand the strong correlation effect of parent Mott insulators for cuprates and shed new light on the origin of high-temperature superconductivity.
Submitted 16 September, 2023;
originally announced September 2023.
-
A user's guide to 1D nonlinear backward stochastic differential equations with applications and open problems
Authors:
Shengjun Fan,
Ying Hu,
Shanjian Tang
Abstract:
We present a comprehensive theory on the well-posedness of a one-dimensional nonlinear backward stochastic differential equation (1D BSDE for short), where the generator $g$ has a one-sided linear/super-linear growth in the first unknown variable $y$ and an at most quadratic growth in the second unknown variable $z$. We first establish several existence theorems and comparison theorems with the test function method and the a priori estimate technique, and then immediately give several existence and uniqueness results. We also overview relevant known results and introduce some practical applications of our theoretical results. Finally, we list some open problems on the well-posedness of 1D BSDEs.
Submitted 12 September, 2023;
originally announced September 2023.
-
FLM-101B: An Open LLM and How to Train It with $100K Budget
Authors:
Xiang Li,
Yiqun Yao,
Xin Jiang,
Xuezhi Fang,
Xuying Meng,
Siqi Fan,
Peng Han,
**g Li,
Li Du,
Bowen Qin,
Zheng Zhang,
Aixin Sun,
Yequan Wang
Abstract:
Large language models (LLMs) have achieved remarkable success in NLP and multimodal tasks, among others. Despite these successes, two main challenges remain in developing LLMs: (i) high computational cost, and (ii) fair and objective evaluations. In this paper, we report a solution to significantly reduce LLM training cost through a growth strategy. We demonstrate that a 101B-parameter LLM with 0.31T tokens can be trained with a budget of 100K US dollars. Inspired by IQ tests, we also consolidate an additional range of evaluations on top of existing evaluations that focus on knowledge-oriented abilities. These IQ evaluations include symbolic mapping, rule understanding, pattern mining, and anti-interference. Such evaluations minimize the potential impact of memorization. Experimental results show that our model, named FLM-101B, trained with a budget of 100K US dollars, achieves performance comparable to powerful and well-known models, e.g., GPT-3 and GLM-130B, especially on the additional range of IQ evaluations. The checkpoint of FLM-101B is released at https://huggingface.co/CofeAI/FLM-101B.
Submitted 17 September, 2023; v1 submitted 7 September, 2023;
originally announced September 2023.
-
FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering
Authors:
Zhenyu Li,
Sunqi Fan,
Yu Gu,
Xiuxing Li,
Zhichao Duan,
Bowen Dong,
Ning Liu,
Jianyong Wang
Abstract:
Knowledge base question answering (KBQA) is a critical yet challenging task due to the vast number of entities within knowledge bases and the diversity of natural language questions posed by users. Unfortunately, the performance of most KBQA models tends to decline significantly in real-world scenarios where high-quality annotated data is insufficient. To mitigate the burden associated with manual annotation, we introduce FlexKBQA by utilizing Large Language Models (LLMs) as program translators for addressing the challenges inherent in the few-shot KBQA task. Specifically, FlexKBQA leverages automated algorithms to sample diverse programs, such as SPARQL queries, from the knowledge base, which are subsequently converted into natural language questions via LLMs. This synthetic dataset facilitates training a specialized lightweight model for the KB. Additionally, to reduce the barriers of distribution shift between synthetic data and real user questions, FlexKBQA introduces an execution-guided self-training method to iteratively leverage unlabeled user questions. Furthermore, we explore harnessing the inherent reasoning capability of LLMs to enhance the entire framework. Consequently, FlexKBQA delivers substantial flexibility, encompassing data annotation, deployment, and being domain agnostic. Through extensive experiments on GrailQA, WebQSP, and KQA Pro, we observe that under the few-shot and even the more challenging zero-shot scenarios, FlexKBQA achieves impressive results with a few annotations, surpassing all previous baselines and even approaching the performance of supervised models, achieving a remarkable 93% performance relative to the fully-supervised models. We posit that FlexKBQA represents a significant advancement towards exploring better integration of large and lightweight models. The code is open-sourced.
Submitted 26 January, 2024; v1 submitted 23 August, 2023;
originally announced August 2023.
-
BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine
Authors:
Yizhen Luo,
Jiahuan Zhang,
Siqi Fan,
Kai Yang,
Yushuai Wu,
Mu Qiao,
Zaiqing Nie
Abstract:
Foundation models (FMs) have exhibited remarkable performance across a wide range of downstream tasks in many domains. Nevertheless, general-purpose FMs often face challenges when confronted with domain-specific problems, due to their limited access to the proprietary training data in a particular domain. In biomedicine, there are various biological modalities, such as molecules, proteins, and cells, which are encoded by the language of life and exhibit significant modality gaps with human natural language. In this paper, we introduce BioMedGPT, an open multimodal generative pre-trained transformer (GPT) for biomedicine, to bridge the gap between the language of life and human natural language. BioMedGPT allows users to easily "communicate" with diverse biological modalities through free text, which is the first of its kind. BioMedGPT aligns different biological modalities with natural language via a large generative language model, namely, BioMedGPT-LM. We publish BioMedGPT-10B, which unifies the feature spaces of molecules, proteins, and natural language via encoding and alignment. Through fine-tuning, BioMedGPT-10B outperforms or is on par with human and significantly larger general-purpose foundation models on the biomedical QA task. It also demonstrates promising performance in the molecule QA and protein QA tasks, which could greatly accelerate the discovery of new drugs and therapeutic targets. In addition, BioMedGPT-LM-7B is the first large generative language model based on Llama2 in the biomedical domain, and is therefore commercially friendly. Both BioMedGPT-10B and BioMedGPT-LM-7B are open-sourced to the research community. In addition, we publish the datasets that are meticulously curated for the alignment of multi-modalities, i.e., PubChemQA and UniProtQA. All the models, codes, and datasets are available at https://github.com/PharMolix/OpenBioMed.
Submitted 21 August, 2023; v1 submitted 18 August, 2023;
originally announced August 2023.
-
Screen-based 3D Subjective Experiment Software
Authors:
Songlin Fan,
Wei Gao
Abstract:
Recently, widespread 3D graphics (e.g., point clouds and meshes) have drawn considerable efforts from academia and industry to assess their perceptual quality by conducting subjective experiments. However, the lack of handy software for 3D subjective experiments complicates the construction of 3D graphics quality assessment datasets, thus hindering the progress of relevant fields. In this paper, we develop a powerful platform with which users can flexibly design their 3D subjective methodologies and build high-quality datasets, easing a broad spectrum of 3D graphics subjective quality studies. To accurately illustrate the perceptual quality differences of 3D stimuli, our software can simultaneously render the source stimulus and the impaired stimulus, and allows both stimuli to respond synchronously to viewer interactions. Compared with amateur 3D visualization tool-based or image/video rendering-based schemes, our approach embodies typical 3D applications while minimizing cognitive overload during subjective experiments. We organized a subjective experiment involving 40 participants to verify the validity of the proposed software. Experimental analyses demonstrate that subjective tests on our software can produce reasonable subjective quality scores of 3D models. All resources in this paper can be found at https://openi.pcl.ac.cn/OpenDatasets/3DQA.
Submitted 7 August, 2023; v1 submitted 7 August, 2023;
originally announced August 2023.
-
QUEST: Query Stream for Practical Cooperative Perception
Authors:
Siqi Fan,
Haibao Yu,
Wenxian Yang,
Jirui Yuan,
Zaiqing Nie
Abstract:
Cooperative perception can effectively enhance individual perception performance by providing additional viewpoints and expanding the sensing field. Existing cooperation paradigms are either interpretable (result cooperation) or flexible (feature cooperation). In this paper, we propose the concept of query cooperation to enable interpretable instance-level flexible feature interaction. To specifically explain the concept, we propose a cooperative perception framework, termed QUEST, which lets the query stream flow among agents. Cross-agent queries interact via fusion for co-aware instances and complementation for individual unaware instances. Taking camera-based vehicle-infrastructure perception as a typical practical application scene, the experimental results on the real-world dataset, DAIR-V2X-Seq, demonstrate the effectiveness of QUEST and further reveal the advantage of the query cooperation paradigm on transmission flexibility and robustness to packet dropout. We hope our work can further facilitate the cross-agent representation interaction for better cooperative perception in practice.
Submitted 22 May, 2024; v1 submitted 3 August, 2023;
originally announced August 2023.
-
Eigenenergy braids in 2D photonic crystals
Authors:
Janet Zhong,
Charles C. Wojcik,
Dali Cheng,
Shanhui Fan
Abstract:
We consider non-Hermitian energy band theory in two-dimensional systems, and study eigenenergy braids on slices in the two-dimensional Brillouin zone. We show the consequences of reciprocity and geometric symmetry on such eigenenergy braids. The point-gap topology of the energy bands can be found from the projection of the eigenenergy braid onto the complex energy plane. We show that a conjugacy class transition in the eigenenergy braid results in a change in the number of bands in a complete point-gap loop. This transition occurs at exceptional points. We numerically demonstrate these concepts using two-dimensional reciprocal and nonreciprocal photonic crystals.
Submitted 5 November, 2023; v1 submitted 27 July, 2023;
originally announced July 2023.
-
Scalar BSDEs of iterated-logarithmically sublinear generators with integrable terminal values
Authors:
Shengjun Fan,
Ying Hu,
Shanjian Tang
Abstract:
We establish a general existence and uniqueness of integrable adapted solutions to scalar backward stochastic differential equations with integrable parameters, where the generator $g$ has an iterated-logarithmic uniform continuity in the second unknown variable $z$. The result improves our previous one in \cite{FanHuTang2023SCL}.
Submitted 20 July, 2023;
originally announced July 2023.
-
DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering
Authors:
Wei Cheng,
Ruixiang Chen,
Wanqi Yin,
Siming Fan,
Keyu Chen,
Honglin He,
Huiwen Luo,
Zhongang Cai,
Jingbo Wang,
Yang Gao,
Zhengming Yu,
Zhengyu Lin,
Daxuan Ren,
Lei Yang,
Ziwei Liu,
Chen Change Loy,
Chen Qian,
Wayne Wu,
Dahua Lin,
Bo Dai,
Kwan-Yee Lin
Abstract:
Realistic human-centric rendering plays a key role in both computer vision and computer graphics. Rapid progress has been made in the algorithm aspect over the years, yet existing human-centric rendering datasets and benchmarks are rather impoverished in terms of diversity, which are crucial for rendering effect. Researchers are usually constrained to explore and evaluate a small set of rendering problems on current datasets, while real-world applications require methods to be robust across different scenarios. In this work, we present DNA-Rendering, a large-scale, high-fidelity repository of human performance data for neural actor rendering. DNA-Rendering presents several alluring attributes. First, our dataset contains over 1500 human subjects, 5000 motion sequences, and 67.5M frames' data volume. Second, we provide rich assets for each subject -- 2D/3D human body keypoints, foreground masks, SMPLX models, cloth/accessory materials, multi-view images, and videos. These assets boost the current method's accuracy on downstream rendering tasks. Third, we construct a professional multi-view system to capture data, which contains 60 synchronous cameras with max 4096 x 3000 resolution, 15 fps speed, and stern camera calibration steps, ensuring high-quality resources for task training and evaluation. Along with the dataset, we provide a large-scale and quantitative benchmark in full-scale, with multiple tasks to evaluate the existing progress of novel view synthesis, novel pose animation synthesis, and novel identity rendering methods. In this manuscript, we describe our DNA-Rendering effort as a revealing of new observations, challenges, and future directions to human-centric rendering. The dataset, code, and benchmarks will be publicly available at https://dna-rendering.github.io/
Submitted 30 September, 2023; v1 submitted 19 July, 2023;
originally announced July 2023.
-
Optical Tellegen metamaterial with spontaneous magnetization
Authors:
S. S. Jazi,
I. Faniayeu,
R. Cichelero,
D. C. Tzarouchis,
M. M. Asgari,
A. Dmitriev,
S. Fan,
V. Asadchy
Abstract:
The nonreciprocal magnetoelectric effect, also known as the Tellegen effect, promises a number of groundbreaking phenomena connected to fundamental (e.g., electrodynamics of axion and relativistic matter) and applied physics (e.g., magnetless isolators). We propose a three-dimensional metamaterial with an isotropic and resonant Tellegen response in the visible frequency range. The metamaterial is formed by randomly oriented bi-material nanocylinders in a host medium. Each nanocylinder consists of a ferromagnet in a single-domain magnetic state and a high-permittivity dielectric operating near the magnetic Mie-type resonance. The proposed metamaterial requires no external magnetic bias and operates on the spontaneous magnetization of the nanocylinders. By leveraging the emerging magnetic Weyl semimetals, we further show how a giant bulk effective magnetoelectric effect can be achieved in a proposed metamaterial, exceeding that of natural materials by almost four orders of magnitude.
Submitted 18 July, 2023;
originally announced July 2023.
-
Variational Probabilistic Fusion Network for RGB-T Semantic Segmentation
Authors:
Baihong Lin,
Zengrong Lin,
Yulan Guo,
Yulan Zhang,
Jianxiao Zou,
Shicai Fan
Abstract:
RGB-T semantic segmentation has been widely adopted to handle hard scenes with poor lighting conditions by fusing different modality features of RGB and thermal images. Existing methods try to find an optimal fusion feature for segmentation, resulting in sensitivity to modality noise, class-imbalance, and modality bias. To overcome the problems, this paper proposes a novel Variational Probabilistic Fusion Network (VPFNet), which regards fusion features as random variables and obtains robust segmentation by averaging segmentation results under multiple samples of fusion features. The random samples generation of fusion features in VPFNet is realized by a novel Variational Feature Fusion Module (VFFM) designed based on variation attention. To further avoid class-imbalance and modality bias, we employ the weighted cross-entropy loss and introduce prior information of illumination and category to control the proposed VFFM. Experimental results on MFNet and PST900 datasets demonstrate that the proposed VPFNet can achieve state-of-the-art segmentation performance.
Submitted 17 July, 2023;
originally announced July 2023.
-
Weighted Erdős-Kac Theorems via Computing Moments
Authors:
Steve Fan
Abstract:
By adapting the moment method developed by Granville and Soundararajan [GS07], Khan, Milinovich and Subedi [KMS22] recently obtained a weighted version of the Erdős-Kac theorem for $ω(n)$ with multiplicative weight $d_k(n)$, where $ω(n)$ denotes the number of distinct prime divisors of a positive integer $n$, and $d_k(n)$ is the $k$-fold divisor function with $k\in\mathbb{N}$. In this paper, we generalize their method to study the distribution of additive functions $f(n)$ weighted by nonnegative multiplicative functions $α(n)$ in a wide class. In particular, we establish uniform asymptotic formulas for the moments of $f(n)$ with suitable growth rates. We also prove a qualitative result on the moments which extends a theorem of Delange and Halberstam [DH57]. As a consequence, we obtain a weighted analogue of the Kubilius-Shapiro theorem.
Submitted 3 October, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Topological nature of non-Hermitian degenerate bands in structural parameter space
Authors:
Olivia Y. Long,
Cheng Guo,
Shanhui Fan
Abstract:
In photonics, band degeneracies at high-symmetry points in wavevector space have been shown to exhibit rich physical phenomena. However, obtaining degenerate bands away from such points is highly nontrivial. In this work, we achieve complex band degeneracy in a photonic crystal structure over a region of momentum space. We show that this band degeneracy corresponds to polarization-independent transmission, which can be harnessed for nonlocal metasurface design. Moreover, we find that the band degeneracy manifests as a topological singularity in the structural parameter space of the system. Our work highlights the importance of topological concepts in the design of polarization-independent photonic structures.
Submitted 6 June, 2023;
originally announced June 2023.
-
Distributed Flocking Control of Aerial Vehicles Based on a Markov Random Field
Authors:
Guobin Zhu,
Shanwei Fan,
Qingrui Zhang
Abstract:
The distributed flocking control of collective aerial vehicles has extraordinary advantages in scalability, reliability, etc. However, it is still challenging to design a reliable, efficient, and responsive flocking algorithm. In this paper, a distributed predictive flocking framework is presented based on a Markov random field (MRF). The MRF is used to characterize the optimization problem that is eventually resolved by discretizing the input space. Potential functions are employed to describe the interactions between aerial vehicles and as indicators of flight performance. The dynamic constraints are taken into account in the candidate feasible trajectories, which correspond to random variables. Numerical simulation shows that, compared with some of the latest existing methods, the proposed algorithm has better flocking cohesion and control efficiency. Experiments are also conducted to demonstrate the feasibility of the proposed algorithm.
Submitted 6 June, 2023;
originally announced June 2023.
-
Numerically explicit estimates for the distribution of rough numbers
Authors:
Steve Fan
Abstract:
For $x\ge y>1$ and $u:= \log x/\log y$, let $Φ(x,y)$ denote the number of positive integers up to $x$ free of prime divisors less than or equal to $y$. In 1950 de Bruijn [1] studied the approximation of $Φ(x,y)$ by the quantity \[μ_y(u)e^γx\log y\prod_{p\leq y}\left(1-\frac{1}{p}\right),\] where $γ=0.5772156...$ is Euler's constant and \[μ_y(u):=\int_{1}^{u}y^{t-u}ω(t)\,dt.\] He showed that the asymptotic formula \[Φ(x,y)=μ_y(u)e^γx\log y\prod_{p\leq y}\left(1-\frac{1}{p}\right)+O\left(\frac{xR(y)}{\log y}\right)\] holds uniformly for all $x\ge y\ge2$, where $R(y)$ is a positive decreasing function related to the error estimates in the Prime Number Theorem. In this paper we obtain numerically explicit versions of de Bruijn's result.
Submitted 3 October, 2023; v1 submitted 5 June, 2023;
originally announced June 2023.
-
An inequality related to the sieve of Eratosthenes
Authors:
Steve Fan,
Carl Pomerance
Abstract:
Let $Φ(x,y)$ denote the number of integers $n\in[1,x]$ free of prime factors $\le y$. We show that but for a few small cases, $Φ(x,y)<.6x/\log y$ when $y\le\sqrt{x}$.
Submitted 12 August, 2023; v1 submitted 5 June, 2023;
originally announced June 2023.
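The quantity in this abstract can be checked by brute force for small inputs. The following is a minimal sketch, not the authors' method: the function names `rough_count` and `bound_holds` are hypothetical, and the sieve below simply strikes out every multiple of each integer from 2 up to y, which removes exactly the integers that have a prime factor at most y.

```python
import math


def rough_count(x: int, y: int) -> int:
    """Brute-force Phi(x, y): count n in [1, x] with no prime factor <= y."""
    is_rough = [True] * (x + 1)
    is_rough[0] = False  # 0 is outside the range [1, x]
    for d in range(2, min(x, y) + 1):
        # Striking multiples of every d <= y (prime or composite) removes
        # the same set as striking multiples of the primes <= y.
        for multiple in range(d, x + 1, d):
            is_rough[multiple] = False
    return sum(is_rough)


def bound_holds(x: int, y: int) -> bool:
    """Check the stated inequality Phi(x, y) < 0.6 * x / log(y) for one pair."""
    return rough_count(x, y) < 0.6 * x / math.log(y)


if __name__ == "__main__":
    print(rough_count(100, 10), bound_holds(100, 10))
```

For example, with x = 100 and y = 10 the survivors are 1 and the 21 primes between 11 and 97, so the count is 22, which is below 0.6 · 100 / log 10 ≈ 26.06. The abstract notes that a few small cases are exceptions, so the check is only meaningful for pairs with y ≤ √x outside those cases.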
-
VO2 Phase Change Electrodes in Li-ion Batteries
Authors:
Samuel Castro-Pardo,
Anand B. Puthirath,
Shaoxun Fan,
Sreehari Saju,
Guang Yang,
Jagjit Nanda,
Robert Vajtai,
Ming Tang,
Pulickel M. Ajayan
Abstract:
The use of electrode materials that show phase change behavior, and hence drastic changes in electrochemical activity during operation, has not been explored for Li-ion batteries. Here we demonstrate a vanadium oxide (VO2) cathode that undergoes a metal-insulator transition due to a first-order structural phase transition at a temperature of 68°C, which is accessible for battery operation. Using a suitable electrolyte operable across the phase transition range and compatible with vanadium oxide cathodes, we studied the effect of electrode structure change on lithium insertion, followed by the electrochemical characteristics above and below the phase transition temperature. The high-temperature VO2 phase shows significantly improved capacitance, enhanced current rate capabilities, improved electrical conductivity and lithium-ion diffusivity compared to the insulating low-temperature phase. This opens up new avenues for electrode designs, allowing manipulation of electrochemical reactions around phase transition temperatures, and in particular enhancing electrochemical properties at elevated temperatures, in contrast to existing classes of battery chemistries, whose performance deteriorates at elevated temperatures.
Submitted 30 May, 2023;
originally announced May 2023.
-
RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars
Authors:
Dongwei Pan,
Long Zhuo,
Jingtan Piao,
Huiwen Luo,
Wei Cheng,
Yuxin Wang,
Siming Fan,
Shengqi Liu,
Lei Yang,
Bo Dai,
Ziwei Liu,
Chen Change Loy,
Chen Qian,
Wayne Wu,
Dahua Lin,
Kwan-Yee Lin
Abstract:
Synthesizing high-fidelity head avatars is a central problem for computer vision and graphics. While head avatar synthesis algorithms have advanced rapidly, the best ones still face great obstacles in real-world scenarios. One of the vital causes is inadequate datasets -- 1) current public datasets can only support researchers to explore high-fidelity head avatars in one or two task directions; 2) these datasets usually contain digital head assets with limited data volume, and narrow distribution over different attributes. In this paper, we present RenderMe-360, a comprehensive 4D human head dataset to drive advance in head avatar research. It contains massive data assets, with 243+ million complete head frames, and over 800k video sequences from 500 different identities captured by synchronized multi-view cameras at 30 FPS. It is a large-scale digital library for head avatars with three key attributes: 1) High Fidelity: all subjects are captured by 60 synchronized, high-resolution 2K cameras in 360 degrees. 2) High Diversity: The collected subjects vary from different ages, eras, ethnicities, and cultures, providing abundant materials with distinctive styles in appearance and geometry. Moreover, each subject is asked to perform various motions, such as expressions and head rotations, which further extend the richness of assets. 3) Rich Annotations: we provide annotations with different granularities: cameras' parameters, matting, scan, 2D/3D facial landmarks, FLAME fitting, and text description.
Based on the dataset, we build a comprehensive benchmark for head avatar research, with 16 state-of-the-art methods performed on five main tasks: novel view synthesis, novel expression synthesis, hair rendering, hair editing, and talking head generation. Our experiments uncover the strengths and weaknesses of current methods. RenderMe-360 opens the door for future exploration in head avatars.
Submitted 22 May, 2023;
originally announced May 2023.
-
Tunable all-optical logic gates based on nonreciprocal topologically protected edge modes
Authors:
Jie Xu,
Panpan He,
Delong Feng,
Yamei Luo,
Siqiang Fan,
Kangle Yong,
Kosmas L. Tsakmakidis
Abstract:
All-optical logic gates have been studied intensively for their potential to enable broadband, low-loss, and high-speed communication. However, poor tunability has remained a key challenge in this field. In this paper, we propose a Y-shaped structure composed of Yttrium Iron Garnet (YIG) layers that can serve as tunable all-optical logic gates, including, but not limited to, OR, AND, and NOT gates, by applying external magnetic fields to magnetize the YIG layers. Our findings demonstrate that these logic gates are based on topologically protected one-way edge modes, ensuring exceptional robustness against imperfections and nonlocal effects while maintaining extremely high precision. Furthermore, the operating band of the logic gates is shown to be tunable. In addition, we introduce a straightforward and practical method for controlling and switching the logic gates between "work", "skip", and "stop" modes. These findings have important implications for the design of high-performance and precise all-optical integrated circuits.
Submitted 16 May, 2023;
originally announced May 2023.
-
A Comprehensive Picture of Factors Affecting User Willingness to Use Mobile Health Applications
Authors:
Shaojing Fan,
Ramesh C. Jain,
Mohan S. Kankanhalli
Abstract:
Mobile health (mHealth) applications have become increasingly valuable in preventive healthcare and in reducing the burden on healthcare organizations. The aim of this paper is to investigate the factors that influence user acceptance of mHealth apps and identify the underlying structure that shapes users' behavioral intention. An online study that employed factorial survey design with vignettes was conducted, and a total of 1,669 participants from eight countries across four continents were included in the study. Structural equation modeling was employed to quantitatively assess how various factors collectively contribute to users' willingness to use mHealth apps. The results indicate that users' digital literacy has the strongest impact on their willingness to use them, followed by their online habit of sharing personal information. Users' concerns about personal privacy only had a weak impact. Furthermore, users' demographic background, such as their country of residence, age, ethnicity, and education, has a significant moderating effect. Our findings have implications for app designers, healthcare practitioners, and policymakers. Efforts are needed to regulate data collection and sharing and promote digital literacy among the general population to facilitate the widespread adoption of mHealth apps.
Submitted 10 May, 2023;
originally announced May 2023.