-
How do the resting EEG preprocessing states affect the outcomes of postprocessing?
Authors:
Shiang Hu,
Jie Ruan,
Juan Hou,
Pedro Antonio Valdes-Sosa,
Zhao Lv
Abstract:
Many artifact removal tools and pipelines have been developed to correct EEG recordings and uncover the information beneath the waveforms. Without visual inspection by experts, it is easy to end up in an improper preprocessing state, such as insufficiently preprocessed EEG (IPE) or excessively preprocessed EEG (EPE). However, little is known about the impact of IPE or EPE on postprocessing in the frequency, spatial, and temporal domains, particularly on spectra and functional connectivity (FC) analysis. Here, clean EEG (CE) was synthesized as the ground truth based on the New York head model and a multivariate autoregressive model. The IPE and the EPE were then simulated by injecting Gaussian noise and by removing brain activity, respectively. The impact on postprocessing was quantified as the deviation of the IPE or EPE from the CE in four temporal statistics, the multichannel power, the cross spectra, the dispersion of source imaging, and the properties of the scalp EEG network. Lastly, an association analysis was performed between the PaLOSi metric and the trends in postprocessing outcomes as the preprocessing state evolved. This study sheds light on how postprocessing outcomes are affected by preprocessing states, and suggests that PaLOSi may be an effective quality metric.
Submitted 12 December, 2023; v1 submitted 22 October, 2023;
originally announced October 2023.
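The noise-injection step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `add_gaussian_noise`, the use of a target signal-to-noise ratio in dB, the 10 Hz test sinusoid, and the 250 Hz sampling rate are all assumptions made for illustration.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, rng):
    """Return the signal plus white Gaussian noise scaled to a target SNR (dB).

    Hypothetical helper: the abstract only says Gaussian noise is injected,
    not how its level is chosen.
    """
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


rng = random.Random(0)
fs = 250  # assumed sampling rate in Hz
clean = [math.sin(2 * math.pi * 10 * n / fs) for n in range(5000)]
noisy = add_gaussian_noise(clean, snr_db=5.0, rng=rng)

# Measure the SNR that was actually realised by this noise draw.
noise = [b - a for a, b in zip(clean, noisy)]
measured_snr_db = 10 * math.log10(
    sum(c * c for c in clean) / sum(e * e for e in noise)
)
```

Sweeping `snr_db` downward would give progressively "less clean" signals, which is one plausible way to emulate a range of insufficiently preprocessed states.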
-
Rational Q-systems at Root of Unity I. Closed Chains
Authors:
Jue Hou,
Yunfeng Jiang,
Yuan Miao
Abstract:
The solution of Bethe ansatz equations for XXZ spin chain with the parameter $q$ being a root of unity is infamously subtle. In this work, we develop the rational $Q$-system for this case, which offers a systematic way to find all physical solutions of the Bethe ansatz equations at root of unity. The construction contains two parts. In the first part, we impose additional constraints to the rational $Q$-system. These constraints eliminate the so-called Fabricius-McCoy (FM) string solutions, yielding all primitive solutions. In the second part, we give a simple procedure to construct the descendant tower of any given primitive state. The primitive solutions together with their descendant towers constitute the complete Hilbert space. We test our proposal by extensive numerical checks and apply it to compute the torus partition function of the 6-vertex model at root of unity.
Submitted 4 April, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Exploring hyperelastic material model discovery for human brain cortex: multivariate analysis vs. artificial neural network approaches
Authors:
Jixin Hou,
Nicholas Filla,
Xianyan Chen,
Mir Jalil Razavi,
Tianming Liu,
Xianqiao Wang
Abstract:
Traditional computational methods, such as the finite element analysis, have provided valuable insights into uncovering the underlying mechanisms of brain physical behaviors. However, precise predictions of brain physics require effective constitutive models to represent the intricate mechanical properties of brain tissue. In this study, we aimed to identify the most favorable constitutive material model for human brain tissue. To achieve this, we applied artificial neural network and multiple regression methods to a generalization of widely accepted classic models, and compared the results obtained from these two approaches. To evaluate the applicability and efficacy of the model, all setups were kept consistent across both methods, except for the approach to prevent potential overfitting. Our results demonstrate that artificial neural networks are capable of automatically identifying accurate constitutive models from given admissible estimators. Nonetheless, the five-term and two-term neural network models trained under single-mode and multi-mode loading scenarios, were found to be suboptimal and could be further simplified into two-term and single-term, respectively, with higher accuracy using multiple regression. Our findings highlight the importance of hyperparameters for the artificial neural network and emphasize the necessity for detailed cross-validations of regularization parameters to ensure optimal selection at a global level in the development of material constitutive models. This study validates the applicability and accuracy of artificial neural network to automatically discover constitutive material models with proper regularization as well as the benefits in model simplification without compromising accuracy for traditional multivariable regression.
Submitted 16 October, 2023;
originally announced October 2023.
-
Query-dominant User Interest Network for Large-Scale Search Ranking
Authors:
Tong Guo,
Xuanping Li,
Haitao Yang,
Xiao Liang,
Yong Yuan,
Jingyou Hou,
Bingqing Ke,
Chao Zhang,
Junlin He,
Shunyu Zhang,
Enyun Yu,
Wenwu
Abstract:
Historical behaviors have shown great effect and potential in various prediction tasks, including recommendation and information retrieval. The overall historical behaviors are various but noisy while search behaviors are always sparse. Most existing approaches in personalized search ranking adopt the sparse search behaviors to learn representation with bottleneck, which do not sufficiently exploit the crucial long-term interest. In fact, there is no doubt that user long-term interest is various but noisy for instant search, and how to exploit it well still remains an open problem.
To tackle this problem, we propose a novel model named Query-dominant user Interest Network (QIN), which includes two cascaded units that filter the raw user behaviors and reweight the behavior subsequences. Specifically, we propose a relevance search unit (RSU), which first searches for a subsequence relevant to the query and then searches for the sub-subsequences relevant to the target item. These items are then fed into an attention unit called the Fused Attention Unit (FAU). The FAU calculates attention scores from the ID field and the attribute field separately, and then adaptively fuses the item embedding and content embedding based on the user's engagement over a past period. Extensive experiments and ablation studies on real-world datasets demonstrate the superiority of our model over state-of-the-art methods. QIN has now been successfully deployed on Kuaishou search, an online video search platform, and obtained a 7.6% improvement in CTR.
Submitted 10 October, 2023;
originally announced October 2023.
-
Toward Intelligent Emergency Control for Large-scale Power Systems: Convergence of Learning, Physics, Computing and Control
Authors:
Qiuhua Huang,
Renke Huang,
Tianzhixi Yin,
Sohom Datta,
Xueqing Sun,
Jason Hou,
Jie Tan,
Wenhao Yu,
Yuan Liu,
Xinya Li,
Bruce Palmer,
Ang Li,
Xinda Ke,
Marianna Vaiman,
Song Wang,
Yousu Chen
Abstract:
This paper has delved into the pressing need for intelligent emergency control in large-scale power systems, which are experiencing significant transformations and are operating closer to their limits with more uncertainties. Learning-based control methods are promising and have shown effectiveness for intelligent power system control. However, when they are applied to large-scale power systems, there are multifaceted challenges such as scalability, adaptiveness, and security posed by the complex power system landscape, which demand comprehensive solutions. The paper first proposes and instantiates a convergence framework for integrating power systems physics, machine learning, advanced computing, and grid control to realize intelligent grid control at a large scale. Our developed methods and platform based on the convergence framework have been applied to a large (more than 3000 buses) Texas power system, and tested with 56000 scenarios. Our work achieved a 26% reduction in load shedding on average and outperformed existing rule-based control in 99.7% of the test scenarios. The results demonstrated the potential of the proposed convergence framework and DRL-based intelligent control for the future grid.
Submitted 8 October, 2023;
originally announced October 2023.
-
HNS: An Efficient Hermite Neural Solver for Solving Time-Fractional Partial Differential Equations
Authors:
Jie Hou,
Zhiying Ma,
Shihui Ying,
Ying Li
Abstract:
Neural network solvers represent an innovative and promising approach for tackling time-fractional partial differential equations by utilizing deep learning techniques. L1 interpolation approximation serves as the standard method for addressing time-fractional derivatives within neural network solvers. However, we have discovered that neural network solvers based on L1 interpolation approximation are unable to fully exploit the benefits of neural networks, and the accuracy of these models is limited by interpolation errors. In this paper, we present the high-precision Hermite Neural Solver (HNS) for solving time-fractional partial differential equations. Specifically, we first construct a high-order explicit approximation scheme for fractional derivatives using Hermite interpolation techniques, and rigorously analyze its approximation accuracy. Afterward, taking into account the infinite differentiability of deep neural networks, we integrate the high-order Hermite interpolation explicit approximation scheme with deep neural networks to propose the HNS. The experimental results show that HNS achieves higher accuracy than methods based on the L1 scheme for both forward and inverse problems, as well as in high-dimensional scenarios. This indicates that HNS has significantly improved accuracy and flexibility compared to existing L1-based methods, and has overcome the limitations of explicit finite difference approximation methods that are often constrained to function value interpolation. As a result, the HNS is not a simple combination of numerical computing methods and neural networks, but rather combines the complementary and mutually reinforcing advantages of both approaches. The data and code can be found at https://github.com/hsbhc/HNS.
Submitted 7 October, 2023;
originally announced October 2023.
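For readers unfamiliar with the L1 interpolation approximation that the abstract above uses as its baseline, the following is a minimal sketch of the classical L1 scheme for a Caputo derivative of order 0 < α < 1 on a uniform grid. It is not taken from the HNS repository; the function name `caputo_l1` and the test functions are assumptions chosen for illustration.

```python
import math


def caputo_l1(u_values, dt, alpha):
    """Classical L1 approximation of the Caputo derivative at the last grid point.

    u_values holds u(t_0), ..., u(t_n) on a uniform grid with step dt, and
    0 < alpha < 1. The scheme interpolates u piecewise linearly, so it is
    exact for linear u and O(dt**(2 - alpha)) accurate for smooth u.
    """
    n = len(u_values) - 1
    coeff = dt ** (-alpha) / math.gamma(2.0 - alpha)
    total = 0.0
    for k in range(n):
        b_k = (k + 1) ** (1.0 - alpha) - k ** (1.0 - alpha)
        total += b_k * (u_values[n - k] - u_values[n - k - 1])
    return coeff * total


alpha = 0.5
n_steps = 1000
dt = 1.0 / n_steps
grid = [k * dt for k in range(n_steps + 1)]

# u(t) = t: exact Caputo derivative at t = 1 is 1 / Gamma(2 - alpha).
approx_linear = caputo_l1(grid, dt, alpha)
exact_linear = 1.0 / math.gamma(2.0 - alpha)

# u(t) = t^2: exact Caputo derivative at t = 1 is 2 / Gamma(3 - alpha).
approx_quadratic = caputo_l1([t * t for t in grid], dt, alpha)
exact_quadratic = 2.0 / math.gamma(3.0 - alpha)
```

The quadratic case shows the interpolation error the abstract refers to: the approximation converges, but only at the rate allowed by piecewise-linear interpolation.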
-
PMNN: Physical Model-driven Neural Network for solving time-fractional differential equations
Authors:
Zhiying Ma,
Jie Hou,
Wenhao Zhu,
Yaxin Peng,
Ying Li
Abstract:
In this paper, an innovative Physical Model-driven Neural Network (PMNN) method is proposed to solve time-fractional differential equations. It establishes a temporal iteration scheme based on physical model-driven neural networks which effectively combines deep neural networks (DNNs) with interpolation approximation of fractional derivatives. Specifically, once the fractional differential operator is discretized, DNNs are employed as a bridge to integrate interpolation approximation techniques with differential equations. On the basis of this integration, we construct a neural-based iteration scheme. Subsequently, by training DNNs to learn this temporal iteration scheme, approximate solutions to the differential equations can be obtained. The proposed method aims to preserve the intrinsic physical information within the equations as far as possible. It fully utilizes the powerful fitting capability of neural networks while maintaining the efficiency of the difference schemes for fractional differential equations. Moreover, we validate the efficiency and accuracy of PMNN through several numerical experiments.
Submitted 7 October, 2023;
originally announced October 2023.
-
Multi-year characterisation of the broad-band emission from the intermittent extreme BL Lac 1ES 2344+514
Authors:
H. Abe,
S. Abe,
V. A. Acciari,
I. Agudo,
T. Aniello,
S. Ansoldi,
L. A. Antonelli,
A. Arbet Engels,
C. Arcaro,
M. Artero,
K. Asano,
D. Baack,
A. Babić,
A. Baquero,
U. Barres de Almeida,
I. Batković,
J. Baxter,
J. Becerra González,
E. Bernardini,
J. Bernete,
A. Berti,
J. Besenrieder,
C. Bigongiari,
A. Biland,
O. Blanch
, et al. (210 additional authors not shown)
Abstract:
The BL Lac 1ES 2344+514 is known for temporary extreme properties (e.g., a shift of the synchrotron SED peak energy $ν_{synch,p}$ above 1keV). While those extreme states were so far observed only during high flux levels, additional multi-year observing campaigns are required to achieve a coherent picture. Here, we report the longest investigation of the source from radio to VHE performed so far, focusing on a systematic characterisation of the intermittent extreme states. While our results confirm that 1ES 2344+514 typically exhibits $ν_{synch,p}>$1keV during elevated flux periods, we also find periods where the extreme state coincides with low flux activity. A strong spectral variability thus happens in the quiescent state, and is likely caused by an increase of the electron acceleration efficiency without a change in the electron injection luminosity. We also report a strong X-ray flare (among the brightest for 1ES 2344+514) without a significant shift of $ν_{synch,p}$. During this particular flare, the X-ray spectrum is among the softest of the campaign. It unveils complexity in the spectral evolution, where the common harder-when-brighter trend observed in BL Lacs is violated. During a low and hard X-ray state, we find an excess of the UV flux with respect to an extrapolation of the X-ray spectrum to lower energies. This UV excess implies that at least two regions contribute significantly to the infrared/optical/ultraviolet/X-ray emission. Using the simultaneous MAGIC, XMM-Newton, NuSTAR, and AstroSat observations, we argue that a region possibly associated with the 10 GHz radio core may explain such an excess. Finally, we investigate a VHE flare, showing an absence of simultaneous variability in the 0.3-2keV band. Using a time-dependent leptonic modelling, we show that this behaviour, in contradiction to single-zone scenarios, can instead be explained by a two-component model.
Submitted 5 October, 2023;
originally announced October 2023.
-
The Lovász Theta Function for Recovering Planted Clique Covers and Graph Colorings
Authors:
Jiaxin Hou,
Yong Sheng Soh,
Antonios Varvitsiotis
Abstract:
The problems of computing graph colorings and clique covers are central challenges in combinatorial optimization. Both of these are known to be NP-hard, and thus computationally intractable in the worst-case instance. A prominent approach for computing approximate solutions to these problems is the celebrated Lovász theta function $\vartheta(G)$, which is specified as the solution of a semidefinite program (SDP), and hence tractable to compute. In this work, we move beyond the worst-case analysis and set out to understand whether the Lovász theta function recovers clique covers for random instances that have a latent clique cover structure, possibly obscured by noise. We answer this question in the affirmative and show that for graphs generated from the planted clique model we introduce in this work, the SDP formulation of $\vartheta(G)$ has a unique solution that reveals the underlying clique-cover structure with high-probability. The main technical step is an intermediate result where we prove a deterministic condition of recovery based on an appropriate notion of sparsity.
Submitted 30 September, 2023;
originally announced October 2023.
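For context on the abstract above, one standard SDP formulation of the Lovász theta function is reproduced here, together with the sandwich inequality that links it to clique covers. The paper may work with an equivalent variant; the symbols $J$ (all-ones matrix), $\alpha(G)$ (independence number), and $\bar{\chi}(G)$ (clique cover number) are notation introduced here, not taken from the abstract.

```latex
\vartheta(G) \;=\; \max_{X \succeq 0}
  \Bigl\{ \langle J, X \rangle \;:\;
          \operatorname{tr}(X) = 1,\;
          X_{ij} = 0 \ \text{for all } \{i,j\} \in E(G) \Bigr\},
\qquad
\alpha(G) \;\le\; \vartheta(G) \;\le\; \bar{\chi}(G).
```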
-
Pressure-induced superconductivity in polycrystalline La3Ni2O7
Authors:
Gang Wang,
Ningning Wang,
Jun Hou,
Liang Ma,
Lifen Shi,
Zhian Ren,
Yadong Gu,
Xiaoling Shen,
Hanming Ma,
Pengtao Yang,
Ziyi Liu,
Haizhong Guo,
Jianping Sun,
Guangming Zhang,
Jiaqiang Yan,
Bosen Wang,
Yoshiya Uwatoko,
Jinguang Cheng
Abstract:
We synthesized polycrystalline La3Ni2O7 samples by using the sol-gel method without post-annealing under high oxygen pressure, and then measured temperature-dependent resistivity under various hydrostatic pressures up to 14.5 GPa in a cubic anvil cell apparatus. We find that the density-wave-like anomaly in resistivity is progressively suppressed with increasing pressure and the resistivity drop corresponding to the onset of superconductivity emerges at pressure as low as 7 GPa. Zero resistivity is achieved at 9 GPa below 6.6 K, which increases quickly with pressure to 35.6 K at 14.5 GPa. The observation of zero-resistance state in the polycrystalline La3Ni2O7 samples under high pressures not only corroborates the recent report of superconductivity in the pressurized La3Ni2O7 crystals but also facilitates further studies on this emerging family of nickelate high-Tc superconductors.
Submitted 3 October, 2023; v1 submitted 29 September, 2023;
originally announced September 2023.
-
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack
Authors:
Xiaoliang Dai,
Ji Hou,
Chih-Yao Ma,
Sam Tsai,
Jialiang Wang,
Rui Wang,
Peizhao Zhang,
Simon Vandenhende,
Xiaofang Wang,
Abhimanyu Dubey,
Matthew Yu,
Abhishek Kadian,
Filip Radenovic,
Dhruv Mahajan,
Kunpeng Li,
Yue Zhao,
Vladan Petrovic,
Mitesh Kumar Singh,
Simran Motwani,
Yi Wen,
Yiwen Song,
Roshan Sumbaly,
Vignesh Ramanathan,
Zijian He,
Peter Vajda
, et al. (1 additional authors not shown)
Abstract:
Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text. However, these pre-trained models often face challenges when it comes to generating highly aesthetic images. This creates the need for aesthetic alignment post pre-training. In this paper, we propose quality-tuning to effectively guide a pre-trained model to exclusively generate highly visually appealing images, while maintaining generality across visual concepts. Our key insight is that supervised fine-tuning with a set of surprisingly small but extremely visually appealing images can significantly improve the generation quality. We pre-train a latent diffusion model on $1.1$ billion image-text pairs and fine-tune it with only a few thousand carefully selected high-quality images. The resulting model, Emu, achieves a win rate of $82.9\%$ compared with its pre-trained only counterpart. Compared to the state-of-the-art SDXLv1.0, Emu is preferred $68.4\%$ and $71.3\%$ of the time on visual appeal on the standard PartiPrompts and our Open User Input benchmark based on the real-world usage of text-to-image models. In addition, we show that quality-tuning is a generic approach that is also effective for other architectures, including pixel diffusion and masked generative transformer models.
Submitted 27 September, 2023;
originally announced September 2023.
-
Choice-75: A Dataset on Decision Branching in Script Learning
Authors:
Zhaoyi Joey Hou,
Li Zhang,
Chris Callison-Burch
Abstract:
Script learning studies how stereotypical events unfold, enabling machines to reason about narratives with implicit information. Previous works mostly consider a script as a linear sequence of events while ignoring the potential branches that arise due to people's circumstantial choices. We hence propose Choice-75, the first benchmark that challenges intelligent systems to make decisions given descriptive scenarios, containing 75 scripts and more than 600 scenarios. We also present preliminary results with current large language models (LLM). Although they demonstrate overall decent performance, there is still notable headroom in hard scenarios.
Submitted 17 March, 2024; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement
Authors:
Shafique Ahmed,
Chia-Wei Chen,
Wenze Ren,
Chin-Jou Li,
Ernie Chu,
Jun-Cheng Chen,
Amir Hussain,
Hsin-Min Wang,
Yu Tsao,
Jen-Cheng Hou
Abstract:
Recent studies have increasingly acknowledged the advantages of incorporating visual data into speech enhancement (SE) systems. In this paper, we introduce a novel audio-visual SE approach, termed DCUC-Net (deep complex U-Net with conformer network). The proposed DCUC-Net leverages complex domain features and a stack of conformer blocks. The encoder and decoder of DCUC-Net are designed using a complex U-Net-based framework. The audio and visual signals are processed using a complex encoder and a ResNet-18 model, respectively. These processed signals are then fused using the conformer blocks and transformed into enhanced speech waveforms via a complex decoder. The conformer blocks consist of a combination of self-attention mechanisms and convolutional operations, enabling DCUC-Net to effectively capture both global and local audio-visual dependencies. Our experimental results demonstrate the effectiveness of DCUC-Net, as it outperforms the baseline model from the COG-MHEAR AVSE Challenge 2023 by a notable margin of 0.14 in terms of PESQ. Additionally, the proposed DCUC-Net performs comparably to a state-of-the-art model and outperforms all other compared models on the Taiwan Mandarin speech with video (TMSV) dataset.
Submitted 8 October, 2023; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Origin of magic angles in twisted bilayer graphene: The magic ring
Authors:
Wei-Chen Wang,
Feng-Wu Chen,
Kuan-Sen Lin,
Justin T. Hou,
Ho-Chun Lin,
Mei-Yin Chou
Abstract:
The unexpected discovery of superconductivity and strong electron correlation in twisted bilayer graphene (TBG), a system containing only sp electrons, is considered as one of the most intriguing developments in two-dimensional materials in recent years. The key feature is the emergent flat energy bands near the Fermi level, a favorable condition for novel many-body phases, at the so-called "magic angles". The physical origin of these interesting flat bands has been elusive to date, hindering the construction of an effective theory for the unconventional electron correlation. In this work, we have identified the importance of charge accumulation in the AA region of the moire supercell and the most critical role of the Fermi ring in AA-stacked bilayer graphene. We show that the magic angles can be predicted by the moire periodicity determined by the size of this Fermi ring. The resonant criterion in momentum space makes it possible to coherently combine states on the Fermi ring through scattering by the moire potential, leading to flat bands near the Fermi level. We thus establish the physical origin of the magic angles in TBG and identify the characteristics of one-particle states associated with the flat bands for further many-body investigations.
Submitted 18 September, 2023;
originally announced September 2023.
-
Excellent HER and OER Catalyzing Performance of Se-vacancies in Defects-engineering PtSe2: From Simulation to Experiment
Authors:
Yuan Chang,
Panlong Zhai,
Jungang Hou,
Jijun Zhao,
Junfeng Gao
Abstract:
Facing grave climate change and enormous energy demand, catalysts are becoming increasingly important because of their significant effect on reducing fossil fuel consumption. The hydrogen evolution reaction (HER) and oxygen evolution reaction (OER) in water splitting are feasible ways to produce clean, sustainable energy. Here we systematically explored the atomic structures and related STM images of Se defects in PtSe2. The equilibrium fractions of vacancies under variable conditions were predicted in detail. In addition, we found that the vacancies are kinetically highly stable, without recovery or aggregation. The Se vacancies in PtSe2 can dramatically enhance the HER performance, making it comparable to, or even better than, that of Pt(111). Furthermore, we revealed for the first time that a PtSe2 monolayer with Se vacancies is also a good OER catalyst. The excellent bipolar catalysis of Se vacancies was further confirmed by experimental measurements. We produced defective PtSe2 by direct selenization of Pt foil at 773 K using a CVD process. A series of measurements then showed that the HER and OER performance of defective PtSe2 is much more efficient than that of Pt foils. Our work, with compelling theoretical and experimental studies, indicates that PtSe2 with Se defects is an ideal bipolar candidate for HER and OER.
Submitted 6 September, 2023;
originally announced September 2023.
-
Diffusion-based 3D Object Detection with Random Boxes
Authors:
Xin Zhou,
Jinghua Hou,
Tingting Yao,
Dingkang Liang,
Zhe Liu,
Zhikang Zou,
Xiaoqing Ye,
Jianwei Cheng,
Xiang Bai
Abstract:
3D object detection is an essential task for achieving autonomous driving. Existing anchor-based detection methods rely on empirical heuristics setting of anchors, which makes the algorithms lack elegance. In recent years, we have witnessed the rise of several generative models, among which diffusion models show great potential for learning the transformation of two distributions. Our proposed Diff3Det migrates the diffusion model to proposal generation for 3D object detection by considering the detection boxes as generative targets. During training, the object boxes diffuse from the ground truth boxes to the Gaussian distribution, and the decoder learns to reverse this noise process. In the inference stage, the model progressively refines a set of random boxes to the prediction results. We provide detailed experiments on the KITTI benchmark and achieve promising performance compared to classical anchor-based 3D detection methods.
Submitted 5 September, 2023;
originally announced September 2023.
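The training-time idea in the abstract above, where ground-truth boxes are diffused toward a Gaussian distribution, can be sketched with a generic DDPM-style forward process. This is an assumption: the abstract does not state the noise schedule or box parameterisation, so the linear beta schedule, the 7-value box layout, and the function names below are hypothetical.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    schedule = []
    product = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        product *= 1.0 - beta
        schedule.append(product)
    return schedule


def diffuse_box(box, alpha_bar, rng):
    """Sample a noised box: sqrt(alpha_bar) * box + sqrt(1 - alpha_bar) * eps."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in box]


rng = random.Random(0)
alpha_bars = alpha_bar_schedule(1000)

# Hypothetical normalised box: (x, y, z, length, width, height, yaw).
gt_box = [0.2, -0.1, 0.05, 0.4, 0.18, 0.15, 0.3]

slightly_noised = diffuse_box(gt_box, alpha_bars[10], rng)
almost_random = diffuse_box(gt_box, alpha_bars[-1], rng)
```

At the final step the box is almost pure noise, which matches the inference-time picture of starting from random boxes and refining them step by step.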
-
Gender Inequalities: Women Researchers Require More Knowledge in Specific and Experimental Topics
Authors:
Shiqi Tang,
Dongyi Wang,
Jianhua Hou
Abstract:
Gender inequalities in science have long been observed globally. Studies have demonstrated them through survey data or published literature, focusing on the interests of subjects or authors; few, however, have examined how gender inequalities manifest in researchers' knowledge status. This study analyzes the relationship between regional and gender identities, topics, and knowledge status, while revealing the female division of labor in science and scientific research, using online Q&A from researchers. We find that gender inequalities combine region-specific characteristics with globally common patterns. Women's distribution across fields and across topics within fields is influenced by region, yet the prevalent topics are consistent in all regions. Women are more involved in specific topics, particularly topics about experiments, where they have weaker levels of knowledge and provide less assistance. To reduce inequality in science, the scientific community should pay more attention to reducing the knowledge gap and encourage women to work on unexplored topics and areas.
Submitted 5 September, 2023;
originally announced September 2023.
-
Edge-Assisted Lightweight Region-of-Interest Extraction and Transmission for Vehicle Perception
Authors:
Yan Cheng,
Peng Yang,
Ning Zhang,
Jiawei Hou
Abstract:
To enhance on-road environmental perception for autonomous driving, accurate and real-time analytics on high-resolution video frames generated by on-board cameras becomes crucial. In this paper, we design a lightweight object location method based on class activation mapping (CAM) to rapidly capture the region of interest (RoI) boxes that contain driving-safety-related objects from on-board cameras, which can not only improve the inference accuracy of vision tasks, but also reduce the amount of transmitted data. Considering the limited on-board computation resources, the RoI boxes extracted from the raw image are offloaded to the edge for further processing. Considering both the dynamics of vehicle-to-edge communications and the limited edge resources, we propose an adaptive RoI box offloading algorithm to ensure prompt and accurate inference by adjusting the down-sampling rate of each box. Extensive experimental results on four high-resolution video streams demonstrate that our approach can effectively improve the overall accuracy by up to 16% and reduce the transmission demand by up to 49%, compared with other benchmarks.
Submitted 30 August, 2023;
originally announced August 2023.
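The abstract above extracts RoI boxes from a class activation map. One common way to turn an activation map into a box is to threshold it and take the bounding box of the surviving cells. The sketch below illustrates only that idea; the function name, the threshold value, and the toy map are assumptions, and the paper's actual extraction and offloading steps may differ.

```python
def roi_box(cam, threshold):
    """Return (x_min, y_min, x_max, y_max) covering cells >= threshold, or None."""
    hits = [
        (col, row)
        for row, values in enumerate(cam)
        for col, value in enumerate(values)
        if value >= threshold
    ]
    if not hits:
        return None
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy 4x4 activation map with one bright region in the middle.
toy_cam = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.7, 0.0],
    [0.0, 0.8, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
box = roi_box(toy_cam, threshold=0.5)
```

Only the pixels inside the returned box would need to be transmitted, which is the source of the bandwidth saving the abstract reports.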
-
Inferences on Mixing Probabilities and Ranking in Mixed-Membership Models
Authors:
Sohom Bhattacharya,
Jianqing Fan,
Jikai Hou
Abstract:
Network data is prevalent in numerous big data applications, including economics and health networks, where it is of prime importance to understand the latent structure of the network. In this paper, we model the network using the Degree-Corrected Mixed Membership (DCMM) model. In the DCMM model, for each node $i$, there exists a membership vector $\boldsymbol{\pi}_i = (\boldsymbol{\pi}_i(1), \boldsymbol{\pi}_i(2),\ldots, \boldsymbol{\pi}_i(K))$, where $\boldsymbol{\pi}_i(k)$ denotes the weight that node $i$ puts on community $k$. We derive a novel finite-sample expansion for the $\boldsymbol{\pi}_i(k)$s, which allows us to obtain asymptotic distributions and confidence intervals for the membership mixing probabilities and other related population quantities. This fills an important gap in uncertainty quantification for the membership profile. We further develop a ranking scheme for the vertices based on the membership mixing probabilities on certain communities and perform relevant statistical inferences. A multiplier bootstrap method is proposed for ranking inference of an individual member's profile with respect to a given community. The validity of our theoretical results is further demonstrated via numerical experiments on both real and synthetic data examples.
Submitted 28 August, 2023;
originally announced August 2023.
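To make the membership vector $\boldsymbol{\pi}_i$ from the DCMM abstract above concrete, the sketch below computes an edge probability under the usual DCMM formulation, $\theta_i \theta_j \, \boldsymbol{\pi}_i^\top P \, \boldsymbol{\pi}_j$. The degree parameters $\theta$ and the community matrix $P$ are not stated in the abstract; they, the function name, and the numbers are assumptions used only for illustration.

```python
def edge_probability(theta_i, theta_j, pi_i, pi_j, P):
    """Edge probability theta_i * theta_j * pi_i^T P pi_j (assumed DCMM form)."""
    k = len(P)
    for pi in (pi_i, pi_j):
        # Membership vectors are weights over K communities and must sum to 1.
        if len(pi) != k or abs(sum(pi) - 1.0) > 1e-9 or min(pi) < 0:
            raise ValueError("membership vector must be a probability vector of length K")
    mix = sum(pi_i[a] * P[a][b] * pi_j[b] for a in range(k) for b in range(k))
    return theta_i * theta_j * mix


# Two communities: dense within (1.0), sparse across (0.2).
P = [[1.0, 0.2], [0.2, 1.0]]
pure_a = [1.0, 0.0]      # node fully in community 1
pure_b = [0.0, 1.0]      # node fully in community 2
mixed = [0.5, 0.5]       # node split evenly between both
p_cross = edge_probability(0.5, 0.8, pure_a, pure_b, P)
p_mixed = edge_probability(0.5, 0.8, pure_a, mixed, P)
```

A node with mixed membership connects to a pure node with a probability between the within-community and cross-community values, which is what the mixing weights are meant to capture.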
-
Pressure Driven Fractionalization of Ionic Spins Results in Cupratelike High-$T_c$ Superconductivity in La$_3$Ni$_2$O$_7$
Authors:
Ruoshi Jiang,
Jinning Hou,
Zhiyu Fan,
Zi-Jian Lang,
Wei Ku
Abstract:
Beyond 14 GPa of pressure, bi-layered La$_3$Ni$_2$O$_7$ was recently found to develop strong superconductivity above the liquid nitrogen boiling temperature. An immediate essential question is the pressure-induced qualitative change of electronic structure that enables the exciting high-temperature superconductivity. We investigate this timely question via a numerical multi-scale derivation of effective many-body physics. At the atomic scale, we first clarify that the system has a strong charge-transfer nature, with itinerant carriers residing mainly in the in-plane oxygen between spin-1 Ni$^{2+}$ ions. We then elucidate, at the eV and sub-eV scales, the key physical effect of the applied pressure: it induces a cupratelike electronic structure by partially screening the Ni spin from 1 to 1/2. This suggests high-temperature superconductivity in La$_3$Ni$_2$O$_7$ with a microscopic mechanism and ($d$-wave) symmetry similar to those in the cuprates.
Submitted 20 March, 2024; v1 submitted 22 August, 2023;
originally announced August 2023.
-
MusicJam: Visualizing Music Insights via Generated Narrative Illustrations
Authors:
Chuer Chen,
Nan Cao,
Jiani Hou,
Yi Guo,
Yulei Zhang,
Yang Shi
Abstract:
Visualizing the insights of invisible music can bring listeners an enjoyable and immersive listening experience, and has therefore attracted much attention in the field of information visualization. Over the past decades, various music visualization techniques have been introduced. However, most of them are manually designed by following visual encoding rules, and thus take the form of a graphical visual representation whose encoding schema usually takes effort to understand. Recently, some researchers have used figures or illustrations to represent music moods, lyrics, and musical features, which are more intuitive and attractive. However, in these techniques, the figures are usually pre-selected or statically generated, so they cannot precisely convey the insights of different pieces of music. To address this issue, we introduce MusicJam, a music visualization system that generates narrative illustrations to represent the insights of the input music. The system leverages a novel generation model based on GPT-2 to generate meaningful lyrics given the input music, and then employs the Stable Diffusion model to transform the lyrics into coherent illustrations. Finally, the generated results are synchronized and rendered as an MP4 video accompanied by the input music. We evaluated the proposed lyric generation model by comparing it with baseline models and conducted a user study to assess the quality of the generated illustrations and the final music videos. The results demonstrated the effectiveness of our technique.
Submitted 26 August, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
Hypergraphs with irrational Turán density and many extremal configurations
Authors:
Jianfeng Hou,
Heng Li,
Guanghui Wang,
Yixiao Zhang
Abstract:
Unlike for graphs, determining Turán densities of hypergraphs is known to be notoriously hard in general. The essential reason is that for many classical families $\mathcal{F}$ of $r$-uniform hypergraphs, there may be many near-extremal $\mathcal{F}$-free configurations with very different structures. Such a family is called not stable, and Liu and Mubayi gave the first not stable example. Another possible reason is that little is known about the set of all possible Turán densities, which has the cardinality of the continuum. Let $t\ge 2$ be an integer. In this paper, we construct a finite family $\mathcal{M}_t$ of 3-uniform hypergraphs such that the Turán density of $\mathcal{M}_t$ is irrational, and there are $t$ near-extremal $\mathcal{M}_t$-free configurations that are far from each other in edit distance. This is the first not stable example that has an irrational Turán density. It also reveals a new phenomenon concerning feasible region functions.
Submitted 18 August, 2023;
originally announced August 2023.
-
Advancing continual lifelong learning in neural information retrieval: definition, dataset, framework, and empirical evaluation
Authors:
Jingrui Hou,
Georgina Cosma,
Axel Finke
Abstract:
Continual learning refers to the capability of a machine learning model to learn and adapt to new information, without compromising its performance on previously learned tasks. Although several studies have investigated continual learning methods for information retrieval tasks, a well-defined task formulation is still lacking, and it is unclear how typical learning strategies perform in this context. To address this challenge, a systematic task formulation of continual neural information retrieval is presented, along with a multiple-topic dataset that simulates continuous information retrieval. A comprehensive continual neural information retrieval framework consisting of typical retrieval models and continual learning strategies is then proposed. Empirical evaluations illustrate that the proposed framework can successfully prevent catastrophic forgetting in neural information retrieval and enhance performance on previously learned tasks. The results indicate that embedding-based retrieval models experience a decline in their continual learning performance as the topic shift distance and dataset volume of new tasks increase. In contrast, pretraining-based models do not show any such correlation. Adopting suitable learning strategies can mitigate the effects of topic shift and data augmentation.
Submitted 19 June, 2024; v1 submitted 16 August, 2023;
originally announced August 2023.
-
Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network
Authors:
Xianqiang Lyu,
Junhui Hou
Abstract:
This paper presents a novel and interpretable end-to-end learning framework, called the deep compensation unfolding network (DCUNet), for restoring light field (LF) images captured under low-light conditions. DCUNet is designed with a multi-stage architecture that mimics the optimization process of solving an inverse imaging problem in a data-driven fashion. The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result. Additionally, DCUNet includes a content-associated deep compensation module at each optimization stage to suppress noise and illumination map estimation errors. To properly mine and leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module that comprehensively exploits redundant information in LF images. The experimental results on both simulated and real datasets demonstrate the superiority of our DCUNet over state-of-the-art methods, both qualitatively and quantitatively. Moreover, DCUNet preserves the essential geometric structure of enhanced LF images much better. The code will be publicly available at https://github.com/lyuxianqiang/LFLL-DCU.
Submitted 26 June, 2024; v1 submitted 10 August, 2023;
originally announced August 2023.
-
TOPIQ: A Top-down Approach from Semantics to Distortions for Image Quality Assessment
Authors:
Chaofeng Chen,
Jiadi Mo,
Jingwen Hou,
Haoning Wu,
Liang Liao,
Wenxiu Sun,
Qiong Yan,
Weisi Lin
Abstract:
Image Quality Assessment (IQA) is a fundamental task in computer vision that has witnessed remarkable progress with deep neural networks. Inspired by the characteristics of the human visual system, existing methods typically use a combination of global and local representations (i.e., multi-scale features) to achieve superior performance. However, most of them adopt simple linear fusion of multi-scale features, and neglect their possibly complex relationship and interaction. In contrast, humans typically first form a global impression to locate important regions and then focus on local details in those regions. We therefore propose a top-down approach, named TOPIQ, that uses high-level semantics to guide the IQA network to focus on semantically important local distortion regions. Our approach to IQA involves the design of a heuristic coarse-to-fine network (CFANet) that leverages multi-scale features and progressively propagates multi-level semantic information to low-level representations in a top-down manner. A key component of our approach is the proposed cross-scale attention mechanism, which calculates attention maps for lower-level features guided by higher-level features. This mechanism emphasizes active semantic regions for low-level distortions, thereby improving performance. CFANet can be used for both Full-Reference (FR) and No-Reference (NR) IQA. We use ResNet50 as its backbone and demonstrate that CFANet achieves better or competitive performance on most public FR and NR benchmarks compared with state-of-the-art methods based on vision transformers, while being much more efficient (with only ${\sim}13\%$ of the FLOPs of the current best FR method). Code is released at https://github.com/chaofengc/IQA-PyTorch.
Submitted 6 August, 2023;
originally announced August 2023.
-
The RoboDepth Challenge: Methods and Advancements Towards Robust Depth Estimation
Authors:
Lingdong Kong,
Yaru Niu,
Shaoyuan Xie,
Hanjiang Hu,
Lai Xing Ng,
Benoit R. Cottereau,
Ding Zhao,
Liangjun Zhang,
Hesheng Wang,
Wei Tsang Ooi,
Ruijie Zhu,
Ziyang Song,
Li Liu,
Tianzhu Zhang,
Jun Yu,
Mohan Jing,
Pengwei Li,
Xiaohua Qi,
Cheng Jin,
Yingfeng Chen,
Jie Hou,
Jie Zhang,
Zhen Kan,
Qiang Ling,
Liang Peng
, et al. (18 additional authors not shown)
Abstract:
Accurate depth estimation under out-of-distribution (OoD) scenarios, such as adverse weather conditions, sensor failure, and noise contamination, is desirable for safety-critical applications. Existing depth estimation systems, however, inevitably suffer from real-world corruptions and perturbations and struggle to provide reliable depth predictions in such cases. In this paper, we summarize the winning solutions from the RoboDepth Challenge, an academic competition designed to facilitate and advance robust OoD depth estimation. This challenge was developed based on the newly established KITTI-C and NYUDepth2-C benchmarks. We hosted two stand-alone tracks, with an emphasis on robust self-supervised and robust fully-supervised depth estimation, respectively. Out of more than two hundred participants, nine unique and top-performing solutions emerged, with novel designs spanning the following aspects: spatial- and frequency-domain augmentations, masked image modeling, image restoration and super-resolution, adversarial training, diffusion-based noise suppression, vision-language pre-training, learned model ensembling, and hierarchical feature enhancement. Extensive experimental analyses along with insightful observations are provided to better understand the rationale behind each design. We hope this challenge can lay a solid foundation for future research on robust and reliable depth estimation and beyond. The datasets, competition toolkit, workshop recordings, and source code from the winning teams are publicly available on the challenge website.
Submitted 27 July, 2023;
originally announced July 2023.
-
ArcGPT: A Large Language Model Tailored for Real-world Archival Applications
Authors:
Shitou Zhang,
Jingrui Hou,
Siyuan Peng,
Zuchao Li,
Qibiao Hu,
Ping Wang
Abstract:
Archives play a crucial role in preserving information and knowledge, and the exponential growth of such data necessitates efficient and automated tools for managing and utilizing archive information resources. Archival applications involve managing massive data that are challenging to process and analyze. Although LLMs have made remarkable progress in diverse domains, there is no publicly available LLM tailored to archives. Addressing this gap, we introduce ArcGPT, to our knowledge the first general-purpose LLM tailored to the archival field. To enhance model performance on real-world archival tasks, ArcGPT has been pre-trained on massive and extensive archival domain data. Alongside ArcGPT, we release AMBLE, a benchmark comprising four real-world archival tasks. Evaluation on AMBLE shows that ArcGPT outperforms existing state-of-the-art models, marking a substantial step forward in effective archival data management. Ultimately, ArcGPT aims to better serve the archival community, aiding archivists in their crucial role of preserving and harnessing our collective information and knowledge.
Submitted 27 July, 2023;
originally announced July 2023.
-
NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
Authors:
Chenfeng Xu,
Bichen Wu,
Ji Hou,
Sam Tsai,
Ruilong Li,
Jialiang Wang,
Wei Zhan,
Zijian He,
Peter Vajda,
Kurt Keutzer,
Masayoshi Tomizuka
Abstract:
We present NeRF-Det, a novel method for indoor 3D detection with posed RGB images as input. Unlike existing indoor 3D detection methods that struggle to model scene geometry, our method makes novel use of NeRF in an end-to-end manner to explicitly estimate 3D geometry, thereby improving 3D detection performance. Specifically, to avoid the significant extra latency associated with per-scene optimization of NeRF, we introduce sufficient geometry priors to enhance the generalizability of NeRF-MLP. Furthermore, we subtly connect the detection and NeRF branches through a shared MLP, enabling an efficient adaptation of NeRF to detection and yielding geometry-aware volumetric representations for 3D detection. Our method outperforms state-of-the-art methods by 3.9 mAP and 3.1 mAP on the ScanNet and ARKITScenes benchmarks, respectively. We provide extensive analysis to shed light on how NeRF-Det works. As a result of our joint-training design, NeRF-Det is able to generalize well to unseen scenes for object detection, view synthesis, and depth estimation tasks without requiring per-scene optimization. Code is available at https://github.com/facebookresearch/NeRF-Det.
Submitted 27 July, 2023;
originally announced July 2023.
-
Downstream-agnostic Adversarial Examples
Authors:
Ziqi Zhou,
Shengshan Hu,
Ruizhi Zhao,
Qian Wang,
Leo Yu Zhang,
Junhui Hou,
Hai Jin
Abstract:
Self-supervised learning usually uses a large amount of unlabeled data to pre-train an encoder that can be used as a general-purpose feature extractor, such that downstream users only need to perform fine-tuning operations to enjoy the benefits of a "large model". Despite this promising prospect, the security of pre-trained encoders has not yet been thoroughly investigated, especially when the pre-trained encoder is publicly available for commercial use.
In this paper, we propose AdvEncoder, the first framework for generating downstream-agnostic universal adversarial examples based on the pre-trained encoder. AdvEncoder aims to construct a universal adversarial perturbation or patch for a set of natural images that can fool all the downstream tasks inheriting the victim pre-trained encoder. Unlike traditional adversarial example works, the pre-trained encoder only outputs feature vectors rather than classification labels. Therefore, we first exploit the high frequency component information of the image to guide the generation of adversarial examples. Then we design a generative attack framework to construct adversarial perturbations/patches by learning the distribution of the attack surrogate dataset to improve their attack success rates and transferability. Our results show that an attacker can successfully attack downstream tasks without knowing either the pre-training dataset or the downstream dataset. We also tailor four defenses for pre-trained encoders, the results of which further prove the attack ability of AdvEncoder.
Submitted 14 August, 2023; v1 submitted 23 July, 2023;
originally announced July 2023.
-
Improvements on "Multi-Party Quantum Summation without a Third Party based on $d$-Dimensional Bell States"
Authors:
Xiaobing Li,
Jiale Hou,
Haozhen Situ,
Cai Zhang
Abstract:
In 2021, Wu et al. presented a multi-party quantum summation scheme exploiting the entanglement properties of d-dimensional Bell states (Wu et al. in Quantum Inf Process 20:200, 2021). In particular, the authors proposed a three-party quantum summation protocol and then extended their work to a multi-party case. It is claimed that their protocol is secure against outside and participants' attacks. However, this work points out that Wu's protocol has a loophole, i.e., two or more dishonest participants who meet a specific location relationship can conspire to obtain the private inputs of some honest participants without being detected. Accordingly, improvements are proposed to address these issues.
Submitted 30 August, 2023; v1 submitted 20 July, 2023;
originally announced July 2023.
-
Emergence of high-temperature superconducting phase in the pressurized La3Ni2O7 crystals
Authors:
J. Hou,
P. T. Yang,
Z. Y. Liu,
J. Y. Li,
P. F. Shan,
L. Ma,
G. Wang,
N. N. Wang,
H. Z. Guo,
J. P. Sun,
Y. Uwatoko,
M. Wang,
G. -M. Zhang,
B. S. Wang,
J. -G. Cheng
Abstract:
The recent report of a pressure-induced structural transition and signatures of superconductivity with Tc = 80 K above 14 GPa in La3Ni2O7 crystals has garnered considerable attention. To further elaborate on this discovery, we carried out comprehensive resistance measurements on La3Ni2O7 crystals grown with an optical-image floating zone furnace under oxygen pressure (15 bar), using a diamond anvil cell (DAC) and a cubic anvil cell (CAC), which employ solid and liquid pressure-transmitting media, respectively. Sample #1, measured in the DAC, exhibits semiconducting-like behavior with large resistance at low pressures and gradually becomes metallic upon compression. At pressures P >= 13.7 GPa, we observed a resistance drop as large as ~50% around 70 K, which evolves into a kink-like anomaly at pressures above 40 GPa and shifts gradually to lower temperatures with increasing magnetic field. These observations are consistent with the recent report mentioned above. On the other hand, sample #2, measured in the CAC, retains metallic behavior over the investigated pressure range up to 15 GPa. The hump-like anomaly in resistance around 130 K at ambient pressure disappears at P >= 2 GPa. In the pressure range from 11 to 15 GPa, we observed the gradual development of a shoulder-like anomaly in resistance at low temperatures, which evolves into a pronounced 98% drop in resistance below 62 K at 15 GPa, reaching a temperature-independent resistance of 20 μΩ below 20 K. Similarly, this resistance anomaly shifts progressively to lower temperatures under applied external magnetic fields, resembling a typical superconducting transition.
Submitted 19 July, 2023;
originally announced July 2023.
-
MaxCut in graphs with sparse neighborhoods
Authors:
Jinghua Deng,
Jianfeng Hou,
Siwei Lin,
Qinghou Zeng
Abstract:
Let $G$ be a graph with $m$ edges and let $\mathrm{mc}(G)$ denote the size of a largest cut of $G$. The difference $\mathrm{mc}(G)-m/2$ is called the surplus $\mathrm{sp}(G)$ of $G$. A fundamental problem in MaxCut is to determine $\mathrm{sp}(G)$ for $G$ without specific structure, and the degree sequence $d_1,\ldots,d_n$ of $G$ plays a key role in obtaining lower bounds on $\mathrm{sp}(G)$. A classical example, given by Shearer, is that $\mathrm{sp}(G)=\Omega(\sum_{i=1}^n\sqrt{d_i})$ for triangle-free graphs $G$, implying that $\mathrm{sp}(G)=\Omega(m^{3/4})$. It was extended to graphs with sparse neighborhoods by Alon, Krivelevich and Sudakov. In this paper, we establish a novel and stronger result for a more general family of graphs with sparse neighborhoods.
Our result implies many well-known bounds on the surplus of $H$-free graphs for different $H$, such as triangles, even cycles, graphs having a vertex whose removal makes them acyclic, or complete bipartite graphs $K_{s,t}$ with $s\in \{2,3\}$. It also yields many new (tight) bounds on $\mathrm{sp}(G)$ in $H$-free graphs $G$ when $H$ is any graph having a vertex whose removal results in a bipartite graph with relatively small Turán number, especially the even wheel. This contributes to a conjecture raised by Alon, Krivelevich and Sudakov. Moreover, we obtain new families of graphs $H$ such that $\mathrm{sp}(G)=\Omega(m^{3/4+\epsilon(H)})$ for some constant $\epsilon(H)>0$ in $H$-free graphs $G$, providing evidence for a conjecture suggested by Alon, Bollobás, Krivelevich and Sudakov.
Submitted 21 August, 2023; v1 submitted 18 July, 2023;
originally announced July 2023.
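The definitions in the MaxCut abstract above, $\mathrm{mc}(G)$ and the surplus $\mathrm{sp}(G)=\mathrm{mc}(G)-m/2$, can be checked by hand on tiny graphs. The brute-force sketch below does exactly that; it is exponential in the number of vertices and only illustrates the definitions, not the paper's bounds or methods. The function names are illustrative.

```python
from itertools import product


def max_cut(n, edges):
    """mc(G): the most edges crossing any split of vertices 0..n-1 into two sides."""
    best = 0
    # Vertex 0 is fixed on side 0; swapping the two sides gives the same cut.
    for rest in product((0, 1), repeat=max(n - 1, 0)):
        side = (0,) + rest
        crossing = sum(1 for u, v in edges if side[u] != side[v])
        best = max(best, crossing)
    return best


def surplus(n, edges):
    """sp(G) = mc(G) - m/2, where m is the number of edges."""
    return max_cut(n, edges) - len(edges) / 2


triangle = [(0, 1), (1, 2), (0, 2)]
four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(surplus(3, triangle))    # 0.5
print(surplus(4, four_cycle))  # 2.0
```

The triangle can have at most two of its three edges cut, while the bipartite four-cycle has all four edges cut, so its surplus is the maximum possible value $m/2$.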
-
Audio-Visual Speech Enhancement Using Self-supervised Learning to Improve Speech Intelligibility in Cochlear Implant Simulations
Authors:
Richard Lee Lai,
Jen-Cheng Hou,
Mandar Gogate,
Kia Dashtipour,
Amir Hussain,
Yu Tsao
Abstract:
Individuals with hearing impairments face challenges in their ability to comprehend speech, particularly in noisy environments. The aim of this study is to explore the effectiveness of audio-visual speech enhancement (AVSE) in enhancing the intelligibility of vocoded speech in cochlear implant (CI) simulations. Notably, the study focuses on a challenging scenario where there is limited availability of training data for the AVSE task. To address this problem, we propose a novel deep neural network framework termed Self-Supervised Learning-based AVSE (SSL-AVSE). The proposed SSL-AVSE combines visual cues, such as lip and mouth movements, from the target speakers with corresponding audio signals. The contextually combined audio and visual data are then fed into a Transformer-based SSL AV-HuBERT model to extract features, which are further processed using a BLSTM-based SE model. The results demonstrate several key findings. Firstly, SSL-AVSE successfully overcomes the issue of limited data by leveraging the AV-HuBERT model. Secondly, by fine-tuning the AV-HuBERT model parameters for the target SE task, significant performance improvements are achieved. Specifically, there is a notable enhancement in PESQ (Perceptual Evaluation of Speech Quality) from 1.43 to 1.67 and in STOI (Short-Time Objective Intelligibility) from 0.70 to 0.74. Furthermore, the performance of the SSL-AVSE was evaluated using CI vocoded speech to assess the intelligibility for CI users. Comparative experimental outcomes reveal that in the presence of dynamic noises encountered during human conversations, SSL-AVSE exhibits a substantial improvement. The NCM (Normal Correlation Matrix) values indicate an increase of 26.5% to 87.2% compared to the noisy baseline.
Submitted 15 July, 2023;
originally announced July 2023.
-
Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators
Authors:
Sikai Bai,
Shuaicheng Li,
Weiming Zhuang,
Jie Zhang,
Song Guo,
Kunlin Yang,
Jun Hou,
Shuai Zhang,
Junyu Gao,
Shuai Yi
Abstract:
Federated learning has become a popular method to learn from decentralized heterogeneous data. Federated semi-supervised learning (FSSL) emerges to train models from a small fraction of labeled data due to label scarcity on decentralized clients. Existing FSSL methods assume independent and identically distributed (IID) labeled data across clients and consistent class distribution between labeled and unlabeled data within a client. This work studies a more practical and challenging scenario of FSSL, where data distribution is different not only across clients but also within a client between labeled and unlabeled data. To address this challenge, we propose a novel FSSL framework with dual regulators, FedDure. FedDure lifts the previous assumption with a coarse-grained regulator (C-reg) and a fine-grained regulator (F-reg): C-reg regularizes the updating of the local model by tracking the learning effect on labeled data distribution; F-reg learns an adaptive weighting scheme tailored for unlabeled instances in each client. We further formulate the client model training as bi-level optimization that adaptively optimizes the model in the client with two regulators. Theoretically, we show the convergence guarantee of the dual regulators. Empirically, we demonstrate that FedDure is superior to the existing methods across a wide range of settings, notably by more than 11 on CIFAR-10 and CINIC-10 datasets.
Submitted 11 March, 2024; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers
Authors:
Zhiyu Zhu,
Junhui Hou,
Dapeng Oliver Wu
Abstract:
This paper addresses the problem of cross-modal object tracking from RGB videos and event data. Rather than constructing a complex cross-modal fusion network, we explore the great potential of a pre-trained vision Transformer (ViT). Particularly, we carefully investigate plug-and-play training augmentations that encourage the ViT to bridge the vast distribution gap between the two modalities, enabling comprehensive cross-modal information interaction and thus enhancing its ability. Specifically, we propose a mask modeling strategy that randomly masks a specific modality of some tokens to enforce proactive interaction between tokens from different modalities. To mitigate network oscillations resulting from the masking strategy and further amplify its positive effect, we then theoretically propose an orthogonal high-rank loss to regularize the attention matrix. Extensive experiments demonstrate that our plug-and-play training augmentation techniques can significantly boost state-of-the-art one-stream and two-stream trackers in terms of both tracking precision and success rate. Our new perspective and findings will potentially bring insights to the field of leveraging powerful pre-trained ViTs to model cross-modal data. The code will be publicly available.
Submitted 4 September, 2023; v1 submitted 9 July, 2023;
originally announced July 2023.
-
VertiBench: Advancing Feature Distribution Diversity in Vertical Federated Learning Benchmarks
Authors:
Zhaomin Wu,
Junyi Hou,
Bingsheng He
Abstract:
Vertical Federated Learning (VFL) is a crucial paradigm for training machine learning models on feature-partitioned, distributed data. However, due to privacy restrictions, few public real-world VFL datasets exist for algorithm evaluation, and these represent a limited array of feature distributions. Existing benchmarks often resort to synthetic datasets, derived from arbitrary feature splits from a global set, which only capture a subset of feature distributions, leading to inadequate algorithm performance assessment. This paper addresses these shortcomings by introducing two key factors affecting VFL performance - feature importance and feature correlation - and proposing associated evaluation metrics and dataset splitting methods. Additionally, we introduce a real VFL dataset to address the deficit in image-image VFL scenarios. Our comprehensive evaluation of cutting-edge VFL algorithms provides valuable insights for future research in the field.
Submitted 13 March, 2024; v1 submitted 5 July, 2023;
originally announced July 2023.
-
Spatial-Temporal Enhanced Transformer Towards Multi-Frame 3D Object Detection
Authors:
Yifan Zhang,
Zhiyu Zhu,
Junhui Hou,
Dapeng Wu
Abstract:
The Detection Transformer (DETR) has revolutionized the design of CNN-based object detection systems, showcasing impressive performance. However, its potential in the domain of multi-frame 3D object detection remains largely unexplored. In this paper, we present STEMD, a novel end-to-end framework for multi-frame 3D object detection based on the DETR-like paradigm. STEMD treats multi-frame 3D object detection as a sequence-to-sequence task and effectively captures spatial-temporal dependencies at both the feature and query levels. Specifically, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network, which represents queries as nodes in a graph and enables effective modeling of object interactions within a social context. To solve the problem of missing hard cases in the proposed output of the encoder in the current frame, we incorporate the output of the previous frame to initialize the query input of the decoder. Moreover, to mitigate the issue of redundant detection results, where the model generates numerous overlapping boxes from similar queries, we consider an IoU regularization term in the loss function, which can distinguish between queries matched with the ground-truth box and queries that are similar but unmatched during the refinement process, leading to reduced redundancy and more accurate detections. Through extensive experiments, we demonstrate the effectiveness of our approach in handling challenging scenarios, while incurring only a minor additional computational overhead. The code is available at https://github.com/Eaphan/STEMD.
Submitted 4 December, 2023; v1 submitted 1 July, 2023;
originally announced July 2023.
-
Integrating Large Pre-trained Models into Multimodal Named Entity Recognition with Evidential Fusion
Authors:
Weide Liu,
Xiaoyang Zhong,
Jingwen Hou,
Shaohua Li,
Haozhe Huang,
Yuming Fang
Abstract:
Multimodal Named Entity Recognition (MNER) is a crucial task for information extraction from social media platforms such as Twitter. Most current methods rely on attention weights to extract information from both text and images but are often unreliable and lack interpretability. To address this problem, we propose incorporating uncertainty estimation into the MNER task, producing trustworthy predictions. Our proposed algorithm models the distribution of each modality as a Normal-inverse Gamma distribution, and fuses them into a unified distribution with an evidential fusion mechanism, enabling hierarchical characterization of uncertainties and promotion of prediction accuracy and trustworthiness. Additionally, we explore the potential of pre-trained large foundation models in MNER and propose an efficient fusion approach that leverages their robust feature representations. Experiments on two datasets demonstrate that our proposed method outperforms the baselines and achieves new state-of-the-art performance.
Submitted 29 June, 2023;
originally announced June 2023.
-
Cosmological Probes of Structure Growth and Tests of Gravity
Authors:
Jiamin Hou,
Julian Bautista,
Maria Berti,
Carolina Cuesta-Lazaro,
César Hernández-Aguayo,
Tilman Tröster,
Jinglan Zheng
Abstract:
The current standard cosmological model is constructed within the framework of general relativity with a cosmological constant $Λ$, which is often associated with dark energy, and phenomenologically explains the accelerated cosmic expansion. Understanding the nature of dark energy is one of the most appealing questions in achieving a self-consistent physical model at cosmological scales. Modification of general relativity could potentially provide a more natural and physical solution to the accelerated expansion. The growth of the cosmic structure is sensitive in constraining gravity models. In this paper, we aim to provide a concise introductory review of modified gravity models from an observational point of view. We will discuss various mainstream cosmological observables, and their potential advantages and limitations as probes of gravity models.
Submitted 23 June, 2023;
originally announced June 2023.
-
Probabilistic-based Feature Embedding of 4-D Light Fields for Compressive Imaging and Denoising
Authors:
Xianqiang Lyu,
Junhui Hou
Abstract:
The high-dimensional nature of the 4-D light field (LF) poses great challenges in achieving efficient and effective feature embedding, which severely impacts the performance of downstream tasks. To tackle this crucial issue, in contrast to existing methods with empirically-designed architectures, we propose a probabilistic-based feature embedding (PFE), which learns a feature embedding architecture by assembling various low-dimensional convolution patterns in a probability space for fully capturing spatial-angular information. Building upon the proposed PFE, we then leverage the intrinsic linear imaging model of the coded aperture camera to construct a cycle-consistent 4-D LF reconstruction network from coded measurements. Moreover, we incorporate PFE into an iterative optimization framework for 4-D LF denoising. Our extensive experiments demonstrate the significant superiority of our methods on both real-world and synthetic 4-D LF images, both quantitatively and qualitatively, when compared with state-of-the-art methods. The source code will be publicly available at https://github.com/lyuxianqiang/LFCA-CR-NET.
Submitted 10 January, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
-
$T\bar{T}$-deformed Entanglement Entropy for Integrable Quantum Field Theory
Authors:
Miao He,
Jue Hou,
Yunfeng Jiang
Abstract:
We calculate the $T\bar{T}$-deformed entanglement entropy for integrable quantum field theories (IQFTs) using the form factor bootstrap approach. We solve the form factor bootstrap axioms for the branch-point twist fields and obtain the deformed form factors. Using these form factors, we compute the deformed von Neumann entropy up to two-particle contributions. The solution of the form factor axioms is not unique. We find that for the simplest solution of the bootstrap axioms, the UV limit of the entanglement entropy takes the same form as the undeformed one, but the effective central charge is deformed. For solutions with additional CDD-like factors, we can have different behaviors. The IR corrections, which depend only on the particle spectrum, are untouched.
Submitted 21 January, 2024; v1 submitted 13 June, 2023;
originally announced June 2023.
-
Bogoliubov Corner Excitations in Conventional $s$-Wave Superfluids
Authors:
Wei Tu,
Ya-Jie Wu,
Ning Li,
Miaodi Guo,
Junpeng Hou
Abstract:
Higher-order topological superconductors and superfluids have triggered a great deal of interest in recent years. While Majorana corner or hinge states have been studied intensively, whether superconductors and superfluids, being topological or trivial, host higher-order topological Bogoliubov excitations remains elusive. In this work, we propose that Bogoliubov corner excitations can be driven from a trivial conventional $s$-wave superfluid through mirror-symmetric local potentials. The topological Bogoliubov excited modes originate from the nontrivial Bogoliubov excitation bands. These modes are protected by mirror symmetry and robust against mirror-symmetric perturbations as long as the Bogoliubov energy gap remains open. Our work provides new insight into higher-order topological excitation states in superfluids and superconductors.
Submitted 12 June, 2023;
originally announced June 2023.
-
The Early Data Release of the Dark Energy Spectroscopic Instrument
Authors:
DESI Collaboration,
A. G. Adame,
J. Aguilar,
S. Ahlen,
S. Alam,
G. Aldering,
D. M. Alexander,
R. Alfarsy,
C. Allende Prieto,
M. Alvarez,
O. Alves,
A. Anand,
F. Andrade-Oliveira,
E. Armengaud,
J. Asorey,
S. Avila,
A. Aviles,
S. Bailey,
A. Balaguera-Antolínez,
O. Ballester,
C. Baltay,
A. Bault,
J. Bautista,
J. Behera,
S. F. Beltran
, et al. (240 additional authors not shown)
Abstract:
The Dark Energy Spectroscopic Instrument (DESI) completed its five-month Survey Validation in May 2021. Spectra of stellar and extragalactic targets from Survey Validation constitute the first major data sample from the DESI survey. This paper describes the public release of those spectra, the catalogs of derived properties, and the intermediate data products. In total, the public release includes good-quality spectral information from 466,447 objects targeted as part of the Milky Way Survey, 428,758 as part of the Bright Galaxy Survey, 227,318 as part of the Luminous Red Galaxy sample, 437,664 as part of the Emission Line Galaxy sample, and 76,079 as part of the Quasar sample. In addition, the release includes spectral information from 137,148 objects that expand the scope beyond the primary samples as part of a series of secondary programs. Here, we describe the spectral data, data quality, data products, Large-Scale Structure science catalogs, access to the data, and references that provide relevant background to using these spectra.
Submitted 15 June, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Validation of the Scientific Program for the Dark Energy Spectroscopic Instrument
Authors:
DESI Collaboration,
A. G. Adame,
J. Aguilar,
S. Ahlen,
S. Alam,
G. Aldering,
D. M. Alexander,
R. Alfarsy,
C. Allende Prieto,
M. Alvarez,
O. Alves,
A. Anand,
F. Andrade-Oliveira,
E. Armengaud,
J. Asorey,
S. Avila,
A. Aviles,
S. Bailey,
A. Balaguera-Antolínez,
O. Ballester,
C. Baltay,
A. Bault,
J. Bautista,
J. Behera,
S. F. Beltran
, et al. (239 additional authors not shown)
Abstract:
The Dark Energy Spectroscopic Instrument (DESI) was designed to conduct a survey covering 14,000 deg$^2$ over five years to constrain the cosmic expansion history through precise measurements of Baryon Acoustic Oscillations (BAO). The scientific program for DESI was evaluated during a five month Survey Validation (SV) campaign before beginning full operations. This program produced deep spectra of tens of thousands of objects from each of the stellar (MWS), bright galaxy (BGS), luminous red galaxy (LRG), emission line galaxy (ELG), and quasar target classes. These SV spectra were used to optimize redshift distributions, characterize exposure times, determine calibration procedures, and assess observational overheads for the five-year program. In this paper, we present the final target selection algorithms, redshift distributions, and projected cosmology constraints resulting from those studies. We also present a `One-Percent survey' conducted at the conclusion of Survey Validation covering 140 deg$^2$ using the final target selection algorithms with exposures of a depth typical of the main survey. The Survey Validation indicates that DESI will be able to complete the full 14,000 deg$^2$ program with spectroscopically-confirmed targets from the MWS, BGS, LRG, ELG, and quasar programs with total sample sizes of 7.2, 13.8, 7.46, 15.7, and 2.87 million, respectively. These samples will allow exploration of the Milky Way halo, clustering on all scales, and BAO measurements with a statistical precision of 0.28% over the redshift interval $z<1.1$, 0.39% over the redshift interval $1.1<z<1.9$, and 0.46% over the redshift interval $1.9<z<3.5$.
Submitted 12 January, 2024; v1 submitted 9 June, 2023;
originally announced June 2023.
-
NeuroGF: A Neural Representation for Fast Geodesic Distance and Path Queries
Authors:
Qijian Zhang,
Junhui Hou,
Yohanes Yudhi Adikusuma,
Wenping Wang,
Ying He
Abstract:
Geodesics are essential in many geometry processing applications. However, traditional algorithms for computing geodesic distances and paths on 3D mesh models are often inefficient and slow. This makes them impractical for scenarios that require extensive querying of arbitrary point-to-point geodesics. Although neural implicit representations have emerged as a popular way of representing 3D shape geometries, there is still no research on representing geodesics with deep implicit functions. To bridge this gap, this paper presents the first attempt to represent geodesics on 3D mesh models using neural implicit functions. Specifically, we introduce neural geodesic fields (NeuroGFs), which are learned to represent the all-pairs geodesics of a given mesh. By using NeuroGFs, we can efficiently and accurately answer queries of arbitrary point-to-point geodesic distances and paths, overcoming the limitations of traditional algorithms. Evaluations on common 3D models show that NeuroGFs exhibit exceptional performance in solving the single-source all-destination (SSAD) and point-to-point geodesics, and achieve high accuracy consistently. Besides, NeuroGFs also offer the unique advantage of encoding both 3D geometry and geodesics in a unified representation. Moreover, we further extend generalizable learning frameworks of NeuroGFs by adding shape feature encoders, which also show satisfactory performances for unseen shapes and categories. Code is made available at https://github.com/keeganhk/NeuroGF/tree/master.
Submitted 28 September, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Unleash the Potential of 3D Point Cloud Modeling with A Calibrated Local Geometry-driven Distance Metric
Authors:
Siyu Ren,
Junhui Hou
Abstract:
Quantifying the dissimilarity between two unstructured 3D point clouds is a challenging task, with existing metrics often relying on measuring the distance between corresponding points that can be either inefficient or ineffective. In this paper, we propose a novel distance metric called Calibrated Local Geometry Distance (CLGD), which computes the difference between the underlying 3D surfaces calibrated and induced by a set of reference points. By associating each reference point with two given point clouds through computing its directional distances to them, the difference in directional distances of an identical reference point characterizes the geometric difference between a typical local region of the two point clouds. Finally, CLGD is obtained by averaging the directional distance differences of all reference points. We evaluate CLGD on various optimization and unsupervised learning-based tasks, including shape reconstruction, rigid registration, scene flow estimation, and feature representation. Extensive experiments show that CLGD achieves significantly higher accuracy under all tasks in a memory and computationally efficient manner, compared with existing metrics. As a generic metric, CLGD has the potential to advance 3D point cloud modeling. The source code is publicly available at https://github.com/rsy6318/CLGD.
Submitted 1 June, 2023;
originally announced June 2023.
-
EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation
Authors:
Xingqun Qi,
Chen Liu,
Lincheng Li,
Jie Hou,
Haoran Xin,
Xin Yu
Abstract:
Generating vivid and diverse 3D co-speech gestures is crucial for various applications in animating virtual avatars. While most existing methods can generate gestures from audio directly, they usually overlook that emotion is one of the key factors of authentic co-speech gesture generation. In this work, we propose EmotionGesture, a novel framework for synthesizing vivid and diverse emotional co-speech 3D gestures from audio. Considering emotion is often entangled with the rhythmic beat in speech audio, we first develop an Emotion-Beat Mining module (EBM) to extract the emotion and audio beat features as well as model their correlation via a transcript-based visual-rhythm alignment. Then, we propose an initial pose based Spatial-Temporal Prompter (STP) to generate future gestures from the given initial poses. STP effectively models the spatial-temporal correlations between the initial poses and the future gestures, thus producing the spatial-temporal coherent pose prompt. Once we obtain pose prompts, emotion, and audio beat features, we will generate 3D co-speech gestures through a transformer architecture. However, considering the poses of existing datasets often contain jittering effects, this would lead to generating unstable gestures. To address this issue, we propose an effective objective function, dubbed Motion-Smooth Loss. Specifically, we model motion offset to compensate for jittering ground-truth by forcing gestures to be smooth. Last, we present an emotion-conditioned VAE to sample emotion features, enabling us to generate diverse emotional results. Extensive experiments demonstrate that our framework outperforms the state-of-the-art, achieving vivid and diverse emotional co-speech 3D gestures. Our code and dataset will be released at the project page: https://xingqunqi-lab.github.io/Emotion-Gesture-Web/
△ Less
Submitted 3 January, 2024; v1 submitted 30 May, 2023;
originally announced May 2023.
-
VDD: Varied Drone Dataset for Semantic Segmentation
Authors:
Wenxiao Cai,
Ke Jin,
Jinyan Hou,
Cong Guo,
Letian Wu,
Wankou Yang
Abstract:
Semantic segmentation of drone images is critical for various aerial vision tasks as it provides essential semantic details to understand scenes on the ground. Ensuring high accuracy of semantic segmentation models for drones requires access to diverse, large-scale, and high-resolution datasets, which are often scarce in the field of aerial image processing. While existing datasets typically focus on urban scenes and are relatively small, our Varied Drone Dataset (VDD) addresses these limitations by offering a large-scale, densely labeled collection of 400 high-resolution images spanning 7 classes. This dataset features various scenes in urban, industrial, rural, and natural areas, captured from different camera angles and under diverse lighting conditions. We also make new annotations to UDD and UAVid, integrating them under VDD annotation standards, to create the Integrated Drone Dataset (IDD). We train seven state-of-the-art models on drone datasets as baselines. It's expected that our dataset will generate considerable interest in drone image segmentation and serve as a foundation for other drone vision tasks. Datasets are publicly available at https://github.com/RussRobin/VDD.
Submitted 2 July, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach
Authors:
Haoning Wu,
Erli Zhang,
Liang Liao,
Chaofeng Chen,
Jingwen Hou,
Annan Wang,
Wenxiu Sun,
Qiong Yan,
Weisi Lin
Abstract:
The proliferation of in-the-wild videos has greatly expanded the Video Quality Assessment (VQA) problem. Unlike early definitions that usually focus on limited distortion types, VQA on in-the-wild videos is especially challenging as it could be affected by complicated factors, including various distortions and diverse contents. Though subjective studies have collected overall quality scores for these videos, how the abstract quality scores relate with specific factors is still obscure, hindering VQA methods from more concrete quality evaluations (e.g. sharpness of a video). To solve this problem, we collect over two million opinions on 4,543 in-the-wild videos on 13 dimensions of quality-related factors, including in-capture authentic distortions (e.g. motion blur, noise, flicker), errors introduced by compression and transmission, and higher-level experiences on semantic contents and aesthetic issues (e.g. composition, camera trajectory), to establish the multi-dimensional Maxwell database. Specifically, we ask the subjects to label among a positive, a negative, and a neutral choice for each dimension. These explanation-level opinions allow us to measure the relationships between specific quality factors and abstract subjective quality ratings, and to benchmark different categories of VQA algorithms on each dimension, so as to more comprehensively analyze their strengths and weaknesses. Furthermore, we propose the MaxVQA, a language-prompted VQA approach that modifies vision-language foundation model CLIP to better capture important quality issues as observed in our analyses. The MaxVQA can jointly evaluate various specific quality factors and final quality scores with state-of-the-art accuracy on all dimensions, and superb generalization ability on existing datasets. Code and data available at https://github.com/VQAssessment/MaxVQA.
Submitted 3 August, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
LATTE: Label-efficient Incident Phenotyping from Longitudinal Electronic Health Records
Authors:
Jun Wen,
Jue Hou,
Clara-Lea Bonzel,
Yihan Zhao,
Victor M. Castro,
Vivian S. Gainer,
Dana Weisenfeld,
Tianrun Cai,
Yuk-Lam Ho,
Vidul A. Panickan,
Lauren Costa,
Chuan Hong,
J. Michael Gaziano,
Katherine P. Liao,
Junwei Lu,
Kelly Cho,
Tianxi Cai
Abstract:
Electronic health record (EHR) data are increasingly used to support real-world evidence (RWE) studies. Yet their ability to generate reliable RWE is limited by the lack of readily available precise information on the timing of clinical events, such as the onset time of heart failure. We propose a LAbel-efficienT incidenT phEnotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embedding vectors from large-scale EHR data as prior knowledge, LATTE selects predictive EHR features in a concept re-weighting module by mining their relationship to the target event and compresses their information into longitudinal visit embeddings through a visit attention learning network. LATTE employs a recurrent neural network to capture the sequential dependency between the target event and visit embeddings before/after it. To improve label efficiency, LATTE constructs highly informative longitudinal silver-standard labels from large-scale unlabeled patients to perform unsupervised pre-training and semi-supervised joint training. Finally, LATTE enhances cross-site portability via contrastive representation learning. LATTE is evaluated on three analyses: the onset of type-2 diabetes, heart failure, and the onset and relapses of multiple sclerosis. We use various evaluation metrics present in the literature, including the $ABC_{gain}$, the proportion of reduction in the area between the observed event indicator and the predicted cumulative incidences in reference to the prediction per incident prevalence. LATTE consistently achieves substantial improvement over benchmark methods such as SAMGEP and RETAIN in all settings.
Submitted 18 May, 2023;
originally announced May 2023.