Search | arXiv e-print repository

Joint Demosaicing and Denoising with Double Deep Image Priors

Authors: Taihui Li, Anish Lahiri, Yutong Dai, Owen Mayer

Abstract: Demosaicing and denoising of RAW images are crucial steps in the processing pipeline of modern digital cameras. As only a third of the color information required to produce a digital image is captured by the camera sensor, the process of demosaicing is inherently ill-posed. The presence of noise further exacerbates this problem. Performing these two steps sequentially may distort the content of the captured RAW images and accumulate errors from one step to another. Recent deep neural-network-based approaches have shown the effectiveness of joint demosaicing and denoising to mitigate such challenges. However, these methods typically require a large number of training samples and do not generalize well to different types and intensities of noise. In this paper, we propose a novel joint demosaicing and denoising method, dubbed JDD-DoubleDIP, which operates directly on a single RAW image without requiring any training data. We validate the effectiveness of our method on two popular datasets -- Kodak and McMaster -- with various noises and noise intensities. The experimental results show that our method consistently outperforms other compared methods in terms of PSNR, SSIM, and qualitative visual perception.

Submitted 17 September, 2023; originally announced September 2023.

arXiv:2309.08622 [pdf, other]

Representation Learning in Low-rank Slate-based Recommender Systems

Authors: Yijia Dai, Wen Sun

Abstract: Reinforcement learning (RL) in recommendation systems offers the potential to optimize recommendations for long-term user engagement. However, the environment often involves large state and action spaces, which makes it hard to efficiently learn and explore. In this work, we propose a sample-efficient representation learning algorithm, using the standard slate recommendation setup, to treat this as an online RL problem with low-rank Markov decision processes (MDPs). We also construct the recommender simulation environment with the proposed setup and sampling method.

Submitted 18 September, 2023; v1 submitted 10 September, 2023; originally announced September 2023.

Comments: in MFPL, ICML 2023

arXiv:2309.08348 [pdf, other]

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

Authors: Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

Abstract: Previous Multimodal Information based Speech Processing (MISP) challenges mainly focused on audio-visual speech recognition (AVSR) with commendable success. However, the most advanced back-end recognition systems often hit performance limits due to the complex acoustic environments. This has prompted a shift in focus towards the Audio-Visual Target Speaker Extraction (AVTSE) task for the MISP 2023 challenge in ICASSP 2024 Signal Processing Grand Challenges. Unlike existing audio-visual speech enhancement challenges primarily focused on simulation data, the MISP 2023 challenge uniquely explores how front-end speech processing, combined with visual clues, impacts back-end tasks in real-world scenarios. This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into enhancing the accuracy of back-end speech recognition systems through AVTSE in challenging and real acoustic environments. This paper delivers a thorough overview of the task setting, dataset, and baseline system of the MISP 2023 challenge. It also includes an in-depth analysis of the challenges participants may encounter. The experimental results highlight the demanding nature of this task, and we look forward to the innovative solutions participants will bring forward.

Submitted 15 September, 2023; originally announced September 2023.

Comments: 5 pages, 4 figures

arXiv:2309.04953 [pdf, ps, other]

Extracting the number of type-B Goldstone modes and the dynamical critical exponent for a type of scale-invariant states

Authors: Huan-Qiang Zhou, Yan-Wei Dai, Qian-Qian Shi, Ian P. McCulloch, Murray T. Batchelor

Abstract: A generic scheme is proposed to perform a finite-entanglement scaling analysis for scale-invariant states, which appear to be highly degenerate ground states arising from spontaneous symmetry breaking with type-B Goldstone modes. This allows us to extract the number of type-B Goldstone modes and the dynamical critical exponent, in combination with a finite block-size scaling analysis, from numerical simulations of quantum many-body systems in the context of tensor network representations. The number of type-B Goldstone modes is identical to the fractal dimension, thus reflecting an abstract fractal underlying the ground state subspace. As illustrative examples, we investigate the spin-$s$ Heisenberg ferromagnetic model, the $\rm{SU}(3)$ ferromagnetic model and the $\rm{SO}(4)$ spin-orbital model.

Submitted 30 November, 2023; v1 submitted 10 September, 2023; originally announced September 2023.

Comments: 14 pages, 24 figures, 11 tables. Minor changes

arXiv:2309.03559 [pdf, other]

An Anchor Learning Approach for Citation Field Learning

Authors: Zilin Yuan, Borun Chen, Yimeng Dai, Yinghui Li, Hai-Tao Zheng, Rui Zhang

Abstract: Citation field learning is to segment a citation string into fields of interest such as author, title, and venue. Extracting such fields from citations is crucial for citation indexing, researcher profile analysis, etc. User-generated resources like academic homepages and Curriculum Vitae provide rich citation field information. However, extracting fields from these resources is challenging due to inconsistent citation styles, incomplete sentence syntax, and insufficient training data. To address these challenges, we propose a novel algorithm, CIFAL (citation field learning by anchor learning), to boost the citation field learning performance. CIFAL leverages anchor learning, which is model-agnostic for any Pre-trained Language Model, to help capture citation patterns from the data of different citation styles. The experiments demonstrate that CIFAL outperforms state-of-the-art methods in citation field learning, achieving a 2.68% improvement in field-level F1-scores. Extensive analysis of the results further confirms the effectiveness of CIFAL quantitatively and qualitatively.

Submitted 14 December, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: accepted by ICASSP2024

arXiv:2309.03490 [pdf, other]

Lipschitz Transport Maps via the Follmer Flow

Authors: Yin Dai, Yuan Gao, Jian Huang, Yuling Jiao, Lican Kang, Jin Liu

Abstract: Inspired by the construction of the Föllmer process, we construct a unit-time flow on the Euclidean space, termed the Föllmer flow, whose flow map at time 1 pushes forward a standard Gaussian measure onto a general target measure. We study the well-posedness of the Föllmer flow and establish the Lipschitz property of the flow map at time 1. We apply the Lipschitz mapping to several rich classes of probability measures on deriving dimension-free functional inequalities and concentration inequalities for the empirical measure.

Submitted 7 September, 2023; originally announced September 2023.

arXiv:2309.03126 [pdf, other]

Everyone Deserves A Reward: Learning Customized Human Preferences

Authors: Pengyu Cheng, Jiawen Xie, Ke Bai, Yong Dai, Nan Du

Abstract: Reward models (RMs) are essential for aligning large language models (LLMs) with human preferences to improve interaction quality. However, the real world is pluralistic, which leads to diversified human preferences with respect to different religions, politics, cultures, etc. Moreover, each individual can have their unique preferences on various topics. Neglecting the diversity of human preferences, current human feedback aligning methods only consider a general reward model, which is below satisfaction for customized or personalized application scenarios. To explore customized preference learning, we collect a domain-specific preference (DSP) dataset, which includes preferred responses for each given query from four practical domains. Besides, from the perspective of data efficiency, we propose a three-stage customized RM learning scheme, then empirically verify its effectiveness on both general preference datasets and our DSP set. Furthermore, we test multiple training and data strategies on the three learning stages. We find several ways to better preserve the general preferring ability while training the customized RMs, especially general preference enrichment, and customized preference imitation learning. The DSP dataset and code are available at https://github.com/Linear95/DSP.

Submitted 15 September, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

arXiv:2309.02043 [pdf, other]

Decomposed Guided Dynamic Filters for Efficient RGB-Guided Depth Completion

Authors: Yufei Wang, Yuxin Mao, Qi Liu, Yuchao Dai

Abstract: RGB-guided depth completion aims at predicting dense depth maps from sparse depth measurements and corresponding RGB images, where how to effectively and efficiently exploit the multi-modal information is a key issue. Guided dynamic filters, which generate spatially-variant depth-wise separable convolutional filters from RGB features to guide depth features, have been proven to be effective in this task. However, the dynamically generated filters require massive model parameters, computational costs and memory footprints when the number of feature channels is large. In this paper, we propose to decompose the guided dynamic filters into a spatially-shared component multiplied by content-adaptive adaptors at each spatial location. Based on the proposed idea, we introduce two decomposition schemes A and B, which decompose the filters by splitting the filter structure and using spatial-wise attention, respectively. The decomposed filters not only maintain the favorable properties of guided dynamic filters as being content-dependent and spatially-variant, but also reduce model parameters and hardware costs, as the learned adaptors are decoupled with the number of feature channels. Extensive experimental results demonstrate that the methods using our schemes outperform state-of-the-art methods on the KITTI dataset, and rank 1st and 2nd on the KITTI benchmark at the time of submission. Meanwhile, they also achieve comparable performance on the NYUv2 dataset. In addition, our proposed methods are general and could be employed as plug-and-play feature fusion blocks in other multi-modal fusion tasks such as RGB-D salient object detection.

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2308.14032 [pdf, ps, other]

$ρ$-meson longitudinal leading-twist distribution amplitude revisited and the $D\to ρ$ semileptonic decay

Authors: Tao Zhong, Ya-Hong Dai, Hai-Bing Fu

Abstract: Motivated by our previous work [Phys. Rev. D 104, no.1, 016021 (2021)] on pionic leading-twist distribution amplitude (DA), we revisit $ρ$-meson leading-twist longitudinal DA $φ_{2;ρ}^\|(x,μ)$ in this paper. A model proposed by Chang based on the Dyson-Schwinger equations (DSEs) is adopted to describe the behavior of $φ_{2;ρ}^\|(x,μ)$. On the other hand, the $ξ$-moments of $φ_{2;ρ}^\|(x,μ)$ are calculated with the QCD sum rules in the framework of the background field theory. The sum rule formula for those moments is improved. More accurate values for the first five nonzero $ξ$-moments at typical scale $μ=1, 1.4, 2, 3~{\rm GeV}$ are given, e.g., at $μ= 1~{\rm GeV}$, $\langle ξ^2\rangle_{2;ρ}^\| = 0.220(6)$, $\langle ξ^4\rangle_{2;ρ}^\| = 0.103(4)$, $\langle ξ^6\rangle_{2;ρ}^\| = 0.066(5)$, $\langle ξ^8\rangle_{2;ρ}^\| = 0.046(4)$ and $\langle ξ^{10}\rangle_{2;ρ}^\| = 0.035(3)$. By fitting those values with the least squares method, the DSE model for $φ_{2;ρ}^\|(x,μ)$ is determined. By taking the left-handed current light-cone sum rule approach, we get the transition form factor at large recoil region, i.e. $A_1(0) = 0.498^{+0.014}_{-0.012}$, $A_2(0)=0.460^{+0.055}_{-0.047}$, $V(0) = 0.800^{+0.015}_{-0.014}$, and the ratio $r_2 = 0.923^{+0.133}_{-0.119}$, $r_V = 1.607^{+0.071}_{-0.071}$. After making the extrapolation with a rapidly converging series based on $z(t)$-expansion, we present the decay width for the semileptonic decays $D\to ρ\ell^+ν_\ell$. Finally, the branching fractions are $\mathcal{B}(D^0\to ρ^- e^+ ν_e) = 1.889^{+0.176}_{-0.170}\pm 0.005$, $\mathcal{B}(D^+ \to ρ^0 e^+ ν_e) = 2.380^{+0.221}_{-0.214}\pm 0.012$, $\mathcal{B}(D^0\to ρ^- μ^+ ν_μ) = 1.881^{+0.174}_{-0.168}\pm 0.005$, $\mathcal{B}(D^+ \to ρ^0 μ^+ ν_μ) =2.369^{+0.219}_{-0.211}\pm 0.011$.

Submitted 27 August, 2023; originally announced August 2023.

Comments: 9 pages, 3 figures

arXiv:2308.13774 [pdf, other]

Central Similarity Multi-View Hashing for Multimedia Retrieval

Authors: Jian Zhu, Wen Cheng, Yu Cui, Chang Tang, Yuyang Dai, Yong Li, Lingfang Zeng

Abstract: Hash representation learning of multi-view heterogeneous data is the key to improving the accuracy of multimedia retrieval. However, existing methods utilize local similarity and fall short of deeply fusing the multi-view features, resulting in poor retrieval accuracy. Current methods only use local similarity to train their model. These methods ignore global similarity. Furthermore, most recent works fuse the multi-view features via a weighted sum or concatenation. We contend that these fusion methods are insufficient for capturing the interaction between various views. We present a novel Central Similarity Multi-View Hashing (CSMVH) method to address the mentioned problems. Central similarity learning is used for solving the local similarity problem, which can utilize the global similarity between the hash center and samples. We present copious empirical data demonstrating the superiority of gate-based fusion over conventional approaches. On the MS COCO and NUS-WIDE, the proposed CSMVH performs better than the state-of-the-art methods by a large margin (up to 11.41% mean Average Precision (mAP) improvement).

Submitted 26 August, 2023; originally announced August 2023.

Comments: accepted by the Asia Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data (APWeb-WAIM2023)

arXiv:2308.13191 [pdf, other]

Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers

Authors: Jiawen Xie, Pengyu Cheng, Xiao Liang, Yong Dai, Nan Du

Abstract: Although dominant in natural language processing, transformer-based models remain challenged by the task of long-sequence processing, because the computational cost of self-attention operations in transformers swells quadratically with the input sequence length. To alleviate the complexity of long-sequence processing, we propose a simple framework to enable the off-the-shelf pre-trained transformers to process much longer sequences, while the computation and memory costs remain growing linearly with the input sequence lengths. More specifically, our method divides each long-sequence input into a batch of chunks, then aligns the inter-chunk information during the encoding steps, and finally selects the most representative hidden states from the encoder for the decoding process. To extract inter-chunk semantic information, we align the start and end token embeddings among chunks in each encoding transformer block. To learn an effective hidden selection policy, we design a dual updating scheme inspired by reinforcement learning, which regards the decoders of transformers as environments, and the downstream performance metrics as the rewards to evaluate the hidden selection actions. Our empirical results on real-world long-text summarization and reading comprehension tasks demonstrate effective improvements compared to prior long-sequence processing baselines.

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.11925 [pdf, other]

Solving Elliptic Optimal Control Problems via Neural Networks and Optimality System

Authors: Yongcheng Dai, Bangti Jin, Ramesh Sau, Zhi Zhou

Abstract: In this work, we investigate a neural network based solver for optimal control problems (without / with box constraint) for linear and semilinear second-order elliptic problems. It utilizes a coupled system derived from the first-order optimality system of the optimal control problem, and employs deep neural networks to represent the solutions to the reduced system. We present an error analysis of the scheme, and provide $L^2(Ω)$ error bounds on the state, control and adjoint in terms of neural network parameters (e.g., depth, width, and parameter bounds) and the numbers of sampling points. The main tools in the analysis include offset Rademacher complexity and boundedness and Lipschitz continuity of neural network functions. We present several numerical examples to illustrate the method and compare it with two existing ones.

Submitted 8 May, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: 26 pages

arXiv:2308.10705 [pdf, other]

Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion Modeling

Authors: Haorui Ji, Hui Deng, Yuchao Dai, Hongdong Li

Abstract: Most of the previous 3D human pose estimation work relied on the powerful memory capability of the network to obtain suitable 2D-3D mappings from the training data. Few works have studied the modeling of human posture deformation in motion. In this paper, we propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior. Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton, and a frame-by-frame skeleton deformation. A mixed spatial-temporal NRSfMformer is used to simultaneously estimate the 3D reference skeleton and the skeleton deformation of each frame from 2D observations sequence, and then sum them to obtain the pose of each frame. Subsequently, a loss term based on the diffusion model is used to ensure that the pipeline learns the correct prior motion knowledge. Finally, we have evaluated our proposed method on mainstream datasets and obtained superior results outperforming the state-of-the-art.

Submitted 18 August, 2023; originally announced August 2023.

arXiv:2308.09064 [pdf, other]

The Lyman Continuum Escape Fraction of Star-forming Galaxies at $2.4\lesssim z\lesssim3.7$ from UVCANDELS

Authors: Xin Wang, Harry I. Teplitz, Brent M. Smith, Rogier A. Windhorst, Marc Rafelski, Vihang Mehta, Anahita Alavi, Gabriel Brammer, James Colbert, Norman Grogin, Nimish P. Hathi, Anton M. Koekemoer, Laura Prichard, Claudia Scarlata, Ben Sunnquist, Pablo Arrabal Haro, Christopher Conselice, Eric Gawiser, Yicheng Guo, Matthew Hayes, Rolf A. Jansen, Zhiyuan Ji, Ray A. Lucas, Robert O'Connell, Brant Robertson , et al. (52 additional authors not shown)

Abstract: The UltraViolet Imaging of the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey Fields (UVCANDELS) survey is a Hubble Space Telescope (HST) Cycle-26 Treasury Program, allocated in total 164 orbits of primary Wide-Field Camera 3 Ultraviolet and Visible light F275W imaging with coordinated parallel Advanced Camera for Surveys F435W imaging, on four of the five premier extragalactic survey fields: GOODS-N, GOODS-S, EGS, and COSMOS. We introduce this survey by presenting a thorough search for galaxies at $z\gtrsim2.4$ that leak significant Lyman continuum (LyC) radiation, as well as a stringent constraint on the LyC escape fraction ($f_{\rm esc}$) from stacking the UV images of a population of star-forming galaxies with secure redshifts. Our extensive search for LyC emission and stacking analysis benefit from the catalogs of high-quality spectroscopic redshifts compiled from archival ground-based data and HST slitless spectroscopy, carefully vetted by dedicated visual inspection efforts. We report a sample of five galaxies as individual LyC leaker candidates, showing $f_{\rm esc}^{\rm rel}\gtrsim60\%$ estimated using detailed Monte Carlo analysis of intergalactic medium attenuation. We develop a robust stacking method to apply to five samples of in total 85 non-detection galaxies in the redshift range of $z\in[2.4,3.7]$. Most stacks give tight 2-$σ$ upper limits below $f_{\rm esc}^{\rm rel}<6\%$. A stack for a subset of 32 emission-line galaxies shows tentative LyC leakage detected at 2.9-$σ$, indicating $f_{\rm esc}^{\rm rel}=5.7\%$ at $z\sim2.65$, supporting the key role of such galaxies in contributing to the cosmic reionization and maintaining the UV ionization background. These new F275W and F435W imaging mosaics from UVCANDELS have been made publicly available on the Barbara A. Mikulski Archive for Space Telescopes.

Submitted 17 August, 2023; originally announced August 2023.

Comments: 33 pages, 21 figures, and 5 tables. Resubmitted after addressing the referee report

arXiv:2308.08488 [pdf, other]

Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder

Authors: Yusheng Dai, Hang Chen, Jun Du, Xiaofei Ding, Ning Ding, Feijun Jiang, Chin-Hui Lee

Abstract: In recent research, slight performance improvement is observed from automatic speech recognition systems to audio-visual speech recognition systems in the end-to-end framework with low-quality videos. Unmatching convergence rates and specialized input representations between audio and visual modalities are considered to cause the problem. In this paper, we propose two novel techniques to improve audio-visual speech recognition (AVSR) under a pre-training and fine-tuning training framework. First, we explore the correlation between lip shapes and syllable-level subword units in Mandarin to establish good frame-level syllable boundaries from lip shapes. This enables accurate alignment of video and audio streams during visual model pre-training and cross-modal fusion. Next, we propose an audio-guided cross-modal fusion encoder (CMFE) neural network to utilize main training parameters for multiple cross-modal attention layers to make full use of modality complementarity. Experiments on the MISP2021-AVSR data set show the effectiveness of the two proposed techniques. Together, using only a relatively small amount of training data, the final system achieves better performances than state-of-the-art systems with more complex front-ends and back-ends.

Submitted 8 March, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: 6 pages, 2 figures, published in ICME2023

arXiv:2308.08288 [pdf, other]

Improving Audio-Visual Segmentation with Bidirectional Generation

Authors: Dawei Hao, Yuxin Mao, Bowen He, Xiaodong Han, Yuchao Dai, Yiran Zhong

Abstract: The aim of audio-visual segmentation (AVS) is to precisely differentiate audible objects within videos down to the pixel level. Traditional approaches often tackle this challenge by combining information from various modalities, where the contribution of each modality is implicitly or explicitly modeled. Nevertheless, the interconnections between different modalities tend to be overlooked in audio-visual modeling. In this paper, inspired by the human ability to mentally simulate the sound of an object and its visual appearance, we introduce a bidirectional generation framework. This framework establishes robust correlations between an object's visual characteristics and its associated sound, thereby enhancing the performance of AVS. To achieve this, we employ a visual-to-audio projection component that reconstructs audio features from object segmentation masks and minimizes reconstruction errors. Moreover, recognizing that many sounds are linked to object movements, we introduce an implicit volumetric motion estimation module to handle temporal dynamics that may be challenging to capture using conventional optical flow methods. To showcase the effectiveness of our approach, we conduct comprehensive experiments and analyses on the widely recognized AVSBench benchmark. As a result, we establish a new state-of-the-art performance level in the AVS benchmark, particularly excelling in the challenging MS3 subset which involves segmenting multiple sound sources. To facilitate reproducibility, we plan to release both the source code and the pre-trained model.

Submitted 19 December, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: AAAI Camera Ready. Dawei Hao and Yuxin Mao contributed equally to this paper. Yiran Zhong is the corresponding author. The code will be released at https://github.com/OpenNLPLab/AVS-bidirectional

arXiv:2308.04413 [pdf, other]

Digging into Depth Priors for Outdoor Neural Radiance Fields

Authors: Chen Wang, Jiadai Sun, Lina Liu, Chenming Wu, Zhelun Shen, Dayan Wu, Yuchao Dai, Liangjun Zhang

Abstract: Neural Radiance Fields (NeRF) have demonstrated impressive performance in vision and graphics tasks, such as novel view synthesis and immersive reality. However, the shape-radiance ambiguity of radiance fields remains a challenge, especially in the sparse viewpoints setting. Recent work resorts to integrating depth priors into outdoor NeRF training to alleviate the issue. However, the criteria for selecting depth priors and the relative merits of different priors have not been thoroughly investigated. Moreover, the relative merits of different approaches to using the depth priors are also unexplored. In this paper, we provide a comprehensive study and evaluation of employing depth priors in outdoor neural radiance fields, covering common depth sensing technologies and most ways of applying them. Specifically, we conduct extensive experiments with two representative NeRF methods equipped with four commonly-used depth priors and different depth usages on two widely used outdoor datasets. Our experimental results reveal several interesting findings that can potentially benefit practitioners and researchers in training their NeRF models with depth priors. Project Page: https://cwchenwang.github.io/outdoor-nerf-depth

Submitted 8 August, 2023; originally announced August 2023.

Comments: Accepted to ACM MM 2023. Project Page: https://cwchenwang.github.io/outdoor-nerf-depth

arXiv:2308.02809 [pdf]

3D front tip fields in creeping solids under constraint effects: a higher-order asymptotic solution

Authors: Weichen Kong, Yanwei Dai, Yinghua Liu

Abstract: As one of the most important topics studied in creep fracture mechanics, mechanics fields at three-dimensional (3D) sharp V-notches and crack tip have drawn tremendous attention. With many years of effort on constraint theory developed in creeping solids, there still seems dense fog on how in-plane and out-of-plane constraint effects are interacted for 3D sharp V-notch and crack in creeping solids. To shed light on this topic, a 3D higher-order termed solution for sharp V-notches in creeping materials subjected to mode 1 loading is established by introducing the out-of-plane factor, which is the out-of-plane stress divided by the sum of in-plane normal stress. The solution can naturally be degenerated to a 3D crack. Based on the 3D higher-order term solution, a new fracture parameter is proposed and combined with to characterize 3D constraint effect. It is found that the stress exponents and angular distribution of higher-order term for 3D notches and cracks are highly related to . The proposed higher order termed solutions show better agreement with the FEA results than the 3D leading-term and 2D two-term solutions, especially for smaller notch angles and ligament width. Moreover, the presented 3D constraint theory shows that effects of and are highly interlinked rather than simply separated. It implies that the 3D constraint level may be significantly influenced by . The 3D mathematical solutions discussed in this paper could enhance the understanding of the 3D effect and has the potential to explain the 3D constraint effect on the notches and cracks under creep conditions.
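The out-of-plane factor described in words in this abstract can be written as a single ratio. The symbol $T_z$ and the Cartesian stress components below are notational assumptions, since this listing does not preserve the paper's own symbols.

```latex
T_z = \frac{\sigma_{zz}}{\sigma_{xx} + \sigma_{yy}}
```

Under this definition, $T_z = 0$ in plane stress, and for a linear elastic solid in plane strain $T_z$ equals Poisson's ratio.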

Submitted 5 August, 2023; originally announced August 2023.

Comments: 56 pages, 25 figures

arXiv:2308.00041 [pdf, other]

doi 10.3847/1538-4357/aced3e

UV-Bright Star-Forming Clumps and Their Host Galaxies in UVCANDELS at 0.5 $\leq$ z $\leq$ 1

Authors: Alec Martin, Yicheng Guo, Xin Wang, Anton M. Koekemoer, Marc Rafelski, Harry I. Teplitz, Rogier A. Windhorst, Anahita Alavi, Norman A. Grogin, Laura Prichard, Ben Sunnquist, Daniel Ceverino, Nima Chartab, Christopher J. Conselice, Y. Sophia Dai, Avishai Dekel, Johnathan P. Gardner, Eric Gawiser, Nimish P. Hathi, Matthew J. Hayes, Rolf A. Jansen, Zhiyuan Ji, David C. Koo, Ray A. Lucas, Nir Mandelker, et al. (10 additional authors not shown)

Abstract: Giant star-forming clumps are a prominent feature of star-forming galaxies (SFGs) and contain important clues on galaxy formation and evolution. However, basic demographics of clumps and their host galaxies remain uncertain. Using the HST/WFC3 F275W images from the Ultraviolet Imaging of the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (UVCANDELS), we detect and analyze giant star-forming clumps in galaxies at 0.5 $\leq$ z $\leq$ 1, connecting two epochs when clumps are common (at cosmic high-noon, z $\sim$ 2) and rare (in the local universe). We construct a clump sample whose rest-frame 1600 Å luminosity is 3 times higher than the most luminous local HII regions (M$_{UV} \leq -$16 AB). In our sample, 35 $\pm$ 3$\%$ of low-mass galaxies (log[M$_{*}$/M$_{\odot}$] $<$ 10) are clumpy (i.e., containing at least one off-center clump). This fraction changes to 22 $\pm$ 3$\%$ and 22 $\pm$ 4$\%$ for intermediate (10 $\leq$ log[M$_{*}$/M$_{\odot}$] $\leq$ 10.5) and high-mass (log[M$_{*}$/M$_{\odot}$] $>$ 10.5) galaxies in agreement with previous studies. When compared to similar-mass non-clumpy SFGs, low- and intermediate-mass clumpy SFGs tend to have higher SFRs and bluer rest-frame U-V colors, while high-mass clumpy SFGs tend to be larger than non-clumpy SFGs. However, clumpy and non-clumpy SFGs have similar Sérsic index, indicating a similar underlying density profile. Furthermore, we investigate how UV luminosity of star-forming regions correlates with the physical properties of host galaxies. On average, more luminous star-forming regions reside in more luminous, smaller, and/or higher-specific SFR galaxies and are found closer to their hosts' galactic center.

Submitted 2 October, 2023; v1 submitted 31 July, 2023; originally announced August 2023.

Comments: 21 pages, 13 figures, accepted for publication in ApJ

Journal ref: ApJ 955 106 (2023)

arXiv:2307.16579 [pdf, other]

Contrastive Conditional Latent Diffusion for Audio-visual Segmentation

Authors: Yuxin Mao, Jing Zhang, Mochu Xiang, Yunqiu Lv, Yiran Zhong, Yuchao Dai

Abstract: We propose a latent diffusion model with contrastive learning for audio-visual segmentation (AVS) to extensively explore the contribution of audio. We interpret AVS as a conditional generation task, where audio is defined as the conditional variable for sound producer(s) segmentation. With our new interpretation, it is especially necessary to model the correlation between audio and the final segmentation map to ensure its contribution. We introduce a latent diffusion model to our framework to achieve semantic-correlated representation learning. Specifically, our diffusion model learns the conditional generation process of the ground-truth segmentation map, leading to ground-truth aware inference when we perform the denoising process at the test stage. As a conditional diffusion model, we argue it is essential to ensure that the conditional variable contributes to model output. We then introduce contrastive learning to our framework to learn audio-visual correspondence, which is proven consistent with maximizing the mutual information between model prediction and the audio data. In this way, our latent diffusion model via contrastive learning explicitly maximizes the contribution of audio for AVS. Experimental results on the benchmark dataset verify the effectiveness of our solution. Code and results are online via our project page: https://github.com/OpenNLPLab/DiffusionAVS.

Submitted 31 July, 2023; originally announced July 2023.

arXiv:2307.16572 [pdf, other]

Transferable Attack for Semantic Segmentation

Authors: Mengqi He, Jing Zhang, Zhaoyuan Yang, Mingyi He, Nick Barnes, Yuchao Dai

Abstract: We analyze the performance of semantic segmentation models with respect to adversarial attacks, and observe that the adversarial examples generated from a source model fail to attack the target models, i.e., the conventional attack methods, such as PGD and FGSM, do not transfer well to target models, making it necessary to study the transferable attacks, especially transferable attacks for semantic segmentation. We find two main factors to achieve transferable attack. Firstly, the attack should come with effective data augmentation and translation-invariant features to deal with unseen models. Secondly, stabilized optimization strategies are needed to find the optimal attack direction. Based on the above observations, we propose an ensemble attack for semantic segmentation to achieve more effective attacks with higher transferability. The source code and experimental results are publicly available via our project page: https://github.com/anucvers/TASS.

Submitted 21 August, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

Comments: Source code is available at: https://github.com/anucvers/TASS

arXiv:2307.16509 [pdf, other]

Digging Into Uncertainty-based Pseudo-label for Robust Stereo Matching

Authors: Zhelun Shen, Xibin Song, Yuchao Dai, Dingfu Zhou, Zhibo Rao, Liangjun Zhang

Abstract: Due to the domain differences and unbalanced disparity distribution across multiple datasets, current stereo matching approaches are commonly limited to a specific dataset and generalize poorly to others. Such domain shift issue is usually addressed by substantial adaptation on costly target-domain ground-truth data, which cannot be easily obtained in practical settings. In this paper, we propose to dig into uncertainty estimation for robust stereo matching. Specifically, to balance the disparity distribution, we employ a pixel-level uncertainty estimation to adaptively adjust the next stage disparity searching space, in this way driving the network progressively prune out the space of unlikely correspondences. Then, to solve the limited ground truth data, an uncertainty-based pseudo-label is proposed to adapt the pre-trained model to the new domain, where pixel-level and area-level uncertainty estimation are proposed to filter out the high-uncertainty pixels of predicted disparity maps and generate sparse while reliable pseudo-labels to align the domain gap. Experimentally, our method shows strong cross-domain, adapt, and joint generalization and obtains 1st place on the stereo task of Robust Vision Challenge 2020. Additionally, our uncertainty-based pseudo-labels can be extended to train monocular depth estimation networks in an unsupervised way and even achieves comparable performance with the supervised methods. The code will be available at https://github.com/gallenszl/UCFNet.

Submitted 31 July, 2023; originally announced July 2023.

Comments: Accepted by TPAMI

arXiv:2307.15429 [pdf, other]

Improvable Gap Balancing for Multi-Task Learning

Authors: Yanqi Dai, Nanyi Fei, Zhiwu Lu

Abstract: In multi-task learning (MTL), gradient balancing has recently attracted more research interest than loss balancing since it often leads to better performance. However, loss balancing is much more efficient than gradient balancing, and thus it is still worth further exploration in MTL. Note that prior studies typically ignore that there exist varying improvable gaps across multiple tasks, where the improvable gap per task is defined as the distance between the current training progress and desired final training progress. Therefore, after loss balancing, the performance imbalance still arises in many cases. In this paper, following the loss balancing framework, we propose two novel improvable gap balancing (IGB) algorithms for MTL: one takes a simple heuristic, and the other (for the first time) deploys deep reinforcement learning for MTL. Particularly, instead of directly balancing the losses in MTL, both algorithms choose to dynamically assign task weights for improvable gap balancing. Moreover, we combine IGB and gradient balancing to show the complementarity between the two types of algorithms. Extensive experiments on two benchmark datasets demonstrate that our IGB algorithms lead to the best results in MTL via loss balancing and achieve further improvements when combined with gradient balancing. Code is available at https://github.com/YanqiDai/IGB4MTL.

Submitted 28 July, 2023; originally announced July 2023.

Comments: Accepted for the 39th Conference on Uncertainty in Artificial Intelligence (UAI 2023)

arXiv:2307.09929 [pdf, other]

Measuring and Modeling Uncertainty Degree for Monocular Depth Estimation

Authors: Mochu Xiang, Jing Zhang, Nick Barnes, Yuchao Dai

Abstract: Effectively measuring and modeling the reliability of a trained model is essential to the real-world deployment of monocular depth estimation (MDE) models. However, the intrinsic ill-posedness and ordinal-sensitive nature of MDE pose major challenges to the estimation of uncertainty degree of the trained models. On the one hand, utilizing current uncertainty modeling methods may increase memory consumption and are usually time-consuming. On the other hand, measuring the uncertainty based on model accuracy can also be problematic, where uncertainty reliability and prediction accuracy are not well decoupled. In this paper, we propose to model the uncertainty of MDE models from the perspective of the inherent probability distributions originating from the depth probability volume and its extensions, and to assess it more fairly with more comprehensive metrics. By simply introducing additional training regularization terms, our model, with surprisingly simple formations and without requiring extra modules or multiple inferences, can provide uncertainty estimations with state-of-the-art reliability, and can be further improved when combined with ensemble or sampling methods. A series of experiments demonstrate the effectiveness of our methods.

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2307.09270 [pdf, other]

Linearized Relative Positional Encoding

Authors: Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong

Abstract: Relative positional encoding is widely used in vanilla and linear transformers to represent positional information. However, existing encoding methods of a vanilla transformer are not always directly applicable to a linear transformer, because the latter requires a decomposition of the query and key representations into separate kernel functions. Nevertheless, principles for designing encoding methods suitable for linear transformers remain understudied. In this work, we put together a variety of existing linear relative positional encoding approaches under a canonical form and further propose a family of linear relative positional encoding algorithms via unitary transformation. Our formulation leads to a principled framework that can be used to develop new relative positional encoding methods that preserve linear space-time complexity. Equipped with different models, the proposed linearized relative positional encoding (LRPE) family derives effective encoding for various applications. Experiments show that compared with existing methods, LRPE achieves state-of-the-art performance in language modeling, text classification, and image classification. Meanwhile, it emphasizes a general paradigm for designing broadly more relative positional encoding methods that are applicable to linear transformers. The code is available at https://github.com/OpenNLPLab/Lrpe.

Submitted 18 July, 2023; originally announced July 2023.

Comments: Reviewed by TMLR, decision pending. Yiran Zhong is the corresponding author. Code is available at https://github.com/OpenNLPLab/Lrpe

arXiv:2307.09225 [pdf]

doi 10.1038/s44287-024-00025-w

Human Body Digital Twin: A Master Plan

Authors: Chenyu Tang, Wentian Yi, Edoardo Occhipinti, Yanning Dai, Shuo Gao, Luigi G. Occhipinti

Abstract: A human body digital twin (DT) is a virtual representation of an individual's physiological state, created using real-time data from sensors and medical test devices, with the purpose of simulating, predicting, and optimizing health outcomes through advanced analytics and simulations. The human body DT has the potential to revolutionize healthcare and wellness, but its responsible and effective implementation requires consideration of various factors. This article presents a comprehensive overview of the current status and future prospects of the human body DT and proposes a five-level roadmap for its development. The roadmap covers the development of various components, such as wearable devices, data collection, data analysis, and decision-making systems. The article also highlights the necessary support, security, cost, and ethical considerations that must be addressed in order to ensure responsible and effective implementation of the human body DT. The proposed roadmap provides a framework for guiding future development and offers a unique perspective on the future of the human body DT, facilitating new interdisciplinary research and innovative solutions in this rapidly evolving field.

Submitted 12 September, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: 3 figures, 2 boxes

arXiv:2307.04651 [pdf, other]

Joint Salient Object Detection and Camouflaged Object Detection via Uncertainty-aware Learning

Authors: Aixuan Li, Jing Zhang, Yunqiu Lv, Tong Zhang, Yiran Zhong, Mingyi He, Yuchao Dai

Abstract: Salient objects attract human attention and usually stand out clearly from their surroundings. In contrast, camouflaged objects share similar colors or textures with the environment. In this case, salient objects are typically non-camouflaged, and camouflaged objects are usually not salient. Due to this inherent contradictory attribute, we introduce an uncertainty-aware learning pipeline to extensively explore the contradictory information of salient object detection (SOD) and camouflaged object detection (COD) via data-level and task-wise contradiction modeling. We first exploit the dataset correlation of these two tasks and claim that the easy samples in the COD dataset can serve as hard samples for SOD to improve the robustness of the SOD model. Based on the assumption that these two models should lead to activation maps highlighting different regions of the same input image, we further introduce a contrastive module with a joint-task contrastive learning framework to explicitly model the contradictory attributes of these two tasks. Different from conventional intra-task contrastive learning for unsupervised representation learning, our contrastive module is designed to model the task-wise correlation, leading to cross-task representation learning. To better understand the two tasks from the perspective of uncertainty, we extensively investigate the uncertainty estimation techniques for modeling the main uncertainties of the two tasks, namely task uncertainty (for SOD) and data uncertainty (for COD), and aiming to effectively estimate the challenging regions for each task to achieve difficulty-aware learning. Experimental results on benchmark datasets demonstrate that our solution leads to both state-of-the-art performance and informative uncertainty estimation.

Submitted 10 July, 2023; originally announced July 2023.

arXiv:2307.03376 [pdf, other]

Weakly-supervised Contrastive Learning for Unsupervised Object Discovery

Authors: Yunqiu Lv, Jing Zhang, Nick Barnes, Yuchao Dai

Abstract: Unsupervised object discovery (UOD) refers to the task of discriminating the whole region of objects from the background within a scene without relying on labeled datasets, which benefits the task of bounding-box-level localization and pixel-level segmentation. This task is promising due to its ability to discover objects in a generic manner. We roughly categorise existing techniques into two main directions, namely the generative solutions based on image resynthesis, and the clustering methods based on self-supervised models. We have observed that the former heavily relies on the quality of image reconstruction, while the latter shows limitations in effectively modeling semantic correlations. To directly target at object discovery, we focus on the latter approach and propose a novel solution by incorporating weakly-supervised contrastive learning (WCL) to enhance semantic information exploration. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images, which is achieved by fine-tuning the feature encoder of a self-supervised model, namely DINO, via WCL. Subsequently, we introduce Principal Component Analysis (PCA) to localize object regions. The principal projection direction, corresponding to the maximal eigenvalue, serves as an indicator of the object region(s). Extensive experiments on benchmark unsupervised object discovery datasets demonstrate the effectiveness of our proposed solution. The source code and experimental results are publicly available via our project page at https://github.com/npucvr/WSCUOD.git.
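The PCA localization step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pca_object_mask`, the zero threshold, and the synthetic features standing in for DINO patch features are all assumptions made for illustration.

```python
import numpy as np


def pca_object_mask(features: np.ndarray) -> np.ndarray:
    """Split patches into two groups along the first principal direction.

    features: array of shape (num_patches, dim), one feature vector per patch.
    Returns a boolean array of shape (num_patches,).
    """
    # Center the features so the principal direction captures variance.
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value,
    # so vt[0] corresponds to the maximal eigenvalue of the covariance matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projection = centered @ vt[0]
    # Assumption: threshold at zero. The sign of vt[0] is arbitrary, so which
    # side is "object" must be decided by an extra rule (not shown here).
    return projection > 0


# Synthetic stand-in for patch features: 12 background and 4 object patches.
rng = np.random.default_rng(0)
background = rng.normal(0.0, 0.1, size=(12, 8))
foreground = rng.normal(0.0, 0.1, size=(4, 8)) + 3.0
features = np.vstack([background, foreground])
mask = pca_object_mask(features)
```

On this toy input the projection separates the two groups of patches; a real pipeline would reshape the mask back to the patch grid and resolve the sign ambiguity before using it for localization.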

Submitted 7 July, 2023; originally announced July 2023.

arXiv:2307.02950 [pdf, ps, other]

Electronic correlations and partial gap in the bilayer nickelate La$_{3}$Ni$_{2}$O$_{7}$

Authors: Zhe Liu, Mengwu Huo, Jie Li, Qing Li, Yuecong Liu, Yaomin Dai, Xiaoxiang Zhou, Jiahao Hao, Yi Lu, Meng Wang, Hai-Hu Wen

Abstract: The discovery of superconductivity with a critical temperature of about 80~K in La$_{3}$Ni$_{2}$O$_{7}$ single crystals under pressure has received enormous attention. La$_{3}$Ni$_{2}$O$_{7}$ is not superconducting under ambient pressure but exhibits a transition at $T^{\ast} \simeq 115$~K. Understanding the electronic correlations and charge dynamics is an important step towards the origin of superconductivity and other instabilities. Here, our optical study shows that La$_{3}$Ni$_{2}$O$_{7}$ features strong electronic correlations which significantly reduce the electron's kinetic energy and place this system in the proximity of the Mott phase. The low-frequency optical conductivity reveals two Drude components arising from multiple bands at the Fermi level. The transition at $T^{\ast}$ removes the Drude component exhibiting non-Fermi liquid behavior, whereas the one with Fermi-liquid behavior is barely affected. These observations in combination with theoretical results suggest that the Fermi surface dominated by the Ni-$d_{3z^{2}-r^{2}}$ orbital is removed due to the transition at $T^{\ast}$. Our experimental results provide pivotal information for understanding the transition at $T^{\ast}$ and superconductivity in La$_{3}$Ni$_{2}$O$_{7}$.

Submitted 2 April, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: 26 pages, 4 figures, Comments are welcome and appreciated

arXiv:2307.02335 [pdf, other]

The Classification of Galaxy Morphology in H-band of COSMOS-DASH Field: a combination-based machine learning clustering model

Authors: Yao Dai, Jun Xu, Jie Song, Guanwen Fang, Chichun Zhou, Shuo Ba, Yizhou Gu, Zesen Lin, Xu Kong

Abstract: By applying our previously developed two-step scheme for galaxy morphology classification, we present a catalog of galaxy morphology for H-band selected massive galaxies in the COSMOS-DASH field, which includes 17292 galaxies with stellar mass $M_{\star}>10^{10}~M_{\odot}$ at $0.5<z<2.5$. The classification scheme is designed to provide a complete morphology classification for galaxies via a combination of two machine-learning steps. We first use an unsupervised machine learning method (i.e., bagging-based multi-clustering) to cluster galaxies into five categories: spherical (SPH), early-type disk (ETD), late-type disk (LTD), irregular (IRR), and unclassified (UNC). About 48% of galaxies (8258/17292) are successfully clustered during this step. For the remaining sample, we adopt a supervised machine learning method (i.e., GoogLeNet) to classify them, during which galaxies that are well-classified in the previous step are taken as our training set. Consequently, we obtain a morphology classification result for the full sample. The t-SNE test shows that galaxies in our sample can be well aggregated. We also measure the parametric and nonparametric morphologies of these galaxies. We find that the Sérsic index increases from IRR to SPH and the effective radius decreases from IRR to SPH, consistent with the corresponding definitions. Galaxies from different categories are separately distributed in the $G$--$M_{20}$ space. Such consistencies with other characteristic descriptions of galaxy morphology demonstrate the reliability of our classification result, ensuring that it can be used as a basic catalog for further galaxy studies.

Submitted 6 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: 13 pages, 10 figures, accepted by ApJS

arXiv:2306.16176 [pdf, other]

SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills

Authors: Zhangyin Feng, Yong Dai, Fan Zhang, Duyu Tang, Xiaocheng Feng, Shuangzhi Wu, Bing Qin, Yunbo Cao, Shuming Shi

Abstract: Traditional multitask learning methods basically can only exploit common knowledge in task- or language-wise, which lose either cross-language or cross-task knowledge. This paper proposes a general multilingual multitask model, named SkillNet-X, which enables a single model to tackle many different tasks from different languages. To this end, we define several language-specific skills and task-specific skills, each of which corresponds to a skill module. SkillNet-X sparsely activates parts of the skill modules which are relevant either to the target task or the target language. Acting as knowledge transit hubs, skill modules are capable of absorbing task-related knowledge and language-related knowledge consecutively. Based on Transformer, we modify the multi-head attention layer and the feed forward network layer to accommodate skill modules. We evaluate SkillNet-X on eleven natural language understanding datasets in four languages. Results show that SkillNet-X performs better than task-specific baselines and two multitask learning baselines (i.e., dense joint model and Mixture-of-Experts model). Furthermore, skill pre-training further improves the performance of SkillNet-X on almost all datasets. To investigate the generalization of our model, we conduct experiments on two new tasks and find that SkillNet-X significantly outperforms baselines.

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.15247 [pdf, ps, other]

Towards Efficient Optimal Large-Scale Network Slicing: A Decomposition Approach

Authors: Wei-Kun Chen, Zheyu Wu, Rui-Jin Zhang, Ya-Feng Liu, Yu-Hong Dai, Zhi-Quan Luo

Abstract: This paper considers the network slicing (NS) problem which attempts to map multiple customized virtual network requests to a common shared network infrastructure and allocate network resources to meet diverse service requirements. This paper proposes an efficient decomposition algorithm for globally solving the large-scale NP-hard NS problem. The proposed algorithm decomposes the hard NS problem into two relatively easy function placement (FP) and traffic routing (TR) subproblems and iteratively solves them enabling the information feedback between each other, which makes it particularly suitable to solve large-scale problems. Specifically, the FP subproblem is to place service functions into cloud nodes in the network, and solving it can return a function placement strategy based on which the TR subproblem is defined; and the TR subproblem is to find paths connecting two nodes hosting two adjacent functions in the network, and solving it can either verify that the solution of the FP subproblem is an optimal solution of the original problem, or return a valid inequality to the FP subproblem that cuts off the current infeasible solution. The proposed algorithm is guaranteed to find the global solution of the NS problem. By taking the special structure of the NS problem into consideration, we successfully develop two families of valid inequalities that render the proposed decomposition algorithm converge much more quickly and thus much more efficient. We demonstrate the effectiveness and efficiency of the proposed valid inequalities and algorithm via numerical experiments.

Submitted 14 December, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: 13 pages, 11 figures, submitted for possible publication; for the conference version, see arXiv:2306.15247v1

arXiv:2306.15032 [pdf, other]

DMseg: a Python algorithm for de novo detection of differentially or variably methylated regions

Authors: Xiaoyu Wang, Ming Yu, William Grady, Ziding Feng, Wei Sun, James Y Dai

Abstract: Detecting and assessing statistical significance of differentially methylated regions (DMRs) is a fundamental task in methylome association studies. While the average differential methylation in different phenotype groups has been the inferential focus, methylation changes in chromosomal regions may also present as differential variability, i.e., variably methylated regions (VMRs). Testing statistical significance of regional differential methylation is a challenging problem, and existing algorithms do not provide accurate type I error control for genome-wide DMR or VMR analysis. No algorithm has been publicly available for detecting VMRs. We propose DMseg, a Python algorithm with efficient DMR/VMR detection and significance assessment for array-based methylome data, and compare its performance to Bumphunter, a popular existing algorithm. Operationally, DMseg searches for DMRs or VMRs within CpG clusters that are adaptively determined by both gap distance and correlation between contiguous CpG sites in a microarray. The Levene test was implemented for assessing differential variability of individual CpGs. A likelihood ratio statistic is proposed to test for a constant difference within CpGs in a DMR or VMR to summarize the evidence of regional difference. Using a stratified permutation scheme and pooling null distributions of LRTs from clusters with similar numbers of CpGs, DMseg provides accurate control of the type I error rate. In simulation experiments, DMseg shows greater power than Bumphunter to detect DMRs. Application to methylome data of Barrett's esophagus and esophageal adenocarcinoma reveals a number of DMRs and VMRs of biological interest.

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.14826 [pdf, other]

Incorporating increased variability in testing for cancer DNA methylation

Authors: James Y. Dai, Heng Chen, Xiaoyu Wang, Wei Sun, Ying Huang, William M. Grady, Ziding Feng

Abstract: Cancer development is associated with aberrant DNA methylation, including increased stochastic variability. Statistical tests for discovering cancer methylation biomarkers have focused on changes in mean methylation. To improve the power of detection, we propose to incorporate increased variability in testing for cancer differential methylation by two joint constrained tests: one for differential mean and increased variance, the other for increased mean and increased variance. To improve small sample properties, likelihood ratio statistics are developed, accounting for the variability in estimating the sample medians in the Levene test. Efficient algorithms were developed and implemented in the DMVC function of the R package DMtest. The proposed joint constrained tests were compared to standard tests and partial area under the curve (pAUC) for the receiver operating characteristic curve (ROC) in simulated datasets under diverse models. Application to the high-throughput methylome data in The Cancer Genome Atlas (TCGA) shows substantially increased yield of candidate CpG markers.

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.11231 [pdf, other]

Deep HI Mapping of Stephan's Quintet and Its Neighborhood

Authors: Cheng Cheng, Cong Kevin Xu, P. N. Appleton, P. -A. Duc, N. -Y. Tang, Y. S. Dai, J. -S. Huang, U. Lisenfeld, F. Renaud, Chuan He, Hai-Cheng Feng

Abstract: We carried out deep mapping observations of the atomic hydrogen (HI) 21 cm line emission in a field centered on the famous galaxy group Stephan's Quintet (SQ), using the Five-hundred-meter Aperture Spherical Telescope (FAST) equipped with the 19-Beam Receiver. The final data cube reaches an HI column density sensitivity of $5\sigma = 2.1\times 10^{17}$ cm$^{-2}$ per 20 km s$^{-1}$ channel with an angular resolution of $4'.0$. The discovery of a large diffuse feature of the HI emission in the outskirts of the intragroup medium of SQ was reported in a previous paper (Xu et al. 2022). Here we present a new study of the total HI emission of SQ and the detection of several neighboring galaxies, exploiting the high sensitivity and the large sky coverage of the FAST observations. A total HI mass of $M_{\rm HI} = (3.48 \pm 0.35) \times 10^{10}\; M_\odot$ is found for SQ, which is significantly higher than previous measurements in the literature. This indicates that, contrary to earlier claims, SQ is not HI deficient. The excess HI gas is mainly found in the velocity ranges of 6200 - 6400 km s$^{-1}$ and 6800 - 7000 km s$^{-1}$, which was undetected in previous observations that are less sensitive than ours. Our results suggest that the ``missing HI'' in compact groups may be hidden in the low-density diffuse neutral gas instead of in the ionized gas.

Submitted 19 June, 2023; originally announced June 2023.

Comments: 20 pages, 5 figures, Accepted by ApJ

arXiv:2306.06877 [pdf, other]

Boosting Breast Ultrasound Video Classification by the Guidance of Keyframe Feature Centers

Authors: AnLan Sun, Zhao Zhang, Meng Lei, Yuting Dai, Dong Wang, Liwei Wang

Abstract: Breast ultrasound videos contain richer information than ultrasound images; therefore, it is more meaningful to develop video models for this diagnosis task. However, the collection of ultrasound video datasets is much harder. In this paper, we explore the feasibility of enhancing the performance of ultrasound video classification using the static image dataset. To this end, we propose KGA-Net and coherence loss. The KGA-Net adopts both video clips and static images to train the network. The coherence loss uses the feature centers generated by the static images to guide the frame attention in the video model. Our KGA-Net boosts the performance on the public BUSV dataset by a large margin. The visualization results of frame attention prove the explainability of our method. The code and model weights of our method will be made publicly available.

Submitted 12 June, 2023; originally announced June 2023.

Comments: Medical Image Computing and Computer-Assisted Intervention 2023

arXiv:2306.04236 [pdf, other]

Flare7K++: Mixing Synthetic and Real Datasets for Nighttime Flare Removal and Beyond

Authors: Yuekun Dai, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yihang Luo, Chen Change Loy

Abstract: Artificial lights commonly leave strong lens flare artifacts on the images captured at night, degrading both the visual quality and performance of vision algorithms. Existing flare removal approaches mainly focus on removing daytime flares and fail in nighttime cases. Nighttime flare removal is challenging due to the unique luminance and spectrum of artificial lights, as well as the diverse patterns and image degradation of the flares. The scarcity of nighttime flare removal datasets constrains the research on this crucial task. In this paper, we introduce Flare7K++, the first comprehensive nighttime flare removal dataset, consisting of 962 real-captured flare images (Flare-R) and 7,000 synthetic flares (Flare7K). Compared to Flare7K, Flare7K++ is particularly effective in eliminating complicated degradation around the light source, which is intractable by using synthetic flares alone. Besides, the previous flare removal pipeline relies on the manual threshold and blur kernel settings to extract light sources, which may fail when the light sources are tiny or not overexposed. To address this issue, we additionally provide the annotations of light sources in Flare7K++ and propose a new end-to-end pipeline to preserve the light source while removing lens flares. Our dataset and pipeline offer a valuable foundation and benchmark for future investigations into nighttime flare removal studies. Extensive experiments demonstrate that Flare7K++ supplements the diversity of existing flare datasets and pushes the frontier of nighttime flare removal towards real-world scenarios.

Submitted 7 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: Extension of arXiv:2210.06570; Project page at https://ykdai.github.io/projects/Flare7K

arXiv:2306.03630 [pdf, other]

Mutual Information Regularization for Weakly-supervised RGB-D Salient Object Detection

Authors: Aixuan Li, Yuxin Mao, Jing Zhang, Yuchao Dai

Abstract: In this paper, we present a weakly-supervised RGB-D salient object detection model via scribble supervision. Specifically, as a multimodal learning task, we focus on effective multimodal representation learning via inter-modal mutual information regularization. In particular, following the principle of disentangled representation learning, we introduce a mutual information upper bound with a mutual information minimization regularizer to encourage the disentangled representation of each modality for salient object detection. Based on our multimodal representation learning framework, we introduce an asymmetric feature extractor for our multimodal data, which is proven more effective than the conventional symmetric backbone setting. We also introduce a multimodal variational auto-encoder as a stochastic prediction refinement technique, which takes pseudo labels from the first training stage as supervision and generates refined predictions. Experimental results on benchmark RGB-D salient object detection datasets verify the effectiveness of both our explicit multimodal disentangled representation learning method and the stochastic prediction refinement strategy, achieving comparable performance with the state-of-the-art fully supervised models. Our code and data are available at: https://github.com/baneitixiaomai/MIRV.

Submitted 6 June, 2023; originally announced June 2023.

Comments: IEEE Transactions on Circuits and Systems for Video Technology 2023

arXiv:2306.02610 [pdf, other]

doi 10.3847/1538-3881/acff67

Understanding the Planetary Formation and Evolution in Star Clusters (UPiC)-I: Evidence of Hot Giant Exoplanets Formation Timescales

Authors: Yuan-Zhe Dai, Hui-Gen Liu, Jia-Yi Yang, Ji-Lin Zhou

Abstract: Planets in young star clusters could shed light on planet formation and evolution since star clusters can provide accurate age estimation. However, the number of transiting planets detected in clusters was only $\sim 30$, too small for statistical analysis. Thanks to the unprecedented high-precision astrometric data provided by Gaia DR2 and Gaia DR3, many new Open Clusters (OCs) and comoving groups have been identified. The UPiC project aims to find observational evidence and interpret how planets form and evolve in cluster environments. In this work, we cross-match the stellar catalogs of new OCs and comoving groups with confirmed planets and candidates. We carefully remove false positives and obtain the biggest catalog of planets in star clusters up to now, which consists of 73 confirmed planets and 84 planet candidates. After age validation, we obtain the radius--age diagram of these planets/candidates. We find an increment of the fraction of Hot Jupiters (HJs) around 100 Myr and attribute the increment to the flyby-induced high-e migration in star clusters. An additional small bump of the fraction of HJs after 1 Gyr is detected, which indicates the formation timescale of HJs around field stars is much larger than that in star clusters. Thus, stellar environments play important roles in the formation of HJs. The hot-Neptune desert occurs around 100 Myr in our sample. A combination of photoevaporation and high-e migration may sculpt the hot-Neptune desert in clusters.

Submitted 6 November, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: 22 pages, 11 figures, 2 tables, accepted for publication in AJ

Journal ref: The Astronomical Journal, Year 2023, Volume 166, Number 6

arXiv:2305.15287 [pdf, other]

The Crucial Role of Normalization in Sharpness-Aware Minimization

Authors: Yan Dai, Kwangjun Ahn, Suvrit Sra

Abstract: Sharpness-Aware Minimization (SAM) is a recently proposed gradient-based optimizer (Foret et al., ICLR 2021) that greatly improves the prediction performance of deep neural networks. Consequently, there has been a surge of interest in explaining its empirical success. We focus, in particular, on understanding the role played by normalization, a key component of the SAM updates. We theoretically and empirically study the effect of normalization in SAM for both convex and non-convex functions, revealing two key roles played by normalization: i) it helps in stabilizing the algorithm; and ii) it enables the algorithm to drift along a continuum (manifold) of minima -- a property identified by recent theoretical works that is the key to better performance. We further argue that these two properties of normalization make SAM robust against the choice of hyper-parameters, supporting the practicality of SAM. Our conclusions are backed by various experiments.

Submitted 23 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: 30 pages, Published in 37th Neural Information Processing Systems (NeurIPS 2023)
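
Note: the normalization discussed in the abstract above can be made concrete with a minimal sketch of one SAM step, following the standard update from Foret et al.: perturb the weights along the gradient direction scaled to length rho, then descend using the gradient taken at the perturbed point. The function name `sam_step`, the `normalize` switch, the default step sizes, and the toy quadratic are illustrative assumptions, not code from the paper.

```python
import math

def sam_step(w, grad_fn, lr=0.1, rho=0.05, normalize=True, eps=1e-12):
    """One SAM update on a list of floats (illustrative sketch).

    normalize=True  : ascent step is rho * g / ||g||  (standard SAM)
    normalize=False : ascent step is rho * g          (un-normalized variant)
    """
    g = grad_fn(w)
    if normalize:
        norm = math.sqrt(sum(gi * gi for gi in g))
        scale = rho / (norm + eps)  # eps guards against a zero gradient
    else:
        scale = rho
    # Ascent: move to the perturbed point w + scale * g.
    w_adv = [wi + scale * gi for wi, gi in zip(w, g)]
    # Descent: apply the gradient evaluated at the perturbed point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Toy objective (an assumption for the demo): f(w) = 0.5 * ||w||^2, so grad f(w) = w.
def quad_grad(w):
    return list(w)

w_norm = sam_step([3.0, 4.0], quad_grad, normalize=True)
w_raw = sam_step([3.0, 4.0], quad_grad, normalize=False)
```

With normalization the perturbation always has length rho regardless of the gradient's magnitude; without it the perturbation grows with the gradient, which is the kind of sensitivity the abstract's stability claim is about.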

arXiv:2305.14980 [pdf, other]

doi 10.1051/0004-6361/202345914

An Atypical Plateau-like Extreme-ultraviolet Late-phase Solar Flare Driven by the Non-radial Eruption of a Magnetic Flux Rope

Authors: Yuehong Chen, Yu Dai, Mingde Ding

Abstract: Recent observations in extreme-ultraviolet (EUV) wavelengths reveal an EUV late phase in some solar flares, which is characterized by a second peak in the warm coronal emissions (about 3 MK) occurring several tens of minutes to a few hours after the corresponding main flare peak. We aim to clarify the physical origin of an atypical plateau-like EUV late phase in an X1.8-class solar flare occurring on 2011 September 7 from active region (AR) 11283. We first characterize the plateau-like late phase using EUV Variability Experiment (EVE) full-disk integrated irradiance observations and Atmospheric Imaging Assembly (AIA) spatially-resolved imaging observations on board the Solar Dynamics Observatory (SDO). Then we perform a nonlinear force-free-field (NLFFF) extrapolation, from which a filament-hosting magnetic flux rope (MFR) is revealed. The eruption of the MFR is tracked both in the plane of the sky (POS) and along the line of sight (LOS) through visual inspection and spectral fitting, respectively. Finally, we carry out differential emission measure (DEM) analysis to explore the thermodynamics of the late-phase loops. The MFR shows a non-radial eruption from a fan-spine magnetic structure. The eruption of the MFR and its interaction with overlying arcades invoke multiple magnetic reconnections, which are responsible for the production of different groups of late-phase loops. Afterwards, the late-phase loops enter a long-lasting cooling stage, appearing sequentially in AIA passbands of decreasing response temperatures. Due to their different lengths, the different groups of late-phase loops cool down at different cooling rates, which makes their warm coronal emission peaks temporally separated from each other. Combining the emissions from all late-phase loops together, an elongated plateau-like late phase is formed.

Submitted 20 June, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted by A&A

Journal ref: A&A 675, A147 (2023)

arXiv:2305.14895 [pdf, other]

doi 10.1088/1674-4527/acd593

The Lobster Eye Imager for Astronomy Onboard the SATech-01 Satellite

Authors: Z. X. Ling, X. J. Sun, C. Zhang, S. L. Sun, G. Jin, S. N. Zhang, X. F. Zhang, J. B. Chang, F. S. Chen, Y. F. Chen, Z. W. Cheng, W. Fu, Y. X. Han, H. Li, J. F. Li, Y. Li, Z. D. Li, P. R. Liu, Y. H. Lv, X. H. Ma, Y. J. Tang, C. B. Wang, R. J. Xie, Y. L. Xue, A. L. Yan, et al. (101 additional authors not shown)

Abstract: The Lobster Eye Imager for Astronomy (LEIA), a pathfinder of the Wide-field X-ray Telescope of the Einstein Probe (EP) mission, was successfully launched onboard the SATech-01 satellite of the Chinese Academy of Sciences on 27 July 2022. In this paper, we introduce the design and on-ground test results of the LEIA instrument. Using state-of-the-art Micro-Pore Optics (MPO), a wide field-of-view (FoV) of 346 square degrees (18.6 degrees $\times$ 18.6 degrees) of the X-ray imager is realized. An optical assembly composed of 36 MPO chips is used to focus incident X-ray photons, and four large-format complementary metal-oxide semiconductor (CMOS) sensors, each of 6 cm $\times$ 6 cm, are used as the focal plane detectors. The instrument has an angular resolution of 4 - 8 arcmin (in FWHM) for the central focal spot of the point spread function, and an effective area of 2 - 3 cm$^{2}$ at 1 keV in essentially all the directions within the field of view. The detection passband is 0.5 - 4 keV in the soft X-rays and the sensitivity is (2 - 3) $\times 10^{-11}$ erg s$^{-1}$ cm$^{-2}$ (about 1 mini-Crab) at 1,000 second observation. The total weight of LEIA is 56 kg and the power is 85 W. The satellite, with a design lifetime of 2 years, operates in a Sun-synchronous orbit of 500 km with an orbital period of 95 minutes. LEIA is paving the way for future missions by verifying in flight the technologies of both novel focusing imaging optics and CMOS sensors for X-ray observation, and by optimizing the working setups of the instrumental parameters. In addition, LEIA is able to carry out scientific observations to find new transients and to monitor known sources in the soft X-ray band, albeit with limited useful observing time available.

Submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted by RAA
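
Note: two of the figures quoted in the abstract above can be cross-checked with back-of-the-envelope arithmetic. The sketch below assumes a flat (small-angle) product for the field-of-view area and a circular Keplerian orbit; the Earth radius and gravitational parameter are standard reference values, not numbers from the paper, and the function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius
MU_EARTH = 398600.4418     # Earth's gravitational parameter, km^3 / s^2

def square_fov_area_deg2(side_deg):
    """Area of a square field of view as a simple side * side product."""
    return side_deg * side_deg

def circular_orbit_period_min(altitude_km):
    """Kepler's third law for a circular orbit: T = 2*pi*sqrt(a^3 / mu)."""
    a = EARTH_RADIUS_KM + altitude_km  # orbital radius in km
    return 2.0 * math.pi * math.sqrt(a ** 3 / MU_EARTH) / 60.0

fov_deg2 = square_fov_area_deg2(18.6)           # quoted: 346 square degrees
period_min = circular_orbit_period_min(500.0)   # quoted: 95 minutes
print(f"FoV ~ {fov_deg2:.2f} deg^2, period ~ {period_min:.1f} min")
```

Both results land within rounding of the quoted values, so the 346 square degrees and 95 minute figures are mutually consistent with the stated 18.6 degree side and 500 km altitude under these assumptions.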

arXiv:2305.13770 [pdf, other]

MIPI 2023 Challenge on Nighttime Flare Removal: Methods and Results

Authors: Yuekun Dai, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Qingpeng Zhu, Qianhui Sun, Wenxiu Sun, Chen Change Loy, Jinwei Gu

Abstract: Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). With the success of the 1st MIPI Workshop@ECCV 2022, we introduce the second MIPI challenge including four tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2023. In total, 120 participants were successfully registered, and 11 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. A detailed description of all models developed in this challenge is provided in this paper. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2023/.

Submitted 23 May, 2023; originally announced May 2023.

Comments: CVPR 2023 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2023/

arXiv:2305.13413 [pdf, other]

Syntactic Knowledge via Graph Attention with BERT in Machine Translation

Authors: Yuqian Dai, Serge Sharoff, Marc de Kamps

Abstract: Although the Transformer model can effectively acquire context features via a self-attention mechanism, deeper syntactic knowledge is still not effectively modeled. To alleviate the above problem, we propose Syntactic knowledge via Graph attention with BERT (SGB) in Machine Translation (MT) scenarios. Graph Attention Network (GAT) and BERT jointly represent syntactic dependency features as explicit knowledge of the source language to enrich source language representations and guide target language generation. Our experiments use gold syntax-annotation sentences and a Quality Estimation (QE) model to obtain interpretability of translation quality improvement regarding syntactic knowledge without being limited to a BLEU score. Experiments show that the proposed SGB engines improve translation quality across the three MT tasks without sacrificing BLEU scores. We investigate what length of source sentences benefits the most and what dependencies are better identified by the SGB engines. We also find that learning of specific dependency relations by GAT can be reflected in the translation quality containing such relations and that syntax on the graph leads to new modeling of syntactic aspects of source sentences in the middle and bottom layers of BERT.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.13403 [pdf, other]

GATology for Linguistics: What Syntactic Dependencies It Knows

Authors: Yuqian Dai, Serge Sharoff, Marc de Kamps

Abstract: Graph Attention Network (GAT) is a graph neural network which is one of the strategies for modeling and representing explicit syntactic knowledge and can work with pre-trained models, such as BERT, in downstream tasks. Currently, there is still a lack of investigation into how GAT learns syntactic knowledge from the perspective of model structure. As one of the strategies for modeling explicit syntactic knowledge, GAT and BERT have never been applied and discussed in Machine Translation (MT) scenarios. We design a dependency relation prediction task to study how GAT learns syntactic knowledge of three languages as a function of the number of attention heads and layers. We also use a paired t-test and F1-score to clarify the differences in syntactic dependency prediction between GAT and BERT fine-tuned by the MT task (MT-B). The experiments show that better performance can be achieved by appropriately increasing the number of attention heads with two GAT layers. With more than two layers, learning suffers. Moreover, GAT is more competitive in training speed and syntactic dependency prediction than MT-B, which may reveal a better incorporation of modeling explicit syntactic knowledge and the possibility of combining GAT and BERT in the MT tasks.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.13040 [pdf, other]

SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents

Authors: Shuzheng Si, Wentao Ma, Haoyu Gao, Yuchuan Wu, Ting-En Lin, Yinpei Dai, Hangyu Li, Rui Yan, Fei Huang, Yongbin Li

Abstract: Task-oriented dialogue (TOD) models have made significant progress in recent years. However, previous studies primarily focus on datasets written by annotators, which has resulted in a gap between academic research and real-world spoken conversation scenarios. While several small-scale spoken TOD datasets are proposed to address robustness issues such as ASR errors, they ignore the unique challenges in spoken conversation. To tackle the limitations, we introduce SpokenWOZ, a large-scale speech-text dataset for spoken TOD, containing 8 domains, 203k turns, 5.7k dialogues and 249 hours of audio from human-to-human spoken conversations. SpokenWOZ further incorporates common spoken characteristics such as word-by-word processing and reasoning in spoken language. Based on these characteristics, we present cross-turn slot and reasoning slot detection as new challenges. We conduct experiments on various baselines, including text-modal models, newly proposed dual-modal models, and LLMs, e.g., ChatGPT. The results show that the current models still have substantial room for improvement in spoken conversation, where the most advanced dialogue state tracker only achieves 25.65% in joint goal accuracy and the SOTA end-to-end model only correctly completes the user request in 52.1% of dialogues. The dataset, code, and leaderboard are available at https://spokenwoz.github.io/.

Submitted 12 March, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023

arXiv:2305.09021 [pdf, other]

doi 10.3847/1538-4357/acd5d6

Fraction of Clumpy Star-Forming Galaxies at $0.5\leq z\leq 3$ in UVCANDELS: Dependence on Stellar Mass and Environment

Authors: Zahra Sattari, Bahram Mobasher, Nima Chartab, Daniel D. Kelson, Harry I. Teplitz, Marc Rafelski, Norman A. Grogin, Anton M. Koekemoer, Xin Wang, Rogier A. Windhorst, Anahita Alavi, Laura Prichard, Ben Sunnquist, Jonathan P. Gardner, Eric Gawiser, Nimish P. Hathi, Matthew J. Hayes, Zhiyuan Ji, Vihang Mehta, Brant E. Robertson, Claudia Scarlata, L. Y. Aaron Yung, Christopher J. Conselice, Y. Sophia Dai, Yicheng Guo, et al. (3 additional authors not shown)

Abstract: High-resolution imaging of galaxies in rest-frame UV has revealed the existence of giant star-forming clumps prevalent in high redshift galaxies. Studying these sub-structures provides important information about their formation and evolution and informs theoretical galaxy evolution models. We present a new method to identify clumps in galaxies' high-resolution rest-frame UV images. Using imaging data from CANDELS and UVCANDELS, we identify star-forming clumps in an HST/F160W$\leq 25$ AB mag sample of 6767 galaxies at $0.5\leq z\leq 3$ in four fields, GOODS-N, GOODS-S, EGS, and COSMOS. We use a low-pass band filter in Fourier space to reconstruct the background image of a galaxy and detect small-scale features (clumps) on the background-subtracted image. Clumpy galaxies are defined as those having at least one off-center clump that contributes a minimum of 10$\%$ of the galaxy's total rest-frame UV flux. We measure the fraction of clumpy galaxies ($\rm f_{clumpy}$) as a function of stellar mass, redshift, and galaxy environment. Our results indicate that $\rm f_{clumpy}$ increases with redshift, reaching $\sim 65\%$ at $z\sim 1.5$. We also find that $\rm f_{clumpy}$ in low-mass galaxies ($\rm 9.5\leq log(M_*/M_\odot)\leq 10$) is 10$\%$ higher compared to that of their high-mass counterparts ($\rm log(M_*/M_\odot)>10.5$). Moreover, we find no evidence of significant environmental dependence of $\rm f_{clumpy}$ for galaxies at the redshift range of this study. Our results suggest that the fragmentation of gas clouds under violent disk instability remains the primary driving mechanism for clump formation, and incidents common in dense environments, such as mergers, are not the dominant processes.

Submitted 15 May, 2023; originally announced May 2023.

Comments: 16 pages, 11 figures, 2 tables, accepted for publication in ApJ

arXiv:2305.06557 [pdf, other]

Long-Tailed Question Answering in an Open World

Authors: Yi Dai, Hao Lang, Yinhe Zheng, Fei Huang, Yongbin Li

Abstract: Real-world data often have an open long-tailed distribution, and building a unified QA model supporting various tasks is vital for practical QA applications. However, it is non-trivial to extend previous QA approaches since they either require access to seen tasks of adequate samples or do not explicitly model samples from unseen tasks. In this paper, we define Open Long-Tailed QA (OLTQA) as learning from long-tailed distributed data and optimizing performance over seen and unseen QA tasks. We propose an OLTQA model that encourages knowledge sharing between head, tail and unseen tasks, and explicitly mines knowledge from a large pre-trained language model (LM). Specifically, we organize our model through a pool of fine-grained components and dynamically combine these components for an input to facilitate knowledge sharing. A retrieve-then-rerank frame is further introduced to select in-context examples, which guide the LM to generate text that expresses knowledge for QA tasks. Moreover, a two-stage training approach is introduced to pre-train the framework by knowledge distillation (KD) from the LM and then jointly train the frame and a QA model through an adaptive mutual KD method. On a large-scale OLTQA dataset we curate from 43 existing QA datasets, our model consistently outperforms the state-of-the-art. We release the code and data at \url{https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/oltqa}.

Submitted 11 May, 2023; originally announced May 2023.

Comments: ACL2023 Main Track Long Paper

arXiv:2305.06555 [pdf, other]

Domain Incremental Lifelong Learning in an Open World

Authors: Yi Dai, Hao Lang, Yinhe Zheng, Bowen Yu, Fei Huang, Yongbin Li

Abstract: Lifelong learning (LL) is an important ability for NLP models to learn new tasks continuously. Architecture-based approaches are reported to be effective implementations for LL models. However, it is non-trivial to extend previous approaches to domain incremental LL scenarios since they either require access to task identities in the testing phase or cannot handle samples from unseen tasks. In this paper, we propose \textbf{Diana}: a \underline{d}ynam\underline{i}c \underline{a}rchitecture-based lifelo\underline{n}g le\underline{a}rning model that tries to learn a sequence of tasks with a prompt-enhanced language model. Four types of hierarchically organized prompts are used in Diana to capture knowledge from different granularities. Specifically, we dedicate task-level prompts to capture task-specific knowledge to retain high LL performances and maintain instance-level prompts to learn knowledge shared across input samples to improve the model's generalization performance. Moreover, we dedicate separate prompts to explicitly model unseen tasks and introduce a set of prompt key vectors to facilitate knowledge sharing between tasks. Extensive experiments demonstrate that Diana outperforms state-of-the-art LL models, especially in handling unseen tasks. We release the code and data at \url{https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/diana}.

Submitted 11 May, 2023; originally announced May 2023.

Comments: ACL2023 Findings Long Paper. arXiv admin note: substantial text overlap with arXiv:2208.14602

arXiv:2305.06103 [pdf, ps, other]

doi 10.1007/s11433-023-2222-3

Pressure-induced color change arising from transformation between intra- and inter-band transitions in LuH$_{2\pm x}$N$_{y}$

Authors: Zhe Liu, Yingjie Zhang, Shenyang Huang, Xue Ming, Qing Li, Chenghao Pan, Yaomin Dai, Xiaoxiang Zhou, Xiyu Zhu, Hugen Yan, Hai-Hu Wen

Abstract: The pressure-induced color change in the nitrogen-doped lutetium hydride has triggered extensive discussions about the underlying physics. Here, we study the optical response of LuH$_{2 \pm x}$N$_{y}$ in a broad frequency range at ambient pressure and its evolution with pressure in the visible spectral range. The broad-band optical spectra at ambient pressure reveal a Drude component associated with intra-band electronic transitions and two Lorentz components (L1 and L2) arising from inter-band electronic transitions. The application of pressure causes a spectral weight transfer from L1 to the Drude component, leading to a blue shift of the plasma edge in the reflectivity spectrum alongside a reduction of the high-frequency reflectivity. Our results suggest that the pressure-induced color change in LuH$_{2 \pm x}$N$_{y}$ is closely related to the transformation between intra- and inter-band electronic transitions, providing new insights into the mechanism of the pressure-induced color change in LuH$_{2 \pm x}$N$_{y}$.

Submitted 30 January, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: 20 pages, 4 figures. Comments are welcome and appreciated

Journal ref: Sci. China Phys. Mech. Astron. 67, 227411 (2024)

Showing 151–200 of 861 results for author: Dai, Y