
Cross-Field Transformer for Diabetic Retinopathy Grading on Two-field Fundus Images

Authors: Junlin Hou, Jilan Xu, Fan Xiao, Rui-Wei Zhao, Yuejie Zhang, Haidong Zou, Lina Lu, Wenwen Xue, Rui Feng

Abstract: Automatic diabetic retinopathy (DR) grading based on fundus photography has been widely explored to benefit routine screening and early treatment. Existing research generally focuses on single-field fundus images, which have a limited field of view for precise eye examinations. In clinical applications, ophthalmologists adopt two-field fundus photography as the dominant tool, where the information from each field (i.e., macula-centric and optic disc-centric) is highly correlated and complementary, and benefits comprehensive decisions. However, automatic DR grading based on two-field fundus photography remains a challenging task due to the lack of publicly available datasets and effective fusion strategies. In this work, we first construct a new benchmark dataset (DRTiD) for DR grading, consisting of 3,100 two-field fundus images. To the best of our knowledge, it is the largest public DR dataset with diverse and high-quality two-field images. Then, we propose a novel DR grading approach, namely Cross-Field Transformer (CrossFiT), to capture the correspondence between two fields as well as the long-range spatial correlations within each field. Considering the inherent two-field geometric constraints, we define aligned position embeddings to preserve relatively consistent positions in the fundus. In addition, we perform masked cross-field attention during interaction to filter the noisy relations between fields. Extensive experiments on our DRTiD dataset and the public DeepDRiD dataset demonstrate the effectiveness of our CrossFiT network. The new dataset and the source code of CrossFiT will be publicly available at https://github.com/FDU-VTS/DRTiD.

Submitted 1 December, 2022; v1 submitted 26 November, 2022; originally announced November 2022.

Comments: BIBM 2022

arXiv:2211.12294 [pdf, other]

PointCA: Evaluating the Robustness of 3D Point Cloud Completion Models Against Adversarial Examples

Authors: Shengshan Hu, Junwei Zhang, Wei Liu, Junhui Hou, Minghui Li, Leo Yu Zhang, Hai Jin, Lichao Sun

Abstract: Point cloud completion, as the upstream procedure of 3D recognition and segmentation, has become an essential part of many tasks such as navigation and scene understanding. While various point cloud completion models have demonstrated their powerful capabilities, their robustness against adversarial attacks, which have been proven to be fatally malicious towards deep neural networks, remains unknown. In addition, existing attack approaches towards point cloud classifiers cannot be applied to the completion models due to different output forms and attack purposes. In order to evaluate the robustness of the completion models, we propose PointCA, the first adversarial attack against 3D point cloud completion models. PointCA can generate adversarial point clouds that maintain high similarity with the original ones, while being completed as another object with totally different semantic information. Specifically, we minimize the representation discrepancy between the adversarial example and the target point set to jointly explore the adversarial point clouds in the geometry space and the feature space. Furthermore, to launch a stealthier attack, we innovatively employ the neighbourhood density information to tailor the perturbation constraint, leading to geometry-aware and distribution-adaptive modifications for each point. Extensive experiments against different premier point cloud completion networks show that PointCA can cause a performance degradation from 77.9% to 16.7%, with the structure chamfer distance kept below 0.01. We conclude that existing completion models are severely vulnerable to adversarial examples, and state-of-the-art defenses for point cloud classification will be partially invalid when applied to incomplete and uneven point cloud data.

Submitted 1 December, 2022; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: Accepted by the 37th AAAI Conference on Artificial Intelligence (AAAI-23)
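
The PointCA abstract above bounds the perturbation by a chamfer distance below 0.01. As a rough illustration of that kind of metric (not the authors' implementation; the paper's "structure chamfer distance" may be defined or normalised differently), a minimal standard-library sketch of a symmetric Chamfer distance between two small point sets could look like this. The function name and the use of mean squared nearest-neighbour distances are assumptions.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets.

    a, b: non-empty lists of equal-dimension coordinate tuples.
    Assumption: each direction is the mean squared distance from a point
    to its nearest neighbour in the other set; other conventions exist.
    """
    def one_way(src, dst):
        # For every source point, find the closest destination point.
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Hypothetical toy data: one point is nudged by 0.1 along z.
clean = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
perturbed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)]
print(chamfer_distance(clean, perturbed))
```

This brute-force version is quadratic in the number of points, so it is only meant for reading, not for clouds with thousands of points.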

arXiv:2211.10829 [pdf]

Depositing boron on Cu(111): Borophene or boride?

Authors: Xiao-Ji Weng, Jie Bai, Jingyu Hou, Yi Zhu, Li Wang, Penghui Li, Anmin Nie, Bo Xu, Xiang-Feng Zhou, Yongjun Tian

Abstract: Large-area single-crystal surface structures were successfully prepared on Cu(111) substrate with boron deposition, which is critical for prospective applications. However, the proposed borophene structures do not match the scanning tunneling microscopy (STM) results very well, while the proposed copper boride is at odds with the traditional knowledge that ordered copper-rich borides normally do not exist due to the small difference in electronegativity and the large difference in atomic size. To clarify the controversy and elucidate the formation mechanism of the unexpected copper boride, we conducted systematic STM, X-ray photoelectron spectroscopy and angle-resolved photoemission spectroscopy investigations, confirming the synthesis of two-dimensional copper boride rather than borophene on Cu(111) after boron deposition under ultrahigh vacuum. First-principles calculations with defective surface models further indicate that boron atoms tend to react with Cu atoms near terrace edges or defects, which in turn shapes the intermediate structures of copper boride and eventually leads to the formation of a stable Cu-B monolayer via large-scale surface reconstruction.

Submitted 19 November, 2022; originally announced November 2022.

Comments: 15 pages, 4 figures

arXiv:2211.10627 [pdf, other]

EGRC-Net: Embedding-induced Graph Refinement Clustering Network

Authors: Zhihao Peng, Hui Liu, Yuheng Jia, Junhui Hou

Abstract: Existing graph clustering networks heavily rely on a predefined yet fixed graph, which can lead to failures when the initial graph fails to accurately capture the data topology structure of the embedding space. In order to address this issue, we propose a novel clustering network called Embedding-Induced Graph Refinement Clustering Network (EGRC-Net), which effectively utilizes the learned embedding to adaptively refine the initial graph and enhance the clustering performance. To begin, we leverage both semantic and topological information by employing a vanilla auto-encoder and a graph convolution network, respectively, to learn a latent feature representation. Subsequently, we utilize the local geometric structure within the feature embedding space to construct an adjacency matrix for the graph. This adjacency matrix is dynamically fused with the initial one using our proposed fusion architecture. To train the network in an unsupervised manner, we minimize the Jeffreys divergence between multiple derived distributions. Additionally, we introduce an improved approximate personalized propagation of neural predictions to replace the standard graph convolution network, enabling EGRC-Net to scale effectively. Through extensive experiments conducted on nine widely-used benchmark datasets, we demonstrate that our proposed methods consistently outperform several state-of-the-art approaches. Notably, EGRC-Net achieves an improvement of more than 11.99% in Adjusted Rand Index (ARI) over the best baseline on the DBLP dataset. Furthermore, our scalable approach exhibits a 10.73% gain in ARI while reducing memory usage by 33.73% and decreasing running time by 19.71%. The code for EGRC-Net will be made publicly available at https://github.com/ZhihaoPENG-CityU/EGRC-Net.

Submitted 14 November, 2023; v1 submitted 19 November, 2022; originally announced November 2022.

Comments: This paper has been accepted by IEEE Transactions on Image Processing
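
The EGRC-Net abstract above trains by minimising the Jeffreys divergence between distributions. For readers unfamiliar with it, the Jeffreys divergence is the symmetrised Kullback-Leibler divergence, J(P, Q) = KL(P||Q) + KL(Q||P). The sketch below is only an illustration for discrete distributions, not the paper's loss code; the function names are hypothetical, and it assumes both inputs are probability vectors with the same support.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Assumption: q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jeffreys_divergence(p, q):
    """Symmetrised KL divergence: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical soft cluster assignments for one sample.
p = [0.5, 0.5]
q = [0.9, 0.1]
print(jeffreys_divergence(p, q))
```

Unlike plain KL, the result does not depend on which distribution is treated as the target, which is one plausible reason to prefer it when aligning several derived distributions.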

arXiv:2211.10294 [pdf]

Pressure-induced superconductivity in PdTeI with quasi-one-dimensional PdTe chains

Authors: Yi Zhao, Jun Hou, Yang Fu, Cuiying Pei, Jianping Sun, Qi Wang, Lingling Gao, Weizheng Cao, Changhua Li, Shihao Zhu, Mingxin Zhang, Yulin Chen, Hechang Lei, Jinguang Cheng, Yanpeng Qi

Abstract: The quasi-one-dimensional material PdTeI exhibits unusual electronic transport properties at ambient pressure. Here, we systematically investigate both the structural and electronic responses of PdTeI to external pressure, through a combination of electrical transport, synchrotron x-ray diffraction (XRD), and Raman spectroscopy measurements. The charge density wave (CDW) order in PdTeI is fragile and the transition temperature $T_{\rm CDW}$ decreases rapidly with the application of external pressure. The resistivity hump is indiscernible when the pressure is increased to 1 GPa. Upon further compression, zero resistance is established above 20 GPa, suggesting the occurrence of superconductivity. Combined XRD and Raman data evidence that the emergence of superconductivity is accompanied by a pressure-induced amorphization of PdTeI.

Submitted 18 November, 2022; originally announced November 2022.

Comments: 18 pages, 6 figures

arXiv:2211.10253 [pdf, other]

Delving into Transformer for Incremental Semantic Segmentation

Authors: Zekai Xu, Mingyi Zhang, Jiayue Hou, Xing Gong, Chuan Wen, Chengjie Wang, Junge Zhang

Abstract: Incremental semantic segmentation (ISS) is an emerging task where an old model is updated by incrementally adding new classes. At present, methods based on convolutional neural networks are dominant in ISS. However, studies have shown that such methods have difficulty in learning new tasks while maintaining good performance on old ones (catastrophic forgetting). In contrast, a Transformer-based method has a natural advantage in curbing catastrophic forgetting due to its ability to model both long-term and short-term tasks. In this work, we explore the reasons why Transformer-based architectures are more suitable for ISS, and accordingly propose TISS, a Transformer-based method for Incremental Semantic Segmentation. In addition, to better alleviate catastrophic forgetting while preserving transferability on ISS, we introduce two patch-wise contrastive losses to imitate similar features and enhance feature diversity respectively, which can further improve the performance of TISS. Under extensive experimental settings with Pascal-VOC 2012 and ADE20K datasets, our method significantly outperforms state-of-the-art incremental semantic segmentation methods.

Submitted 18 November, 2022; originally announced November 2022.

arXiv:2211.07913 [pdf, ps, other]

Extremal graphs for the suspension of edge-critical graphs

Authors: Jianfeng Hou, Heng Li, Qinghou Zeng

Abstract: The Turán number of a graph $H$, $\text{ex}(n,H)$, is the maximum number of edges in an $n$-vertex graph that does not contain $H$ as a subgraph. For a vertex $v$ and a multi-set $\mathcal{F}$ of graphs, the suspension $\mathcal{F}+v$ of $\mathcal{F}$ is the graph obtained by connecting the vertex $v$ to all vertices of $F$ for each $F\in \mathcal{F}$. For two integers $k\ge1$ and $r\ge2$, let $H_i$ be a graph containing a critical edge with chromatic number $r$ for any $i\in\{1,\ldots,k\}$, and let $H=\{H_1,\ldots,H_k\}+v$. In this paper, we determine $\text{ex}(n, H)$ and characterize all the extremal graphs for sufficiently large $n$. This generalizes a result of Chen, Gould, Pfender and Wei on intersecting cliques. We also obtain a stability theorem for $H$, extending a result of Roberts and Scott on graphs containing a critical edge.

Submitted 15 November, 2022; originally announced November 2022.
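
To make the definition of the Turán number in the abstract above concrete, the following sketch brute-forces $\text{ex}(n, K_3)$ for very small $n$ and can be compared against Mantel's classical value $\lfloor n^2/4 \rfloor$. It is only an illustration of the definition for the triangle, not of the paper's suspension graphs or its proof technique; the function names are hypothetical and the search is exponential, so it is usable only for $n \le 6$ or so.

```python
from itertools import combinations


def has_triangle(n, edges):
    """Return True if the graph on vertices 0..n-1 with these edges contains K_3."""
    edge_set = set(edges)  # edges are stored as (u, v) with u < v
    return any(
        (a, b) in edge_set and (a, c) in edge_set and (b, c) in edge_set
        for a, b, c in combinations(range(n), 3)
    )


def ex_triangle(n):
    """Brute-force ex(n, K_3): the largest edge count of a triangle-free n-vertex graph."""
    all_edges = list(combinations(range(n), 2))
    # Try edge counts from largest to smallest; the first triangle-free hit is the maximum.
    for m in range(len(all_edges), -1, -1):
        for subset in combinations(all_edges, m):
            if not has_triangle(n, subset):
                return m
    return 0


for n in range(1, 6):
    print(n, ex_triangle(n), n * n // 4)
```

For each small $n$ the brute-force value and $\lfloor n^2/4 \rfloor$ should agree, which is the statement of Mantel's theorem.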

arXiv:2211.05910 [pdf, other]

Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li, et al. (71 additional authors not shown)

Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating a rate of up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

arXiv:2211.04894 [pdf, other]

Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives

Authors: Haoning Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

Abstract: The rapid increase in user-generated-content (UGC) videos calls for the development of effective video quality assessment (VQA) algorithms. However, the objective of the UGC-VQA problem is still ambiguous and can be viewed from two perspectives: the technical perspective, measuring the perception of distortions; and the aesthetic perspective, which relates to preference and recommendation on contents. To understand how these two perspectives affect overall subjective opinions in UGC-VQA, we conduct a large-scale subjective study to collect human quality opinions on overall quality of videos as well as perceptions from aesthetic and technical perspectives. The collected Disentangled Video Quality Database (DIVIDE-3k) confirms that human quality opinions on UGC videos are universally and inevitably affected by both aesthetic and technical perspectives. In light of this, we propose the Disentangled Objective Video Quality Evaluator (DOVER) to learn the quality of UGC videos based on the two perspectives. DOVER achieves state-of-the-art performance in UGC-VQA with very high efficiency. With perspective opinions in DIVIDE-3k, we further propose DOVER++, the first approach to provide reliable clear-cut quality evaluations from a single aesthetic or technical perspective. Code at https://github.com/VQAssessment/DOVER.

Submitted 7 March, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

arXiv:2211.04098 [pdf, other]

Abstraction-Based Verification of Approximate Pre-Opacity for Control Systems

Authors: Junyao Hou, Siyuan Liu, Xiang Yin, Majid Zamani

Abstract: In this paper, we consider the problem of verifying pre-opacity for discrete-time control systems. Pre-opacity is an important information-flow security property that secures the intention of a system to execute some secret behaviors in the future. Existing works on pre-opacity only consider non-metric discrete systems, where it is assumed that intruders can distinguish different output behaviors precisely. However, for continuous-space control systems whose output sets are equipped with metrics (which is the case for most real-world applications), it is too restrictive to assume precise measurements from outside observers. In this paper, we first introduce a concept of approximate pre-opacity by capturing the security level of control systems with respect to the measurement precision of the intruder. Based on this new notion of pre-opacity, we propose a verification approach for continuous-space control systems by leveraging abstraction-based techniques. In particular, a new concept of approximate pre-opacity preserving simulation relation is introduced to characterize the distance between two systems in terms of preserving pre-opacity. This new system relation allows us to verify pre-opacity of complex continuous-space control systems using their finite abstractions. We also present a method to construct pre-opacity preserving finite abstractions for a class of discrete-time control systems under certain stability assumptions.

Submitted 8 November, 2022; originally announced November 2022.

Comments: Discrete Event Systems, Opacity, Formal Abstractions

arXiv:2211.02838 [pdf, ps, other]

Two stability theorems for $\mathcal{K}_{\ell + 1}^{r}$-saturated hypergraphs

Authors: Jianfeng Hou, Heng Li, Caihong Yang, Qinghou Zeng, Yixiao Zhang

Abstract: An $\mathcal{F}$-saturated $r$-graph is a maximal $r$-graph not containing any member of $\mathcal{F}$ as a subgraph. Let $\mathcal{K}_{\ell + 1}^{r}$ be the collection of all $r$-graphs $F$ with at most $\binom{\ell+1}{2}$ edges such that for some $\left(\ell+1\right)$-set $S$ every pair $\{u, v\} \subset S$ is covered by an edge in $F$. Our first result shows that for each $\ell \geq r \geq 2$ every $\mathcal{K}_{\ell+1}^{r}$-saturated $r$-graph on $n$ vertices with $t_{r}(n, \ell) - o(n^{r-1+1/\ell})$ edges contains a complete $\ell$-partite subgraph on $(1-o(1))n$ vertices, which extends a stability theorem for $K_{\ell+1}$-saturated graphs given by Popielarz, Sahasrabudhe and Snyder. We also show that the bound is best possible. Our second result is motivated by a celebrated theorem of Andrásfai, Erdős and Sós which states that for $\ell \geq 2$ every $K_{\ell+1}$-free graph $G$ on $n$ vertices with minimum degree $δ(G) > \frac{3\ell-4}{3\ell-1}n$ is $\ell$-partite. We give a hypergraph version of it. The minimum positive co-degree of an $r$-graph $\mathcal{H}$, denoted by $δ_{r-1}^{+}(\mathcal{H})$, is the maximum $k$ such that if $S$ is an $(r-1)$-set contained in an edge of $\mathcal{H}$, then $S$ is contained in at least $k$ distinct edges of $\mathcal{H}$. Let $\ell\ge 3$ be an integer and $\mathcal{H}$ be a $\mathcal{K}_{\ell+1}^3$-saturated $3$-graph on $n$ vertices. We prove that if either $\ell \ge 4$ and $δ_{2}^{+}(\mathcal{H}) > \frac{3\ell-7}{3\ell-1}n$; or $\ell = 3$ and $δ_{2}^{+}(\mathcal{H}) > 2n/7$, then $\mathcal{H}$ is $\ell$-partite; and the bound is best possible. This is the first stability result on minimum positive co-degree for hypergraphs.

Submitted 5 November, 2022; originally announced November 2022.

arXiv:2211.02419 [pdf, other]

High-Resolution Boundary Detection for Medical Image Segmentation with Piece-Wise Two-Sample T-Test Augmented Loss

Authors: Yucong Lin, Jinhua Su, Yuhang Li, Yuhao Wei, Hanchao Yan, Saining Zhang, Jiaan Luo, Danni Ai, Hong Song, Jingfan Fan, Tianyu Fu, Deqiang Xiao, Feifei Wang, Jue Hou, Jian Yang

Abstract: Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and dice losses, often fall short in boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We developed a novel loss function that is tailored to reflect the boundary information and so enhance boundary detection. As the contrast between segmentation and background regions along the classification boundary naturally induces heterogeneity over the pixels, we propose the piece-wise two-sample t-test augmented (PTA) loss that is infused with the statistical test for such heterogeneity. We demonstrate the improved boundary detection power of the PTA loss compared to benchmark losses without a t-test component.

Submitted 4 November, 2022; originally announced November 2022.
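
The PTA loss in the abstract above builds on a two-sample t-test between pixels on either side of a boundary. The abstract does not say which variant of the test is used or how it enters the loss, so the sketch below only shows the underlying statistic, assuming Welch's unequal-variance form. The function name and the toy intensity values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t_statistic(x, y):
    """Welch's two-sample t statistic for samples x and y.

    Assumption: each sample has at least two values and the two samples
    are not both constant (otherwise the denominator is zero).
    """
    standard_error = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / standard_error


# Hypothetical predicted intensities just inside and just outside a boundary.
inside = [1.0, 2.0, 3.0]
outside = [2.0, 4.0, 6.0]
print(welch_t_statistic(inside, outside))
```

A large absolute value of the statistic indicates a strong contrast between the two pixel groups, which is the kind of heterogeneity the abstract says the loss is meant to capture.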

arXiv:2211.01601 [pdf]

A Fast Solution Method for Large-scale Unit Commitment Based on Lagrangian Relaxation and Dynamic Programming

Authors: Jiangwei Hou, Qiaozhu Zhai, Yuzhou Zhou, Xiaohong Guan

Abstract: The unit commitment (UC) problem is crucial for the operation and market mechanism of power systems. As modern power systems expand in scale, solving the UC problem is becoming more and more difficult. To this end, this paper proposes a new fast solution method based on Lagrangian relaxation and dynamic programming. First, an initial trial UC solution is estimated by a fast method based on Lagrangian relaxation. This initial trial UC solution fully considers the system-wide constraints. Second, a dynamic programming module is introduced to adjust the trial UC solution to make it satisfy the unit-wise constraints. Third, a method for constructing a feasible UC solution is proposed based on the adjusted trial UC solution. Specifically, a feasibility-testing model and an updating strategy for the trial UC solution are established in this part. Numerical tests are implemented on IEEE 24-bus, IEEE 118-bus, Polish 2383-bus, and French 6468-bus systems, which verify the effectiveness and efficiency of the proposed method.

Submitted 3 November, 2022; originally announced November 2022.

Comments: 10 pages, journal paper, transactions

arXiv:2211.00723 [pdf, other]

${\rm S{\scriptsize IM}BIG}$: A Forward Modeling Approach To Analyzing Galaxy Clustering

Authors: ChangHoon Hahn, Michael Eickenberg, Shirley Ho, Jiamin Hou, Pablo Lemos, Elena Massara, Chirag Modi, Azadeh Moradinezhad Dizgah, Bruno Régaldo-Saint Blancard, Muntazir M. Abidi

Abstract: We present the first-ever cosmological constraints from a simulation-based inference (SBI) analysis of galaxy clustering from the new ${\rm S{\scriptsize IM}BIG}$ forward modeling framework. ${\rm S{\scriptsize IM}BIG}$ leverages the predictive power of high-fidelity simulations and provides an inference framework that can extract cosmological information on small non-linear scales, inaccessible with standard analyses. In this work, we apply ${\rm S{\scriptsize IM}BIG}$ to the BOSS CMASS galaxy sample and analyze the power spectrum, $P_\ell(k)$, to $k_{\rm max}=0.5\,h/{\rm Mpc}$. We construct 20,000 simulated galaxy samples using our forward model, which is based on high-resolution ${\rm Q{\scriptsize UIJOTE}}$ $N$-body simulations and includes detailed survey realism for a more complete treatment of observational systematics. We then conduct SBI by training normalizing flows using the simulated samples and infer the posterior distribution of $Λ$CDM cosmological parameters: $Ω_m, Ω_b, h, n_s, σ_8$. We derive significant constraints on $Ω_m$ and $σ_8$, which are consistent with previous works. Our constraints on $σ_8$ are $27\%$ more precise than standard analyses. This improvement is equivalent to the statistical gain expected from analyzing a galaxy sample that is $\sim60\%$ larger than CMASS with standard methods. It results from additional cosmological information on non-linear scales beyond the limit of current analytic models, $k > 0.25\,h/{\rm Mpc}$. While we focus on $P_\ell$ in this work for validation and comparison to the literature, ${\rm S{\scriptsize IM}BIG}$ provides a framework for analyzing galaxy clustering using any summary statistic. We expect further improvements on cosmological constraints from subsequent ${\rm S{\scriptsize IM}BIG}$ analyses of summary statistics beyond $P_\ell$.

Submitted 1 November, 2022; originally announced November 2022.

Comments: 9 pages, 5 figures
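
The abstract above equates a $27\%$ gain in precision on $σ_8$ with a galaxy sample roughly $60\%$ larger. One way to see how the two numbers can fit together is the following back-of-the-envelope check. It rests on two assumptions that are not stated in the abstract: that the statistical uncertainty scales as $N^{-1/2}$ with sample size $N$, and that "27% more precise" means the standard-analysis uncertainty is $1.27$ times the new one.

```latex
\sigma \propto N^{-1/2}
\quad\Longrightarrow\quad
\frac{N_{\rm equiv}}{N_{\rm CMASS}}
  = \left(\frac{\sigma_{\rm standard}}{\sigma_{\rm SBI}}\right)^{2}
  \approx 1.27^{2}
  \approx 1.61
```

Under those assumptions the equivalent sample is about $61\%$ larger, in line with the quoted $\sim60\%$.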

arXiv:2211.00660 [pdf, other]

doi 10.1088/1475-7516/2023/04/010

${\rm S{\scriptsize IM}BIG}$: Mock Challenge for a Forward Modeling Approach to Galaxy Clustering

Authors: ChangHoon Hahn, Michael Eickenberg, Shirley Ho, Jiamin Hou, Pablo Lemos, Elena Massara, Chirag Modi, Azadeh Moradinezhad Dizgah, Bruno Régaldo-Saint Blancard, Muntazir M. Abidi

Abstract: Simulation-Based Inference of Galaxies (${\rm S{\scriptsize IM}BIG}$) is a forward modeling framework for analyzing galaxy clustering using simulation-based inference. In this work, we present the ${\rm S{\scriptsize IM}BIG}$ forward model, which is designed to match the observed SDSS-III BOSS CMASS galaxy sample. The forward model is based on high-resolution ${\rm Q{\scriptsize UIJOTE}}$ $N$-body simulations and a flexible halo occupation model. It includes full survey realism and models observational systematics such as angular masking and fiber collisions. We present the "mock challenge" for validating the accuracy of posteriors inferred from ${\rm S{\scriptsize IM}BIG}$ using a suite of 1,500 test simulations constructed using forward models with a different $N$-body simulation, halo finder, and halo occupation prescription. As a demonstration of ${\rm S{\scriptsize IM}BIG}$, we analyze the power spectrum multipoles out to $k_{\rm max} = 0.5\,h/{\rm Mpc}$ and infer the posterior of $Λ$CDM cosmological and halo occupation parameters. Based on the mock challenge, we find that our constraints on $Ω_m$ and $σ_8$ are unbiased, but conservative. Hence, the mock challenge demonstrates that ${\rm S{\scriptsize IM}BIG}$ provides a robust framework for inferring cosmological parameters from galaxy clustering on non-linear scales and a complete framework for handling observational systematics. In subsequent work, we will use ${\rm S{\scriptsize IM}BIG}$ to analyze summary statistics beyond the power spectrum including the bispectrum, marked power spectrum, skew spectrum, wavelet statistics, and field-level statistics.

Submitted 1 November, 2022; originally announced November 2022.

Comments: 28 pages, 6 figures

arXiv:2210.17456 [pdf, other]

Audio-Visual Speech Enhancement and Separation by Utilizing Multi-Modal Self-Supervised Embeddings

Authors: I-Chun Chern, Kuo-Hsuan Hung, Yi-Ting Chen, Tassadaq Hussain, Mandar Gogate, Amir Hussain, Yu Tsao, Jen-Cheng Hou

Abstract: AV-HuBERT, a multi-modal self-supervised learning model, has been shown to be effective for categorical problems such as automatic speech recognition and lip-reading. This suggests that useful audio-visual speech representations can be obtained via utilizing multi-modal self-supervised embeddings. Nevertheless, it is unclear if such representations can be generalized to solve real-world multi-modal AV regression tasks, such as audio-visual speech enhancement (AVSE) and audio-visual speech separation (AVSS). In this study, we leveraged the pre-trained AV-HuBERT model followed by an SE module for AVSE and AVSS. Comparative experimental results demonstrate that our proposed model performs better than the state-of-the-art AVSE and traditional audio-only SE models. In summary, our results confirm the effectiveness of our proposed model for the AVSS task with proper fine-tuning strategies, demonstrating that multi-modal self-supervised embeddings obtained from AV-HuBERT can be generalized to audio-visual regression tasks.

Submitted 31 May, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

Comments: ICASSP AMHAT 2023

arXiv:2210.16743 [pdf, other]

WeKws: A production first small-footprint end-to-end Keyword Spotting Toolkit

Authors: Jie Wang, Menglong Xu, Jingyong Hou, Binbin Zhang, Xiao-Lei Zhang, Lei Xie, Fuping Pan

Abstract: Keyword spotting (KWS) enables speech-based user interaction and gradually becomes an indispensable component of smart devices. Recently, end-to-end (E2E) methods have become the most popular approach for on-device KWS tasks. However, there is still a gap between the research and deployment of E2E KWS methods. In this paper, we introduce WeKws, a production-quality, easy-to-build, and convenient-to-be-applied E2E KWS toolkit. WeKws contains the implementations of several state-of-the-art backbone networks, making it achieve highly competitive results on three publicly available datasets. To make WeKws a pure E2E toolkit, we utilize a refined max-pooling loss to make the model learn the ending position of the keyword by itself, which significantly simplifies the training pipeline and makes WeKws very efficient to be applied in real-world scenarios. The toolkit is publicly available at https://github.com/wenet-e2e/wekws.

Submitted 30 October, 2022; originally announced October 2022.

arXiv:2210.12743 [pdf, other]

doi 10.1088/1475-7516/2023/03/045

Cosmological Information in Skew Spectra of Biased Tracers in Redshift Space

Authors: Jiamin Hou, Azadeh Moradinezhad Dizgah, ChangHoon Hahn, Elena Massara

Abstract: Extracting the non-Gaussian information encoded in the higher-order clustering statistics of the large-scale structure is key to fully realizing the potential of upcoming galaxy surveys. We investigate the information content of the redshift-space {\it weighted skew spectra} of biased tracers as efficient estimators for 3-point clustering statistics. The skew spectra are constructed by correlating the observed galaxy field with an appropriately-weighted square of it. We perform numerical Fisher forecasts using two synthetic datasets; the halo catalogs from the Quijote N-body simulations and the galaxy catalogs from the Molino suite. The latter serves to understand the effect of marginalization over a more complex matter-tracer biasing relation. Compared to the power spectrum multipoles, we show that the skew spectra substantially improve the constraints on six parameters of the $νΛ$CDM model, $\{Ω_m, Ω_b, h, n_s, σ_8, M_ν\}$. Imposing a small-scale cutoff of $k_{\rm max}=0.25 \, {\rm Mpc}^{-1}h$, the improvements from skew spectra alone range from 23% to 62% for the Quijote halos and from 32% to 71% for the Molino galaxies. Compared to the previous analysis of the bispectrum monopole on the same data and using the same range of scales, the skew spectra of Quijote halos provide competitive constraints. Conversely, the skew spectra outperform the bispectrum monopole for all cosmological parameters for the Molino catalogs. This may result from additional anisotropic information, particularly enhanced in the Molino sample, that is captured by the skew spectra but not by the bispectrum monopole. Our stability analysis of the numerical derivatives shows comparable convergence rates for the power spectrum and the skew spectra, indicating potential underestimation of parameter uncertainties by at most 30%.

Submitted 23 October, 2022; originally announced October 2022.

Comments: 43 pages, 25 figures

arXiv:2210.10410 [pdf, other]

doi 10.1103/PhysRevB.108.035407

Effects of spatial dimensionality and band tilting on the longitudinal optical conductivities in Dirac bands

Authors: Jian-Tong Hou, Chang-Xu Yan, Chao-Yang Tan, Zhi-Qiang Li, Peng Wang, Hong Guo, Hao-Ran Chang

Abstract: We report a unified theory based on linear response, for analyzing the longitudinal optical conductivity (LOC) of materials with tilted Dirac cones. Depending on the tilt parameter $t$, the Dirac electrons have four phases: untilted, type-I, type-II, and type-III; the Dirac dispersion can be isotropic or anisotropic; the spatial dimension of the material can be one-, two-, or three-dimensions (1D, 2D and 3D). The interband LOCs and intraband LOCs in $d$ dimension (with $d\ge2$) are found to scale as $σ_{0}ω^{d-2}$ and $σ_{0}μ^{d-1}δ(ω)$, respectively, where $ω$ is the frequency and $μ$ the chemical potential. The interband LOC vanishes in 1D due to lack of extra spatial dimension. In contrast, the interband LOCs in 2D and 3D are nonvanishing and share many similar properties. A universal and robust fixed point of interband LOCs appears at $ω=2μ$ no matter $d=2$ or $d=3$, which can be intuitively understood by the geometric structures of Fermi surface and energy resonance contour. The intraband LOCs and the carrier density for 2D and 3D tilted Dirac bands are both closely related to the geometric structure of Fermi surface and the cutoff of integration. The angular dependence of LOCs is found to characterize both spatial dimensionality and band tilting and the constant asymptotic background values of LOC reflect features of Dirac bands. The LOCs in the anisotropic tilted Dirac cone can be connected to its isotropic counterpart by a ratio that consists of Fermi velocities for both 2D and 3D. Most of the findings are universal for tilted Dirac materials and hence valid for a great many Dirac materials in the spatial dimensions of physical interest.

Submitted 24 November, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: 22 pages, 7 figures

Journal ref: Phys. Rev. B 108, 035407 (2023)

arXiv:2210.10402 [pdf]

doi 10.1016/j.asr.2022.10.045

Solar Ring Mission: Building a Panorama of the Sun and Inner-heliosphere

Authors: Yuming Wang, Xianyong Bai, Changyong Chen, Linjie Chen, Xin Cheng, Lei Deng, Linhua Deng, Yuanyong Deng, Li Feng, Tingyu Gou, Jingnan Guo, Yang Guo, Xinjun Hao, Jiansen He, Junfeng Hou, Huang Jiangjiang, Zhenghua Huang, Haisheng Ji, Chaowei Jiang, Jie Jiang, Chunlan Jin, Xiaolei Li, Yiren Li, Jiajia Liu, Kai Liu, et al. (29 additional authors not shown)

Abstract: Solar Ring (SOR) is a proposed space science mission to monitor and study the Sun and inner heliosphere from a full 360° perspective in the ecliptic plane. It will deploy three 120°-separated spacecraft on the 1-AU orbit. The first spacecraft, S1, is located 30° upstream of the Earth, the second, S2, 90° downstream, and the third, S3, completes the configuration. This design with necessary science instruments, e.g., the Doppler-velocity and vector magnetic field imager, wide-angle coronagraph, and in-situ instruments, will allow us to establish many unprecedented capabilities: (1) provide simultaneous Doppler-velocity observations of the whole solar surface to understand the deep interior, (2) provide vector magnetograms of the whole photosphere - the inner boundary of the solar atmosphere and heliosphere, (3) provide the information of the whole lifetime evolution of solar featured structures, and (4) provide the whole view of solar transients and space weather in the inner heliosphere. With these capabilities, the Solar Ring mission aims to address outstanding questions about the origin of the solar cycle, the origin of solar eruptions and the origin of extreme space weather events. The successful accomplishment of the mission will construct a panorama of the Sun and inner-heliosphere, and therefore advance our understanding of the star and the space environment that holds our life.

Submitted 23 October, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: 41 pages, 6 figures, 1 table, to be published in Advances in Space Research

arXiv:2210.10206 [pdf, other]

SARABANDE: 3/4 Point Correlation Functions with Fast Fourier Transforms

Authors: James Sunseri, Zachary Slepian, Stephen Portillo, Jiamin Hou, Sule Kahraman, Douglas P. Finkbeiner

Abstract: We present a new $\texttt{python}$ package SARABANDE for measuring 3 & 4 Point Correlation Functions (3/4 PCFs) in $\mathcal{O}(N_{\rm g} \log N_{\rm g})$ time using Fast Fourier Transforms (FFTs), with $N_{\rm g}$ the number of grid points used for the FFT. SARABANDE can measure both projected and full 3 and 4 PCFs on gridded 2D and 3D datasets. The general technique is to generate suitable angular basis functions on an underlying grid, radially bin these to create kernels, and convolve these kernels with the original gridded data to obtain expansion coefficients about every point simultaneously. These coefficients are then combined to give us the 3/4 PCF as expanded in our basis. We apply SARABANDE to simulations of the Interstellar Medium (ISM) to show the results and scaling of calculating both the full and projected 3/4 PCFs.

Submitted 25 October, 2022; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: 16 Pages, 8 Figures, 8 Algorithms, 1 code package

arXiv:2210.07749

LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge

Authors: Yan Jia, Mi Hong, Jingyu Hou, Kailong Ren, Sifan Ma, ** Wang, Fangzhen Peng, Yinglin Ji, Lin Yang, Junjie Wang

Abstract: This paper describes LeVoice automatic speech recognition systems to track2 of intelligent cockpit speech recognition challenge 2022. Track2 is a speech recognition task without limits on the scope of model size. Our main points include deep learning based speech enhancement, text-to-speech based speech generation, training data augmentation via various techniques and speech recognition model fusion. We compared and fused the hybrid architecture and two kinds of end-to-end architecture. For end-to-end modeling, we used models based on connectionist temporal classification/attention-based encoder-decoder architecture and recurrent neural network transducer/attention-based encoder-decoder architecture. The performance of these models is evaluated with an additional language model to improve word error rates. As a result, our system achieved 10.2% character error rate on the challenge test set data and ranked third place among the submitted systems in the challenge.

Submitted 16 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: There are experimental errors

arXiv:2210.07304 [pdf, other]

doi 10.1093/mnras/stad849

Beyond $Λ$CDM constraints from the full shape clustering measurements from BOSS and eBOSS

Authors: Agne Semenaite, Ariel G. Sánchez, Andrea Pezzotta, Jiamin Hou, Alexander Eggemeier, Martin Crocce, Cheng Zhao, Joel R. Brownstein, Graziano Rossi, Donald P. Schneider

Abstract: We analyse the full shape of anisotropic clustering measurements from the extended Baryon Oscillation Spectroscopic survey (eBOSS) quasar sample together with the combined galaxy sample from the Baryon Oscillation Spectroscopic Survey (BOSS). We obtain constraints on the cosmological parameters independent of the Hubble parameter $h$ for the extensions of the $Λ$CDM models, focusing on cosmologies with free dark energy equation of state parameter $w$. We combine the clustering constraints with those from the latest CMB data from Planck to obtain joint constraints for these cosmologies for $w$ and the additional extension parameters - its time evolution $w_{\rm{a}}$, the physical curvature density $ω_{K}$ and the neutrino mass sum $\sum m_ν$. Our joint constraints are consistent with flat $Λ$CDM cosmological model within 68% confidence limits. We demonstrate that the Planck data are able to place tight constraints on the clustering amplitude today, $σ_{12}$, in cosmologies with varying $w$ and present the first constraints for the clustering amplitude for such cosmologies, which is found to be slightly higher than the $Λ$CDM value. Additionally, we show that when we vary $w$ and allow for non-flat cosmologies and the physical curvature density is used, Planck prefers a curved universe at $4σ$ significance, which is $\sim2σ$ higher than when using the relative curvature density $Ω_{\rm{K}}$. Finally, when $w$ is varied freely, clustering provides only a modest improvement (of 0.021 eV) on the upper limit of $\sum m_ν$.

Submitted 13 October, 2022; originally announced October 2022.

Comments: 12 pages, 6 figures, submitted to MNRAS

arXiv:2210.05357 [pdf, other]

Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment

Authors: Haoning Wu, Chaofeng Chen, Liang Liao, Jingwen Hou, Wenxiu Sun, Qiong Yan, Jinwei Gu, Weisi Lin

Abstract: The increased resolution of real-world videos presents a dilemma between efficiency and accuracy for deep Video Quality Assessment (VQA). On the one hand, keeping the original resolution will lead to unacceptable computational costs. On the other hand, existing practices, such as resizing and cropping, will change the quality of original videos due to the loss of details and contents, and are therefore harmful to quality assessment. With the obtained insight from the study of spatial-temporal redundancy in the human visual system and visual coding theory, we observe that quality information around a neighbourhood is typically similar, motivating us to investigate an effective quality-sensitive neighbourhood representatives scheme for VQA. In this work, we propose a unified scheme, spatial-temporal grid mini-cube sampling (St-GMS) to get a novel type of sample, named fragments. Full-resolution videos are first divided into mini-cubes with preset spatial-temporal grids, then the temporal-aligned quality representatives are sampled to compose the fragments that serve as inputs for VQA. In addition, we design the Fragment Attention Network (FANet), a network architecture tailored specifically for fragments. With fragments and FANet, the proposed efficient end-to-end FAST-VQA and FasterVQA achieve significantly better performance than existing approaches on all VQA benchmarks while requiring only 1/1612 FLOPs compared to the current state-of-the-art. Codes, models and demos are available at https://github.com/timothyhtimothy/FAST-VQA-and-FasterVQA.

Submitted 11 October, 2022; originally announced October 2022.
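
The grid-based sampling idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`sample_offsets`, `stitch_fragment`, `sample_fragments`), the uniform random choice of patch position inside each grid cell, and the use of nested Python lists as frames are all assumptions made for illustration. The sketch divides each frame into a `grid` x `grid` layout, picks one small patch per cell, reuses the same patch positions for every frame (one plausible reading of "temporal-aligned"), and stitches the patches into a compact fragment.

```python
import random


def sample_offsets(height, width, grid, patch, rng):
    """Pick one top-left patch corner inside every cell of a grid x grid layout."""
    cell_h, cell_w = height // grid, width // grid
    if patch > cell_h or patch > cell_w:
        raise ValueError("patch must fit inside a grid cell")
    offsets = []
    for gy in range(grid):
        row = []
        for gx in range(grid):
            # randint is inclusive on both ends, so the patch stays in the cell.
            top = gy * cell_h + rng.randint(0, cell_h - patch)
            left = gx * cell_w + rng.randint(0, cell_w - patch)
            row.append((top, left))
        offsets.append(row)
    return offsets


def stitch_fragment(frame, offsets, patch):
    """Copy the chosen patches and place them side by side in grid order."""
    fragment = []
    for row in offsets:
        for dy in range(patch):
            line = []
            for top, left in row:
                line.extend(frame[top + dy][left:left + patch])
            fragment.append(line)
    return fragment


def sample_fragments(frames, grid, patch, seed=0):
    """Apply the same patch positions to every frame (assumed temporal alignment)."""
    rng = random.Random(seed)
    height, width = len(frames[0]), len(frames[0][0])
    offsets = sample_offsets(height, width, grid, patch, rng)
    return [stitch_fragment(frame, offsets, patch) for frame in frames]
```

For example, two 8x8 frames with `grid=2` and `patch=2` yield two 4x4 fragments, so the input to a downstream network shrinks while every region of the frame is still represented at its original resolution.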

arXiv:2210.02547 [pdf, other]

doi 10.3847/1538-4365/acc181

Multi-messenger characterization of Mrk 501 during historically low X-ray and $γ$-ray activity

Authors: MAGIC collaboration, H. Abe, S. Abe, V. A. Acciari, I. Agudo, T. Aniello, S. Ansoldi, L. A. Antonelli, A. Arbet Engels, C. Arcaro, M. Artero, K. Asano, D. Baack, A. Babić, A. Baquero, U. Barres de Almeida, J. A. Barrio, I. Batković, J. Baxter, J. Becerra González, W. Bednarek, E. Bernardini, M. Bernardos, A. Berti, J. Besenrieder, et al. (300 additional authors not shown)

Abstract: We study the broadband emission of Mrk 501 using multi-wavelength observations from 2017 to 2020 performed with a multitude of instruments, involving, among others, MAGIC, Fermi-LAT, NuSTAR, Swift, GASP-WEBT, and OVRO. Mrk 501 showed an extremely low broadband activity, which may help to unravel its baseline emission. Nonetheless, significant flux variations are detected at all wavebands, with the highest occurring at X-rays and very-high-energy (VHE) $γ$-rays. A significant correlation ($>$3$σ$) between X-rays and VHE $γ$-rays is measured, supporting leptonic scenarios to explain the variable parts of the emission, also during low activity. This is further supported when we extend our data from 2008 to 2020, and identify, for the first time, significant correlations between Swift-XRT and Fermi-LAT. We additionally find correlations between high-energy $γ$-rays and radio, with the radio lagging by more than 100 days, placing the $γ$-ray emission zone upstream of the radio-bright regions in the jet. Furthermore, Mrk 501 showed a historically low activity in X-rays and VHE $γ$-rays from mid-2017 to mid-2019 with a stable VHE flux ($>$0.2 TeV) of 5% the emission of the Crab Nebula. The broadband spectral energy distribution (SED) of this 2-year-long low-state, the potential baseline emission of Mrk 501, can be characterized with one-zone leptonic models, and with (lepto)-hadronic models fulfilling neutrino flux constraints from IceCube. We explore the time evolution of the SED towards the low-state, revealing that the stable baseline emission may be ascribed to a standing shock, and the variable emission to an additional expanding or traveling shock.

Submitted 5 March, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Comments: 55 pages, 30 figures, 14 tables, accepted by APJS. Corresponding authors are L. Heckmann, D. Paneque, S. Gasparyan, M. Cerruti, and N. Sahakyan

Journal ref: ApJS 266 37 (2023)

arXiv:2210.02030 [pdf, other]

Point Cloud Recognition with Position-to-Structure Attention Transformers

Authors: Zheng Ding, James Hou, Zhuowen Tu

Abstract: In this paper, we present Position-to-Structure Attention Transformers (PS-Former), a Transformer-based algorithm for 3D point cloud recognition. PS-Former deals with the challenge in 3D point cloud representation where points are not positioned in a fixed grid structure and have limited feature description (only 3D coordinates ($x, y, z$) for scattered points). Existing Transformer-based architectures in this domain often require a pre-specified feature engineering step to extract point features. Here, we introduce two new aspects in PS-Former: 1) a learnable condensation layer that performs point downsampling and feature extraction; and 2) a Position-to-Structure Attention mechanism that recursively enriches the structural information with the position attention branch. Compared with the competing methods, while being generic with fewer heuristic feature designs, PS-Former demonstrates competitive experimental results on three 3D point cloud tasks including classification, part segmentation, and scene segmentation.

Submitted 5 October, 2022; originally announced October 2022.

arXiv:2210.00515 [pdf, other]

Deep-OCTA: Ensemble Deep Learning Approaches for Diabetic Retinopathy Analysis on OCTA Images

Authors: Junlin Hou, Fan Xiao, Jilan Xu, Yuejie Zhang, Haidong Zou, Rui Feng

Abstract: The ultra-wide optical coherence tomography angiography (OCTA) has become an important imaging modality in diabetic retinopathy (DR) diagnosis. However, few studies focus on automatic DR analysis using ultra-wide OCTA. In this paper, we present novel and practical deep-learning solutions based on ultra-wide OCTA for the Diabetic Retinopathy Analysis Challenge (DRAC). In the segmentation of DR lesions task, we utilize UNet and UNet++ to segment three lesions with strong data augmentation and model ensemble. In the image quality assessment task, we create an ensemble of InceptionV3, SE-ResNeXt, and Vision Transformer models. Pre-training on the large dataset as well as the hybrid MixUp and CutMix strategy are both adopted to boost the generalization ability of our model. In the DR grading task, we build a Vision Transformer (ViT) and find that the ViT model pre-trained on color fundus images serves as a useful substrate for OCTA images. Our proposed methods ranked 4th, 3rd, and 5th on the three leaderboards of DRAC, respectively. The source code will be made available at https://github.com/FDU-VTS/DRAC.

Submitted 2 October, 2022; originally announced October 2022.

arXiv:2209.15241 [pdf, other]

doi 10.1103/PhysRevB.106.L220501

Helium-bearing superconductor at high pressure

Authors: Jingyu Hou, Xiao Dong, Artem R. Oganov, Xiao-Ji Weng, Chun-Mei Hao, Guochun Yang, Hui-Tian Wang, Xiang-Feng Zhou, Yongjun Tian

Abstract: Helium (He) is the most inert noble gas at ambient conditions. It adopts a hexagonal close packed structure (P63/mmc) and remains in the insulating phase up to 32 TPa. In contrast, lithium (Li) is one of the most reactive metals at zero pressure, while its cubic high-pressure phase (Fd-3m) is a weak metallic electride above 475 GPa. Strikingly, a stable compound of Li5He2 (R-3m) was formed by mixing Fd-3m Li with P63/mmc He above 700 GPa. The presence of helium promotes the lattice transformation from Fd-3m Li to Pm-3m Li, and tunes the three-dimensional distributed interstitial electrons into the mixture of zero- and two-dimensional anionic electrons. This significantly increases the degree of metallization at the Fermi level; consequently, the coupling of conductive anionic electrons with the Li-dominated vibrations is the key factor to the formation of superconducting electride Li5He2 with a transition temperature up to 26 K, dynamically stable to pressures down to 210 GPa.

Submitted 30 September, 2022; originally announced September 2022.

Comments: 5 pages, 3 figures

arXiv:2209.13252 [pdf, other]

RIGA: Rotation-Invariant and Globally-Aware Descriptors for Point Cloud Registration

Authors: Hao Yu, Ji Hou, Zheng Qin, Mahdi Saleh, Ivan Shugurov, Kai Wang, Benjamin Busam, Slobodan Ilic

Abstract: Successful point cloud registration relies on accurate correspondences established upon powerful descriptors. However, existing neural descriptors either leverage a rotation-variant backbone whose performance declines under large rotations, or encode local geometry that is less distinctive. To address this issue, we introduce RIGA to learn descriptors that are Rotation-Invariant by design and Globally-Aware. From the Point Pair Features (PPFs) of sparse local regions, rotation-invariant local geometry is encoded into geometric descriptors. Global awareness of 3D structures and geometric context is subsequently incorporated, both in a rotation-invariant fashion. More specifically, 3D structures of the whole frame are first represented by our global PPF signatures, from which structural descriptors are learned to help geometric descriptors sense the 3D world beyond local regions. Geometric context from the whole scene is then globally aggregated into descriptors. Finally, the description of sparse regions is interpolated to dense point descriptors, from which correspondences are extracted for registration. To validate our approach, we conduct extensive experiments on both object- and scene-level data. With large rotations, RIGA surpasses the state-of-the-art methods by a margin of 8° in terms of the Relative Rotation Error on ModelNet40 and improves the Feature Matching Recall by at least 5 percentage points on 3DLoMatch.

Submitted 27 September, 2022; originally announced September 2022.
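
The abstract above builds on Point Pair Features (PPFs) because they are rotation-invariant. As a rough illustration, the sketch below uses the commonly cited four-component PPF definition (distance between the two points, plus three angles involving the two normals and the connecting vector). This definition is an assumption taken from the general literature, not from the abstract, and the helper names (`point_pair_feature`, `rotate_z`) are hypothetical. The sketch only demonstrates the invariance property; it says nothing about how RIGA learns descriptors from these features.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def _angle(a, b):
    """Angle in [0, pi] between two non-zero 3D vectors."""
    cosine = _dot(a, b) / (_norm(a) * _norm(b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))


def point_pair_feature(p1, n1, p2, n2):
    """Assumed classic PPF: (|d|, angle(n1, d), angle(n2, d), angle(n1, n2)).

    p1 and p2 must be distinct points; n1 and n2 must be non-zero normals.
    """
    d = _sub(p2, p1)
    return (_norm(d), _angle(n1, d), _angle(n2, d), _angle(n1, n2))


def rotate_z(v, theta):
    """Rotate a 3D vector about the z-axis by theta radians (for the demo only)."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)
```

Because the feature depends only on one length and three angles, applying the same rotation to both points and both normals leaves all four components unchanged up to floating-point error, which is the property the abstract relies on.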

arXiv:2209.08515 [pdf, other]

The Chocolate Chip Cookie Model: Dust Geometry of Milky-Way like Disk Galaxies

Authors: Jiafeng Lu, Shiyin Shen, Fang-Ting Yuan, Zhengyi Shao, Jinliang Hou, Xianzhong Zheng

Abstract: We present a new two-component dust geometry model, the \textit{Chocolate Chip Cookie} model, where the clumpy nebular regions are embedded in a diffuse stellar/ISM disk, like chocolate chips in a cookie. By approximating the binomial distribution of the clumpy nebular regions with a continuous Gaussian distribution and omitting the dust scattering effect, our model solves the dust attenuation process for both the emission lines and stellar continua via analytical approaches. Our Chocolate Chip Cookie model successfully fits the inclination dependence of both the effective dust reddening of the stellar components derived from stellar population synthesis and that of the emission lines characterized by the Balmer decrement for a large sample of Milky-Way like disk galaxies selected from the main galaxy sample of the Sloan Digital Sky Survey (SDSS). Our model shows that the clumpy nebular disk is about 0.55 times thinner and 1.6 times larger than the stellar disk for MW-like galaxies, whereas each clumpy region has a typical optical depth $τ_{\rm{cl,V}} \sim 0.5$ in $V$ band. After considering the aperture effect, our model prediction on the inclination dependence of dust attenuation is also consistent with observations. Not only that, in our model, the dust attenuation curve of the stellar population naturally depends on inclination and its median case is consistent with the classical Calzetti law. Since the modelling constraints are from the optical wavelengths, our model is unaffected by the optically thick dust component, which however could bias the model's prediction of the infrared emissions.

Submitted 18 September, 2022; originally announced September 2022.

Comments: 27 pages, 11 figures, 1 table

arXiv:2209.05013 [pdf, other]

Learning A Locally Unified 3D Point Cloud for View Synthesis

Authors: Meng You, Mantang Guo, Xianqiang Lyu, Hui Liu, Junhui Hou

Abstract: In this paper, we explore the problem of 3D point cloud representation-based view synthesis from a set of sparse source views. To tackle this challenging problem, we propose a new deep learning-based view synthesis paradigm that learns a locally unified 3D point cloud from source views. Specifically, we first construct sub-point clouds by projecting source views to 3D space based on their depth maps. Then, we learn the locally unified 3D point cloud by adaptively fusing points at a local neighborhood defined on the union of the sub-point clouds. Besides, we also propose a 3D geometry-guided image restoration module to fill the holes and recover high-frequency details of the rendered novel views. Experimental results on three benchmark datasets demonstrate that our method can improve the average PSNR by more than 4 dB while preserving more accurate visual details, compared with state-of-the-art view synthesis methods.

Submitted 30 September, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

Comments: Accepted to TIP

arXiv:2208.12419 [pdf, other]

Arbitrary Shape Text Detection via Segmentation with Probability Maps

Authors: Shi-Xue Zhang, Xiaobin Zhu, Lei Chen, Jie-Bo Hou, Xu-Cheng Yin

Abstract: Arbitrary shape text detection is a challenging task due to the significantly varied sizes and aspect ratios, arbitrary orientations or shapes, inaccurate annotations, etc. Due to the scalability of pixel-level prediction, segmentation-based methods can adapt to various shape texts and hence attracted considerable attention recently. However, accurate pixel-level annotations of texts are formidable, and the existing datasets for scene text detection only provide coarse-grained boundary annotations. Consequently, numerous misclassified text pixels or background pixels inside annotations always exist, degrading the performance of segmentation-based text detection methods. Generally speaking, whether a pixel belongs to text or not is highly related to the distance with the adjacent annotation boundary. With this observation, in this paper, we propose an innovative and robust segmentation-based detection method via probability maps for accurately detecting text instances. To be concrete, we adopt a Sigmoid Alpha Function (SAF) to transfer the distances between boundaries and their inside pixels to a probability map. However, one probability map can not cover complex probability distributions well because of the uncertainty of coarse-grained text boundary annotations. Therefore, we adopt a group of probability maps computed by a series of Sigmoid Alpha Functions to describe the possible probability distributions. In addition, we propose an iterative model to learn to predict and assimilate probability maps for providing enough information to reconstruct text instances. Finally, simple region growth algorithms are adopted to aggregate probability maps to complete text instances. Experimental results demonstrate that our method achieves state-of-the-art performance in terms of detection accuracy on several benchmarks.

Submitted 25 August, 2022; originally announced August 2022.

Comments: Accepted by TPAMI 2022. arXiv admin note: text overlap with arXiv:1812.01393 by other authors

arXiv:2208.07137 [pdf, other]

An Empirical Study of Pseudo-Labeling for Image-based 3D Object Detection

Authors: Xinzhu Ma, Yuan Meng, Yinmin Zhang, Lei Bai, Jun Hou, Shuai Yi, Wanli Ouyang

Abstract: Image-based 3D detection is an indispensable component of the perception system for autonomous driving. However, it still suffers from the unsatisfying performance, one of the main reasons for which is the limited training data. Unfortunately, annotating the objects in the 3D space is extremely time/resource-consuming, which makes it hard to extend the training set arbitrarily. In this work, we focus on the semi-supervised manner and explore the feasibility of a cheaper alternative, i.e. pseudo-labeling, to leverage the unlabeled data. For this purpose, we conduct extensive experiments to investigate whether the pseudo-labels can provide effective supervision for the baseline models under varying settings. The experimental results not only demonstrate the effectiveness of the pseudo-labeling mechanism for image-based 3D detection (e.g. under monocular setting, we achieve 20.23 AP for moderate level on the KITTI-3D testing set without bells and whistles, improving the baseline model by 6.03 AP), but also show several interesting and noteworthy findings (e.g. the models trained with pseudo-labels perform better than that trained with ground-truth annotations based on the same training data). We hope this work can provide insights for the image-based 3D detection community under a semi-supervised setting. The codes, pseudo-labels, and pre-trained models will be publicly available.

Submitted 15 August, 2022; originally announced August 2022.

Comments: tech report

arXiv:2208.05391 [pdf, other]

doi 10.1007/JHEP03(2023)243

$T\bar{T}$ flow as characteristic flows

Authors: Jue Hou

Abstract: We show that method of characteristics provides a powerful new point of view on $T\bar{T}$-and related deformations. Previously, the method of characteristics has been applied to $T\bar{T}$-deformation mainly to solve Burgers' equation, which governs the deformation of the \emph{quantum} spectrum. In the current work, we study \emph{classical} deformed quantities using this method and show that $T\bar{T}$ flow can be seen as a characteristic flow. Exploiting this point of view, we re-derive a number of important known results and obtain interesting new ones. We prove the equivalence between dynamical change of coordinates and the generalized light-cone gauge approaches to $T\bar{T}$-deformation. We find the deformed Lagrangians for a class of $T\bar{T}$-like deformations in higher dimensions and the $(T\bar{T})^α$-deformation in 2d with generic $α$, generalizing recent results in arXiv:2206.03415 and arXiv:2206.10515.

Submitted 26 January, 2023; v1 submitted 10 August, 2022; originally announced August 2022.

Comments: 38 pages, 2 figures, references updated

arXiv:2208.04053 [pdf, other]

Distributed Momentum-based Frank-Wolfe Algorithm for Stochastic Optimization

Authors: Jie Hou, Xianlin Zeng, Gang Wang, Jian Sun, Jie Chen

Abstract: This paper considers distributed stochastic optimization, in which a number of agents cooperate to optimize a global objective function through local computations and information exchanges with neighbors over a network. Stochastic optimization problems are usually tackled by variants of projected stochastic gradient descent. However, projecting a point onto a feasible set is often expensive. The Frank-Wolfe (FW) method has well-documented merits in handling convex constraints, but existing stochastic FW algorithms are basically developed for centralized settings. In this context, the present work puts forth a distributed stochastic Frank-Wolfe solver, by judiciously combining Nesterov's momentum and gradient tracking techniques for stochastic convex and nonconvex optimization over networks. It is shown that the convergence rate of the proposed algorithm is $\mathcal{O}(k^{-\frac{1}{2}})$ for convex optimization, and $\mathcal{O}(1/\mathrm{log}_2(k))$ for nonconvex optimization. The efficacy of the algorithm is demonstrated by numerical simulations against a number of competing alternatives.

Submitted 8 August, 2022; originally announced August 2022.

Comments: 15 pages, 11 figures, 2 tables

arXiv:2208.03054 [pdf, other]

Global Pointer: Novel Efficient Span-based Approach for Named Entity Recognition

Authors: Jianlin Su, Ahmed Murtadha, Shengfeng Pan, Jing Hou, Jun Sun, Wanwei Huang, Bo Wen, Yunfeng Liu

Abstract: Named entity recognition (NER) task aims at identifying entities from a piece of text that belong to predefined semantic types such as person, location, organization, etc. The state-of-the-art solutions for flat entities NER commonly suffer from capturing the fine-grained semantic information in underlying texts. The existing span-based approaches overcome this limitation, but the computation time is still a concern. In this work, we propose a novel span-based NER framework, namely Global Pointer (GP), that leverages the relative positions through a multiplicative attention mechanism. The ultimate goal is to enable a global view that considers the beginning and the end positions to predict the entity. To this end, we design two modules to identify the head and the tail of a given entity to enable the inconsistency between the training and inference processes. Moreover, we introduce a novel classification loss function to address the imbalance label problem. In terms of parameters, we introduce a simple but effective approximate method to reduce the training parameters. We extensively evaluate GP on various benchmark datasets. Our extensive experiments demonstrate that GP can outperform the existing solution. Moreover, the experimental results show the efficacy of the introduced loss function compared to softmax and entropy alternatives.

Submitted 5 August, 2022; originally announced August 2022.

arXiv:2208.00794 [pdf, ps, other]

Generating non-jumps from a known one

Authors: Jianfeng Hou, Heng Li, Caihong Yang, Yixiao Zhang

Abstract: Let $r\ge 2$ be an integer. The real number $α\in [0,1]$ is a jump for $r$ if there exists a constant $c > 0$ such that for any $ε>0$ and any integer $m \geq r$, there exists an integer $n_0(ε, m)$ satisfying any $r$-uniform graph with $n\ge n_0(ε, m)$ vertices and density at least $α+ε$ contains a subgraph with $m$ vertices and density at least $α+c$. A result of Erdős, Stone and Simonovits implies that every $α\in [0,1)$ is a jump for $r=2$. Erdős asked whether the same is true for $r\ge 3$. Frankl and Rödl gave a negative answer by showing that $1-\frac{1}{l^{r-1}}$ is not a jump for $r$ if $r\ge 3$ and $l>2r$. After that, more non-jumps are found using a method of Frankl and Rödl. In this note, we show a method to construct maps $f \colon [0,1] \to [0,1]$ that preserve non-jumps, if $α$ is a non-jump for $r$ given by the method of Frankl and Rödl, then $f(α)$ is also a non-jump for $r$. We use these maps to study hypergraph Turán densities and answer a question posed by Grosu.

Submitted 1 August, 2022; originally announced August 2022.

arXiv:2207.05483 [pdf, other]

doi 10.1109/TCSVT.2022.3208859

CorrI2P: Deep Image-to-Point Cloud Registration via Dense Correspondence

Authors: Siyu Ren, Yiming Zeng, Junhui Hou, Xiaodong Chen

Abstract: Motivated by the intuition that the critical step of localizing a 2D image in the corresponding 3D point cloud is establishing 2D-3D correspondence between them, we propose the first feature-based dense correspondence framework for addressing the image-to-point cloud registration problem, dubbed CorrI2P, which consists of three modules, i.e., feature embedding, symmetric overlapping region detection, and pose estimation through the established correspondence. Specifically, given a pair of a 2D image and a 3D point cloud, we first transform them into high-dimensional feature space and feed the resulting features into a symmetric overlapping region detector to determine the region where the image and point cloud overlap each other. Then we use the features of the overlapping regions to establish the 2D-3D correspondence before running EPnP within RANSAC to estimate the camera's pose. Experimental results on KITTI and NuScenes datasets show that our CorrI2P outperforms state-of-the-art image-to-point cloud registration methods significantly. We will make the code publicly available.

Submitted 20 September, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

Comments: Accepted by IEEE TCSVT

arXiv:2207.04266 [pdf, other]

Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising

Authors: Jinhui Hou, Zhiyu Zhu, Hui Liu, Junhui Hou

Abstract: This paper tackles the challenging problem of hyperspectral (HS) image denoising. Unlike existing deep learning-based methods usually adopting complicated network architectures or empirically stacking off-the-shelf modules to pursue performance improvement, we focus on the efficient and effective feature extraction manner for capturing the high-dimensional characteristics of HS images. To be specific, based on the theoretical analysis that increasing the rank of the matrix formed by the unfolded convolutional kernels can promote feature diversity, we propose rank-enhanced low-dimensional convolution set (Re-ConvSet), which separately performs 1-D convolution along the three dimensions of an HS image side-by-side, and then aggregates the resulting spatial-spectral embeddings via a learnable compression layer. Re-ConvSet not only learns the diverse spatial-spectral features of HS images, but also reduces the parameters and complexity of the network. We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method. Surprisingly, we observe such a concise framework outperforms the most recent method to a large extent in terms of quantitative metrics, visual results, and efficiency. We believe our work may shed light on deep learning-based HS image processing and analysis.

Submitted 9 July, 2022; originally announced July 2022.

Comments: 10 pages, 8 figures

arXiv:2207.03128 [pdf, other]

PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition

Authors: Qijian Zhang, Junhui Hou, Yue Qian

Abstract: As two fundamental representation modalities of 3D objects, 3D point clouds and multi-view 2D images record shape information from different domains of geometric structures and visual appearances. In the current deep learning era, remarkable progress in processing such two data modalities has been achieved through respectively customizing compatible 3D and 2D network architectures. However, unlike multi-view image-based 2D visual modeling paradigms, which have shown leading performance in several common 3D shape recognition benchmarks, point cloud-based 3D geometric modeling paradigms are still highly limited by insufficient learning capacity, due to the difficulty of extracting discriminative features from irregular geometric signals. In this paper, we explore the possibility of boosting deep 3D point cloud encoders by transferring visual knowledge extracted from deep 2D image encoders under a standard teacher-student distillation workflow. Generally, we propose PointMCD, a unified multi-view cross-modal distillation architecture, including a pretrained deep image encoder as the teacher and a deep point encoder as the student. To perform heterogeneous feature alignment between 2D visual and 3D geometric domains, we further investigate visibility-aware feature projection (VAFP), by which point-wise embeddings are reasonably aggregated into view-specific geometric descriptors. By pair-wisely aligning multi-view visual and geometric descriptors, we can obtain more powerful deep point encoders without exhausting and complicated network modification. Experiments on 3D shape classification, part segmentation, and unsupervised learning strongly validate the effectiveness of our method. The code and data will be publicly available at https://github.com/keeganhk/PointMCD.

Submitted 15 June, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

Comments: Accepted to TMM

arXiv:2207.03105 [pdf]

doi 10.1088/1361-6560/ac9e3e

Uncertainty-Aware Self-supervised Neural Network for Liver $T_{1ρ}$ Mapping with Relaxation Constraint

Authors: Chaoxing Huang, Yurui Qian, Simon Chun Ho Yu, Jian Hou, Baiyan Jiang, Queenie Chan, Vincent Wai-Sun Wong, Winnie Chiu-Wing Chu, Weitian Chen

Abstract: $T_{1ρ}$ mapping is a promising quantitative MRI technique for the non-invasive assessment of tissue properties. Learning-based approaches can map $T_{1ρ}$ from a reduced number of $T_{1ρ}$ weighted images, but requires significant amounts of high quality training data. Moreover, existing methods do not provide the confidence level of the $T_{1ρ}$ estimation. To address these problems, we proposed a self-supervised learning neural network that learns a $T_{1ρ}$ mapping using the relaxation constraint in the learning process. Epistemic uncertainty and aleatoric uncertainty are modelled for the $T_{1ρ}$ quantification network to provide a Bayesian confidence estimation of the $T_{1ρ}$ mapping. The uncertainty estimation can also regularize the model to prevent it from learning imperfect data. We conducted experiments on $T_{1ρ}$ data collected from 52 patients with non-alcoholic fatty liver disease. The results showed that our method outperformed the existing methods for $T_{1ρ}$ quantification of the liver using as few as two $T_{1ρ}$-weighted images. Our uncertainty estimation provided a feasible way of modelling the confidence of the self-supervised learning based $T_{1ρ}$ estimation, which is consistent with the reality in liver $T_{1ρ}$ imaging.

Submitted 25 October, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

Comments: Provisionally accepted by Physics in Medicine and Biology

arXiv:2207.02595 [pdf, other]

FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling

Authors: Haoning Wu, Chaofeng Chen, Jingwen Hou, Liang Liao, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

Abstract: Current deep video quality assessment (VQA) methods are usually with high computational costs when evaluating high-resolution videos. This cost hinders them from learning better video-quality-related representations via end-to-end training. Existing approaches typically consider naive sampling to reduce the computational cost, such as resizing and cropping. However, they obviously corrupt quality-related information in videos and are thus not optimal for learning good representations for VQA. Therefore, there is an eager need to design a new quality-retained sampling scheme for VQA. In this paper, we propose Grid Mini-patch Sampling (GMS), which allows consideration of local quality by sampling patches at their raw resolution and covers global quality with contextual relations via mini-patches sampled in uniform grids. These mini-patches are spliced and aligned temporally, named as fragments. We further build the Fragment Attention Network (FANet) specially designed to accommodate fragments as inputs. Consisting of fragments and FANet, the proposed FrAgment Sample Transformer for VQA (FAST-VQA) enables efficient end-to-end deep VQA and learns effective video-quality-related representations. It improves state-of-the-art accuracy by around 10% while reducing 99.5% FLOPs on 1080P high-resolution videos. The newly learned video-quality-related representations can also be transferred into smaller VQA datasets, boosting performance in these scenarios. Extensive experiments show that FAST-VQA has good performance on inputs of various resolutions while retaining high efficiency. We publish our code at https://github.com/timothyhtimothy/FAST-VQA.

Submitted 6 July, 2022; originally announced July 2022.

Comments: Will appear on ECCV 2022. 14 Pages

Journal ref: Proceedings of the European Conference on Computer Vision (ECCV) 2022

arXiv:2207.02466 [pdf, other]

GLENet: Boosting 3D Object Detectors with Generative Label Uncertainty Estimation

Authors: Yifan Zhang, Qijian Zhang, Zhiyu Zhu, Junhui Hou, Yixuan Yuan

Abstract: The inherent ambiguity in ground-truth annotations of 3D bounding boxes, caused by occlusions, signal missing, or manual annotation errors, can confuse deep 3D object detectors during training, thus deteriorating detection accuracy. However, existing methods overlook such issues to some extent and treat the labels as deterministic. In this paper, we formulate the label uncertainty problem as the diversity of potentially plausible bounding boxes of objects. Then, we propose GLENet, a generative framework adapted from conditional variational autoencoders, to model the one-to-many relationship between a typical 3D object and its potential ground-truth bounding boxes with latent variables. The label uncertainty generated by GLENet is a plug-and-play module and can be conveniently integrated into existing deep 3D detectors to build probabilistic detectors and supervise the learning of the localization uncertainty. Besides, we propose an uncertainty-aware quality estimator architecture in probabilistic detectors to guide the training of the IoU-branch with predicted localization uncertainty. We incorporate the proposed methods into various popular base 3D detectors and demonstrate significant and consistent performance gains on both KITTI and Waymo benchmark datasets. Especially, the proposed GLENet-VR outperforms all published LiDAR-based approaches by a large margin and achieves the top rank among single-modal methods on the challenging KITTI test set. The source code and pre-trained models are publicly available at https://github.com/Eaphan/GLENet.

Submitted 2 June, 2023; v1 submitted 6 July, 2022; originally announced July 2022.

arXiv:2207.01909 [pdf, other]

StyleFlow For Content-Fixed Image to Image Translation

Authors: Weichen Fan, Jinghuan Chen, Jiabin Ma, Jun Hou, Shuai Yi

Abstract: Image-to-image (I2I) translation is a challenging topic in computer vision. We divide this problem into three tasks: strongly constrained translation, normally constrained translation, and weakly constrained translation. The constraint here indicates the extent to which the content or semantic information in the original image is preserved. Although previous approaches have achieved good performance in weakly constrained tasks, they failed to fully preserve the content in both strongly and normally constrained tasks, including photo-realism synthesis, style transfer, and colorization, etc. To achieve content-preserving transfer in strongly constrained and normally constrained tasks, we propose StyleFlow, a new I2I translation model that consists of normalizing flows and a novel Style-Aware Normalization (SAN) module. With the invertible network structure, StyleFlow first projects input images into deep feature space in the forward pass, while the backward pass utilizes the SAN module to perform content-fixed feature transformation and then projects back to image space. Our model supports both image-guided translation and multi-modal synthesis. We evaluate our model in several I2I translation benchmarks, and the results show that the proposed model has advantages over previous methods in both strongly constrained and normally constrained tasks.

Submitted 5 July, 2022; originally announced July 2022.

arXiv:2207.01758 [pdf, other]

FDVTS's Solution for 2nd COV19D Competition on COVID-19 Detection and Severity Analysis

Authors: Junlin Hou, Jilan Xu, Rui Feng, Yuejie Zhang

Abstract: This paper presents our solution for the 2nd COVID-19 Competition, occurring in the framework of the AIMIA Workshop in the European Conference on Computer Vision (ECCV 2022). In our approach, we employ an effective 3D Contrastive Mixup Classification network for COVID-19 diagnosis on chest CT images, which is composed of contrastive representation learning and mixup classification. For the COVID-19 detection challenge, our approach reaches 0.9245 macro F1 score on 484 validation CT scans, which significantly outperforms the baseline method by 16.5%. In the COVID-19 severity detection challenge, our approach achieves 0.7186 macro F1 score on 61 validation samples, which also surpasses the baseline by 8.86%.

Submitted 4 July, 2022; originally announced July 2022.

arXiv:2206.10157 [pdf, other]

Probing Visual-Audio Representation for Video Highlight Detection via Hard-Pairs Guided Contrastive Learning

Authors: Shuaicheng Li, Feng Zhang, Kunlin Yang, Lingbo Liu, Shinan Liu, Jun Hou, Shuai Yi

Abstract: Video highlight detection is a crucial yet challenging problem that aims to identify the interesting moments in untrimmed videos. The key to this task lies in effective video representations that jointly pursue two goals, \textit{i.e.}, cross-modal representation learning and fine-grained feature discrimination. In this paper, these two challenges are tackled by not only enriching intra-modality and cross-modality relations for representation modeling but also shaping the features in a discriminative manner. Our proposed method mainly leverages the intra-modality encoding and cross-modality co-occurrence encoding for fully representation modeling. Specifically, intra-modality encoding augments the modality-wise features and dampens irrelevant modality via within-modality relation learning in both audio and visual signals. Meanwhile, cross-modality co-occurrence encoding focuses on the co-occurrence inter-modality relations and selectively captures effective information among multi-modality. The multi-modal representation is further enhanced by the global information abstracted from the local context. In addition, we enlarge the discriminative power of feature embedding with a hard-pairs guided contrastive learning (HPCL) scheme. A hard-pairs sampling strategy is further employed to mine the hard samples for improving feature discrimination in HPCL. Extensive experiments conducted on two benchmarks demonstrate the effectiveness and superiority of our proposed methods compared to other state-of-the-art methods.

Submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.10095 [pdf, other]

Pyramid Region-based Slot Attention Network for Temporal Action Proposal Generation

Authors: Shuaicheng Li, Feng Zhang, Rui-Wei Zhao, Rui Feng, Kunlin Yang, Lingbo Liu, Jun Hou

Abstract: It has been found that temporal action proposal generation, which aims to discover the temporal action instances within the range of the start and end frames in untrimmed videos, can largely benefit from proper temporal and semantic context exploitation. The latest efforts were dedicated to considering the temporal context and similarity-based semantic contexts through self-attention modules. However, they still suffer from cluttered background information and limited contextual feature learning. In this paper, we propose a novel Pyramid Region-based Slot Attention (PRSlot) module to address these issues. Instead of using similarity computation, our PRSlot module directly learns the local relations in an encoder-decoder manner and generates an attention-enhanced representation of a local region, called a slot, from the input features. Specifically, upon the input snippet-level features, the PRSlot module takes the target snippet as the query and its surrounding region as the key, and then generates slot representations for each query-key slot by aggregating the local snippet context with a parallel pyramid strategy. Based on PRSlot modules, we present a novel Pyramid Region-based Slot Attention Network, termed PRSA-Net, to learn a unified visual representation with rich temporal and semantic context for better proposal generation. Extensive experiments are conducted on two widely adopted benchmarks, THUMOS14 and ActivityNet-1.3. Our PRSA-Net outperforms other state-of-the-art methods. In particular, we improve the AR@100 from the previous best 50.67% to 56.12% for proposal generation and raise the mAP under 0.5 tIoU from 51.9% to 58.7% for action detection on THUMOS14. Code is available at https://github.com/handhand123/PRSA-Net

Submitted 20 June, 2022; originally announced June 2022.
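
The "0.5 tIoU" threshold quoted in the abstract above refers to the temporal intersection over union between a predicted segment and a ground-truth segment. A minimal Python sketch of that metric follows; the function names and the example segments are illustrative assumptions and are not taken from the PRSA-Net code.

```python
def temporal_iou(proposal, ground_truth):
    """Temporal IoU of two (start, end) segments, e.g. in seconds."""
    p_start, p_end = proposal
    g_start, g_end = ground_truth
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - intersection
    return intersection / union if union > 0 else 0.0


def is_match(proposal, ground_truth, threshold=0.5):
    """A proposal counts as a match when its tIoU reaches the threshold."""
    return temporal_iou(proposal, ground_truth) >= threshold
```

For instance, a proposal covering (2, 10) against a ground-truth segment (0, 10) overlaps for 8 of 10 units, so it would count as a match at the 0.5 threshold.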

arXiv:2206.09853 [pdf, other]

DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment

Authors: Haoning Wu, Chaofeng Chen, Liang Liao, Jingwen Hou, Wenxiu Sun, Qiong Yan, Weisi Lin

Abstract: The temporal relationships between frames and their influences on video quality assessment (VQA) are still under-studied in existing works. These relationships lead to two important types of effects for video quality. Firstly, some temporal variations (such as shaking, flicker, and abrupt scene transitions) cause temporal distortions and lead to extra quality degradations, while other variations (e.g. those related to meaningful happenings) do not. Secondly, the human visual system often pays different attention to frames with different contents, resulting in their different importance to the overall video quality. Based on the prominent time-series modeling ability of transformers, we propose a novel and effective transformer-based VQA method to tackle these two issues. To better differentiate temporal variations and thus capture the temporal distortions, we design a transformer-based Spatial-Temporal Distortion Extraction (STDE) module. To tackle temporal quality attention, we propose the encoder-decoder-like temporal content transformer (TCT). We also introduce temporal sampling on features to reduce the input length for the TCT, so as to improve the learning effectiveness and efficiency of this module. Consisting of the STDE and the TCT, the proposed Temporal Distortion-Content Transformers for Video Quality Assessment (DisCoVQA) reaches state-of-the-art performance on several VQA benchmarks without any extra pre-training datasets and shows up to 10% better generalization ability than existing methods. We also conduct extensive ablation experiments to prove the effectiveness of each part of our proposed model, and provide visualizations to show that the proposed modules achieve our intention of modeling these temporal issues. We will publish our codes and pretrained weights later.

Submitted 20 June, 2022; originally announced June 2022.

arXiv:2206.06067 [pdf, other]

Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation

Authors: Zengyu Qiu, Xinzhu Ma, Kunlin Yang, Chunya Liu, Jun Hou, Shuai Yi, Wanli Ouyang

Abstract: Knowledge distillation (KD) has shown very promising capabilities in transferring learned representations from large models (teachers) to small models (students). However, as the capacity gap between students and teachers becomes larger, existing KD methods fail to achieve better results. Our work shows that 'prior knowledge' is vital to KD, especially when applying large teachers. In particular, we propose dynamic prior knowledge (DPK), which integrates part of the teacher's features as prior knowledge before the feature distillation. This means that our method also takes the teacher's feature as 'input', not just 'target'. Besides, we dynamically adjust the ratio of the prior knowledge during the training phase according to the feature gap, thus guiding the student at an appropriate difficulty. To evaluate the proposed method, we conduct extensive experiments on two image classification benchmarks (i.e. CIFAR100 and ImageNet) and an object detection benchmark (i.e. MS COCO). The results demonstrate the superiority of our method in performance under varying settings. Besides, our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers. More importantly, DPK provides a fast solution to teacher model selection for any given model.

Submitted 23 March, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: ICLR'23 accepted
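
The abstract above describes feeding part of the teacher's features to the student as prior knowledge, with a ratio that follows the feature gap. The Python sketch below illustrates that idea on plain lists of feature tokens. Everything in it is a hypothetical simplification: the function names, the random token selection, and the linear gap-to-ratio schedule are assumptions made for illustration, not the authors' implementation.

```python
import random


def mix_prior(student_tokens, teacher_tokens, prior_ratio, rng=None):
    """Replace a fraction `prior_ratio` of student tokens with teacher tokens.

    Hypothetical helper: the swapped positions are chosen uniformly at random.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    n = len(student_tokens)
    n_prior = round(prior_ratio * n)
    prior_idx = set(rng.sample(range(n), n_prior))
    return [teacher_tokens[i] if i in prior_idx else student_tokens[i]
            for i in range(n)]


def prior_ratio_from_gap(feature_gap, max_gap):
    """Assumed schedule: a larger student-teacher gap yields more prior.

    The result is clipped to [0, 1]; the linear form is an assumption.
    """
    if max_gap <= 0:
        return 0.0
    return min(1.0, max(0.0, feature_gap / max_gap))
```

In this sketch a ratio of 0 leaves the student features untouched and a ratio of 1 hands the student the full teacher features, so the ratio acts as a difficulty knob for the distillation target.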

arXiv:2206.04904 [pdf, other]

doi 10.3847/1538-3881/ac77fa

New insights into the structure of open clusters in the Gaia era

Authors: Jing Zhong, Li Chen, Yueyue Jiang, Songmei Qin, Jinliang Hou

Abstract: With the help of Gaia data, it is noted that in addition to the core components, there are low-density outer halo components in the extended region of open clusters. To study the extended structure beyond the core radius of the cluster ($\sim$ 10 pc), based on Gaia EDR3 data and taking up to 50 pc as the searching radius, we use the pyUPMASK algorithm to re-determine the member stars of open clusters within 1-2 kpc. We obtain the member stars of 256 open clusters, especially those located in the outer halo region of open clusters. Furthermore, we find that the radial density profile of most open clusters deviates from the King profile in the outer region. To better describe the internal and external structural characteristics of open clusters, we propose a two-component model: core components with a King model distribution and outer halo components with a logarithmic Gaussian distribution. We then suggest using four radii ($r_c$, $r_t$, $r_o$, $r_e$) to describe the structure and distribution profile of star clusters, where $r_t$ and $r_e$ represent the boundaries of the core components and outer halo components, respectively. Finally, we provide a catalog of 256 clusters with structural parameters. In addition, our study shows that the sizes of these radii are statistically linearly related, which indicates that the inner and outer regions of the cluster are interrelated and follow similar evolutionary processes. Further, we show that the two-component structure can be used to better trace the cluster evolution properties in different stages.

Submitted 10 June, 2022; originally announced June 2022.

Comments: 17 pages, 6 figures. Accepted for publication in AJ
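
The "King model distribution" for the core components in the abstract above is assumed here to mean the empirical King (1962) surface density profile, parameterized by a core radius $r_c$ and a tidal radius $r_t$. The Python sketch below evaluates that profile; the function name, the `background` term, and the example values are illustrative assumptions, and the outer-halo logarithmic Gaussian component is left out because its exact form is not given in the abstract.

```python
import math


def king_surface_density(r, k, r_c, r_t, background=0.0):
    """Empirical King (1962) surface density at projected radius r.

    k is a scale factor tied to the central density, r_c the core radius,
    r_t the tidal radius; all radii share one unit (e.g. pc).
    """
    if r >= r_t:
        # Beyond the tidal radius only the background remains.
        return background
    inner = 1.0 / math.sqrt(1.0 + (r / r_c) ** 2)
    edge = 1.0 / math.sqrt(1.0 + (r_t / r_c) ** 2)
    return k * (inner - edge) ** 2 + background
```

With this form the density falls monotonically from the center and reaches the background level exactly at $r_t$, which is why an excess of members beyond $r_t$ shows up as a deviation from the King profile.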

Showing 201–250 of 752 results for author: Hou, J