-
A Multi-Modal Neural Geometric Solver with Textual Clauses Parsed from Diagram
Authors:
Ming-Liang Zhang,
Fei Yin,
Cheng-Lin Liu
Abstract:
Geometry problem solving (GPS) is a high-level mathematical reasoning task requiring the capacities of multi-modal fusion and geometric knowledge application. Recently, neural solvers have shown great potential in GPS but still fall short in diagram presentation and modal fusion. In this work, we convert diagrams into basic textual clauses to describe diagram features effectively, and propose a new neural solver called PGPSNet to fuse multi-modal information efficiently. Combining structural and semantic pre-training, data augmentation and self-limited decoding, PGPSNet is endowed with rich knowledge of geometry theorems and geometric representation, and therefore promotes geometric understanding and reasoning. In addition, to facilitate the research of GPS, we build a new large-scale and fine-annotated GPS dataset named PGPS9K, labeled with both fine-grained diagram annotation and interpretable solution program. Experiments on PGPS9K and an existing dataset Geometry3K validate the superiority of our method over the state-of-the-art neural solvers. Our code, dataset and appendix material are available at https://github.com/mingliangzhang2018/PGPS.
Submitted 28 April, 2023; v1 submitted 21 February, 2023;
originally announced February 2023.
-
Towards Flexibility and Interpretability of Gaussian Process State-Space Model
Authors:
Zhidi Lin,
Feng Yin,
Juan Maroñas
Abstract:
The Gaussian process state-space model (GPSSM) has garnered considerable attention over the past decade. However, the standard GP with a preliminary kernel, such as the squared exponential kernel or Matérn kernel, that is commonly used in GPSSM studies, limits the model's representation power and substantially restricts its applicability to complex scenarios. To address this issue, we propose a new class of probabilistic state-space models called TGPSSMs, which leverage a parametric normalizing flow to enrich the GP priors in the standard GPSSM, enabling greater flexibility and expressivity. Additionally, we present a scalable variational inference algorithm that offers a flexible and optimal structure for the variational distribution of latent states. The proposed algorithm is interpretable and computationally efficient due to the sparse GP representation and the bijective nature of normalizing flow. Moreover, we incorporate a constrained optimization framework into the algorithm to enhance the state-space representation capabilities and optimize the hyperparameters, leading to superior learning and inference performance. Experimental results on synthetic and real datasets corroborate that the proposed TGPSSM outperforms several state-of-the-art methods. The accompanying source code is available at https://github.com/zhidilin/TGPSSM.
Submitted 6 April, 2023; v1 submitted 20 January, 2023;
originally announced January 2023.
-
A Large-Scale Outdoor Multi-modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction
Authors:
Chongshan Lu,
Fukun Yin,
Xin Chen,
Tao Chen,
Gang Yu,
Jiayuan Fan
Abstract:
Neural Radiance Fields (NeRF) has achieved impressive results in single object scene reconstruction and novel view synthesis, which have been demonstrated on many single modality and single object focused indoor scene datasets like DTU, BMVS, and NeRF Synthetic. However, the study of NeRF on large-scale outdoor scene reconstruction is still limited, as there is no unified outdoor scene dataset for large-scale NeRF evaluation due to expensive data acquisition and calibration costs. In this paper, we propose a large-scale outdoor multi-modal dataset, OMMO dataset, containing complex land objects and scenes with calibrated images, point clouds and prompt annotations. Meanwhile, a new benchmark for several outdoor NeRF-based tasks is established, such as novel view synthesis, surface reconstruction, and multi-modal NeRF. To create the dataset, we capture and collect a large number of real fly-view videos and select high-quality and high-resolution clips from them. Then we design a quality review module to refine images, remove low-quality frames and fail-to-calibrate scenes through a learning-based automatic evaluation plus manual review. Finally, a number of volunteers are employed to add the text descriptions for each scene and key-frame to meet the potential multi-modal requirements in the future. Compared with existing NeRF datasets, our dataset contains abundant real-world urban and natural scenes with various scales, camera trajectories, and lighting conditions. Experiments show that our dataset can benchmark most state-of-the-art NeRF methods on different tasks. We will release the dataset and model weights very soon.
Submitted 17 January, 2023;
originally announced January 2023.
-
Generalizable Black-Box Adversarial Attack with Meta Learning
Authors:
Fei Yin,
Yong Zhang,
Baoyuan Wu,
Yan Feng,
Jingyi Zhang,
Yanbo Fan,
Yujiu Yang
Abstract:
In the scenario of black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful adversarial perturbation based on query feedback under a query budget. Due to the limited feedback information, existing query-based black-box attack methods often require many queries for attacking each benign example. To reduce query cost, we propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability. Specifically, by treating the attack on each benign example as one task, we develop a meta-learning framework by training a meta-generator to produce perturbations conditioned on benign examples. When attacking a new benign example, the meta generator can be quickly fine-tuned based on the feedback information of the new task as well as a few historical attacks to produce effective perturbations. Moreover, since the meta-train procedure consumes many queries to learn a generalizable generator, we utilize model-level adversarial transferability to train the meta-generator on a white-box surrogate model, then transfer it to help the attack against the target model. The proposed framework with the two types of adversarial transferability can be naturally combined with any off-the-shelf query-based attack methods to boost their performance, which is verified by extensive experiments.
Submitted 1 January, 2023;
originally announced January 2023.
-
Output-Dependent Gaussian Process State-Space Model
Authors:
Zhidi Lin,
Lei Cheng,
Feng Yin,
Lexi Xu,
Shuguang Cui
Abstract:
Gaussian process state-space model (GPSSM) is a fully probabilistic state-space model that has attracted much attention over the past decade. However, the outputs of the transition function in the existing GPSSMs are assumed to be independent, meaning that the GPSSMs cannot exploit the inductive biases between different outputs and lose certain model capacities. To address this issue, this paper proposes an output-dependent and more realistic GPSSM by utilizing the well-known, simple yet practical linear model of coregionalization (LMC) framework to represent the output dependency. To jointly learn the output-dependent GPSSM and infer the latent states, we propose a variational sparse GP-based learning method that only gently increases the computational complexity. Experiments on both synthetic and real datasets demonstrate the superiority of the output-dependent GPSSM in terms of learning and inference performance.
Submitted 14 December, 2022;
originally announced December 2022.
-
3D GAN Inversion with Facial Symmetry Prior
Authors:
Fei Yin,
Yong Zhang,
Xuan Wang,
Tengfei Wang,
Xiaoyu Li,
Yuan Gong,
Yanbo Fan,
Xiaodong Cun,
Ying Shan,
Cengiz Oztireli,
Yujiu Yang
Abstract:
Recently, a surge of high-quality 3D-aware GANs have been proposed, which leverage the generative power of neural rendering. It is natural to associate 3D GANs with GAN inversion methods to project a real image into the generator's latent space, allowing free-view consistent synthesis and editing, referred to as 3D GAN inversion. Even with the facial prior preserved in pre-trained 3D GANs, reconstructing a 3D portrait with only one monocular image is still an ill-posed problem. The straightforward application of 2D GAN inversion methods focuses on texture similarity only while ignoring the correctness of 3D geometry shapes. It may raise geometry collapse effects, especially when reconstructing a side face under an extreme pose. Besides, the synthetic results in novel views are prone to be blurry. In this work, we propose a novel method to promote 3D GAN inversion by introducing facial symmetry prior. We design a pipeline and constraints to make full use of the pseudo auxiliary view obtained via image flipping, which helps obtain a robust and reasonable geometry shape during the inversion process. To enhance texture fidelity in unobserved viewpoints, pseudo labels from depth-guided 3D warping can provide extra supervision. We design constraints aimed at filtering out conflict areas for optimization in asymmetric situations. Comprehensive quantitative and qualitative evaluations on image reconstruction and editing demonstrate the superiority of our method.
Submitted 14 March, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Authors:
Kun Cheng,
Xiaodong Cun,
Yong Zhang,
Menghan Xia,
Fei Yin,
Mingrui Zhu,
Xuan Wang,
Jue Wang,
Nannan Wang
Abstract:
We present VideoReTalking, a new system to edit the faces of a real-world talking head video according to input audio, producing a high-quality and lip-syncing output video even with a different emotion. Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism. Given a talking-head video, we first modify the expression of each frame according to the same expression template using the expression editing network, resulting in a video with the canonical expression. This video, together with the given audio, is then fed into the lip-sync network to generate a lip-syncing video. Finally, we improve the photo-realism of the synthesized faces through an identity-aware face enhancement network and post-processing. We use learning-based approaches for all three steps and all our modules can be tackled in a sequential pipeline without any user intervention. Furthermore, our system is a generic approach that does not need to be retrained to a specific person. Evaluations on two widely-used datasets and in-the-wild examples demonstrate the superiority of our framework over other state-of-the-art methods in terms of lip-sync accuracy and visual quality.
Submitted 27 November, 2022;
originally announced November 2022.
-
A room-temperature electrical-field-enhanced ultrafast switch in organic microcavity polariton condensates
Authors:
Jianbo De,
Xuekai Ma,
Fan Yin,
Jiahuan Ren,
Jiannian Yao,
Stefan Schumacher,
Qing Liao,
Hongbing Fu,
Guillaume Malpuech,
Dmitry Solnyshkov
Abstract:
Integrated electro-optical switches are essential as one of the fundamental elements in the development of modern optoelectronics. As an architecture for photonic systems, exciton polaritons, which are hybrid bosonic quasiparticles that possess unique properties derived from both excitons and photons, have shown much promise. For this system, we demonstrate a significant improvement of emitted intensity and condensation threshold by applying an electric field to a microcavity filled with an organic microbelt. Our theoretical investigations indicate that the electric field makes the excitons dipolar and induces an enhancement of the exciton-polariton interaction and of the polariton lifetime. Based on these electric field induced changes, a sub-nanosecond electrical-field-enhanced polariton condensate switch is realized at room temperature, providing the basis for developing an on-chip integrated photonic device in the strong light-matter coupling regime.
Submitted 23 November, 2022;
originally announced November 2022.
-
MetaLoc: Learning to Learn Wireless Localization
Authors:
Jun Gao,
Dongze Wu,
Feng Yin,
Qinglei Kong,
Lexi Xu,
Shuguang Cui
Abstract:
Existing localization methods that intensively leverage the environment-specific received signal strength (RSS) or channel state information (CSI) of wireless signals are rather accurate in certain environments. However, these methods, whether based on pure statistical signal processing or data-driven approaches, often struggle to generalize to new environments, which results in considerable time and effort being wasted. To address this challenge, we propose MetaLoc, which is the first fingerprinting-based localization framework that leverages the Model-Agnostic Meta-Learning (MAML). Specifically, built on a deep neural network with strong representation capabilities, MetaLoc is trained on historical data sourced from well-calibrated environments, employing a two-loop optimization mechanism to obtain the meta-parameters. These meta-parameters act as the initialization for quick adaptation in new environments, reducing the need for much human effort. The framework introduces two paradigms for the optimization of meta-parameters: a centralized paradigm that simplifies the process by sharing data from all historical environments, and a distributed paradigm that maintains data privacy by training meta-parameters for each specific environment separately. Furthermore, the advanced distributed paradigm modifies the vanilla MAML loss function to ensure that the reduction of loss occurs in a consistent direction across various training domains, thus facilitating faster convergence during training. Our experiments on both synthetic and real datasets demonstrate that MetaLoc outperforms baseline methods in terms of localization accuracy, robustness, and cost-effectiveness. The code and datasets used in this study are publicly available.
Submitted 29 August, 2023; v1 submitted 8 November, 2022;
originally announced November 2022.
-
ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation
Authors:
Fan Yin,
Yao Li,
Cho-Jui Hsieh,
Kai-Wei Chang
Abstract:
Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks and has drawn increasing attention from the Natural Language Processing (NLP) community. Despite the surge of new AED methods, our studies show that existing methods heavily rely on a shortcut to achieve good performance. In other words, current search-based adversarial attacks in NLP stop once model predictions change, and thus most adversarial examples generated by those attacks are located near model decision boundaries. To surpass this shortcut and fairly evaluate AED methods, we propose to test AED methods with Far Boundary (FB) adversarial examples. Existing methods show worse than random guess performance under this scenario. To overcome this limitation, we propose a new technique, ADDMU, adversary detection with data and model uncertainty, which combines two types of uncertainty estimation for both regular and FB adversarial example detection. Our new method outperforms previous methods by 3.6 and 6.0 AUC points under each scenario. Finally, our analysis shows that the two types of uncertainty provided by ADDMU can be leveraged to characterize adversarial examples and identify the ones that contribute most to the model's robustness in adversarial training.
Submitted 22 October, 2022;
originally announced October 2022.
-
Coordinates Are NOT Lonely -- Codebook Prior Helps Implicit Neural 3D Representations
Authors:
Fukun Yin,
Wen Liu,
Zilong Huang,
Pei Cheng,
Tao Chen,
Gang Yu
Abstract:
Implicit neural 3D representation has achieved impressive results in surface or scene reconstruction and novel view synthesis, which typically uses the coordinate-based multi-layer perceptrons (MLPs) to learn a continuous scene representation. However, existing approaches, such as Neural Radiance Field (NeRF) and its variants, usually require dense input views (i.e. 50-150) to obtain decent results. To relieve the over-dependence on massive calibrated images and enrich the coordinate-based feature representation, we explore injecting the prior information into the coordinate-based network and introduce a novel coordinate-based model, CoCo-INR, for implicit neural 3D representation. The cores of our method are two attention modules: codebook attention and coordinate attention. The former extracts the useful prototypes containing rich geometry and appearance information from the prior codebook, and the latter propagates such prior information into each coordinate and enriches its feature representation for a scene or object surface. With the help of the prior information, our method can render 3D views with more photo-realistic appearance and geometries than the current methods while using fewer calibrated images. Experiments on various scene reconstruction datasets, including DTU and BlendedMVS, and the full 3D head reconstruction dataset, H3DS, demonstrate the robustness under fewer input views and fine detail-preserving capability of our proposed method.
Submitted 21 October, 2022; v1 submitted 20 October, 2022;
originally announced October 2022.
-
Quantifying U-Net Uncertainty in Multi-Parametric MRI-based Glioma Segmentation by Spherical Image Projection
Authors:
Zhenyu Yang,
Kyle Lafata,
Eugene Vaios,
Zongsheng Hu,
Trey Mullikin,
Fang-Fang Yin,
Chunhao Wang
Abstract:
The projection of planar MRI data onto a spherical surface is equivalent to a nonlinear image transformation that retains global anatomical information. By incorporating this image transformation process in our proposed spherical projection-based U-Net (SPU-Net) segmentation model design, multiple independent segmentation predictions can be obtained from a single MRI. The final segmentation is the average of all available results, and the variation can be visualized as a pixel-wise uncertainty map. An uncertainty score was introduced to evaluate and compare the performance of uncertainty measurements. The proposed SPU-Net model was implemented on the basis of 369 glioma patients with MP-MRI scans (T1, T1-Ce, T2, and FLAIR). Three SPU-Net models were trained to segment enhancing tumor (ET), tumor core (TC), and whole tumor (WT), respectively. The SPU-Net model was compared with (1) the classic U-Net model with test-time augmentation (TTA) and (2) linear scaling-based U-Net (LSU-Net) segmentation models in terms of both segmentation accuracy (Dice coefficient, sensitivity, specificity, and accuracy) and segmentation uncertainty (uncertainty map and uncertainty score). The developed SPU-Net model successfully achieved low uncertainty for correct segmentation predictions (e.g., tumor interior or healthy tissue interior) and high uncertainty for incorrect results (e.g., tumor boundaries). This model could allow the identification of missed tumor targets or segmentation errors in U-Net. Quantitatively, the SPU-Net model achieved the highest uncertainty scores for three segmentation targets (ET/TC/WT): 0.826/0.848/0.936, compared to 0.784/0.643/0.872 using the U-Net with TTA and 0.743/0.702/0.876 with the LSU-Net (scaling factor = 2). The SPU-Net also achieved statistically significantly higher Dice coefficients, underscoring the improved segmentation accuracy.
Submitted 12 August, 2023; v1 submitted 12 October, 2022;
originally announced October 2022.
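The fusion step described in the abstract above (the final segmentation is the average of the available predictions, and their variation is shown as a pixel-wise uncertainty map) can be sketched as follows. This is a minimal illustration, not the authors' SPU-Net code: the function name `fuse_predictions`, the 0.5 threshold, and the use of the population standard deviation as the measure of "variation" are all assumptions made here.

```python
from statistics import fmean, pstdev


def fuse_predictions(preds, threshold=0.5):
    """Fuse several per-pixel foreground predictions of one image.

    preds: list of 2D lists (same shape), each holding per-pixel
           foreground probabilities or 0/1 labels.
    Returns (mask, uncertainty), both 2D lists of that shape.
    The threshold and the std-based uncertainty are illustrative
    assumptions, not taken from the paper.
    """
    height, width = len(preds[0]), len(preds[0][0])
    mask = [[False] * width for _ in range(height)]
    uncertainty = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            values = [p[r][c] for p in preds]
            # Final segmentation: average of all predictions, thresholded.
            mask[r][c] = fmean(values) >= threshold
            # Uncertainty: spread of the predictions at this pixel.
            uncertainty[r][c] = pstdev(values)
    return mask, uncertainty
```

Pixels where every prediction agrees get zero uncertainty, while pixels where the predictions disagree (typically boundaries) get a larger value.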
-
Distributed Scaled Proximal ADMM Algorithms for Cooperative Localization in WSNs
Authors:
Mei Zhang,
Zhiguo Wang,
Feng Yin,
Xiaojing Shen
Abstract:
Distributed cooperative localization in wireless networks is a challenging problem since it typically requires solving a large-scale nonconvex and nonsmooth optimization problem. In this paper, we reformulate the classic cooperative localization problem as a smooth and constrained nonconvex minimization problem while its loss function is separable over nodes. By utilizing the structure of the reformulation, we propose two novel scaled proximal alternating direction method of multipliers (SP-ADMM) algorithms, which can be implemented in a distributed manner. Compared with the classic semi-definite programming relaxation techniques, the proposed algorithms can provide more accurate position estimates with significantly lower computation complexity. The associated theoretical analysis shows that our algorithms globally converge to a KKT point of the reformulated problem and a critical point of the original problem, with a favorable sublinear $\mathcal{O}\left(1/T\right)$ convergence rate, where $T$ is the iteration counter. Numerical experiments have consistently shown that the proposed SP-ADMM algorithms are superior to state-of-the-art methods in terms of localization accuracy and computational complexity across all tested scenarios, varying network size, number of anchors, average number of neighbors, and noise variance levels.
Submitted 7 August, 2023; v1 submitted 25 August, 2022;
originally announced August 2022.
-
Block-transitive $3$-$(v,k,1)$ designs associated with alternating groups
Authors:
Ting Lan,
Weijun Liu,
Fu-Gang Yin
Abstract:
Let $\mathcal{D}$ be a nontrivial $3$-$(v,k,1)$ design admitting a block-transitive group $G$ of automorphisms. A recent work of Gan and the second author asserts that $G$ is either affine or almost simple. In this paper, it is proved that if $G$ is almost simple with socle an alternating group, then $\mathcal{D}$ is the unique $3$-$(10,4,1)$ design, and $G=\mathrm{PGL}(2,9)$, $\mathrm{M}_{10}$ or $\mathrm{Aut}(\mathrm{A}_6 )=\mathrm{S}_6:\mathrm{Z}_2$, and $G$ is flag-transitive.
Submitted 15 May, 2023; v1 submitted 1 August, 2022;
originally announced August 2022.
-
Two-geodesic transitive graphs of order $p^n$ with $n\leq3$
Authors:
Jun-Jie Huang,
Yan-Quan Feng,
Jin-Xin Zhou,
Fu-Gang Yin
Abstract:
A vertex triple $(u,v,w)$ of a graph is called a $2$-geodesic if $v$ is adjacent to both $u$ and $w$ and $u$ is not adjacent to $w$. A graph is said to be $2$-geodesic transitive if its automorphism group is transitive on the set of $2$-geodesics. In this paper, a complete classification of $2$-geodesic transitive graphs of order $p^n$ is given for each prime $p$ and $n\leq 3$. It turns out that all such graphs consist of three small graphs: the complete bipartite graph $K_{4,4}$ of order $8$, the Schläfli graph of order $27$ and its complement, and fourteen infinite families: the cycles $C_p, C_{p^2}$ and $C_{p^3}$, the complete graphs $K_p, K_{p^2}$ and $K_{p^3}$, the complete multipartite graphs $K_{p[p]}$, $K_{p[p^2]}$ and $K_{p^2[p]}$, the Hamming graph $H(2,p)$ and its complement, the Hamming graph $H(3,p)$, and two infinite families of normal Cayley graphs on extraspecial group of order $p^3$ and exponent $p$.
Submitted 26 July, 2022; v1 submitted 22 July, 2022;
originally announced July 2022.
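The definition of a $2$-geodesic in the abstract above can be checked directly on small graphs. The sketch below is only an illustration of that definition: the function name `two_geodesics` and the adjacency-dictionary representation are assumptions made here, and $u \neq w$ is assumed, as is usual for this definition.

```python
from itertools import permutations


def two_geodesics(adj):
    """List all 2-geodesics (u, v, w) of a simple undirected graph.

    adj maps each vertex to the set of its neighbours.
    A triple qualifies when v is adjacent to both u and w,
    u and w are distinct, and u is not adjacent to w.
    """
    result = []
    for v, neighbours in adj.items():
        # Ordered pairs of distinct neighbours of the middle vertex v.
        for u, w in permutations(neighbours, 2):
            if w not in adj[u]:
                result.append((u, v, w))
    return result


# The 5-cycle C_5: vertex i is adjacent to i-1 and i+1 (mod 5).
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
# The complete graph K_4: every pair of vertices is adjacent.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}

print(len(two_geodesics(c5)))  # 10
print(len(two_geodesics(k4)))  # 0
```

A complete graph has no $2$-geodesics at all, since any two neighbours of a vertex are themselves adjacent.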
-
MobileCodec: Neural Inter-frame Video Compression on Mobile Devices
Authors:
Hoang Le,
Liang Zhang,
Amir Said,
Guillaume Sautiere,
Yang Yang,
Pranav Shrestha,
Fei Yin,
Reza Pourreza,
Auke Wiggers
Abstract:
Realizing the potential of neural video codecs on mobile devices is a big technological challenge due to the computational complexity of deep networks and the power-constrained mobile hardware. We demonstrate practical feasibility by leveraging Qualcomm's technology and innovation, bridging the gap from neural network-based codec simulations running on wall-powered workstations, to real-time operation on a mobile device powered by Snapdragon technology. We show the first-ever inter-frame neural video decoder running on a commercial mobile phone, decoding high-definition videos in real-time while maintaining a low bitrate and high visual quality.
Submitted 17 July, 2022;
originally announced July 2022.
-
Learning Quality-aware Dynamic Memory for Video Object Segmentation
Authors:
Yong Liu,
Ran Yu,
Fei Yin,
Xinyuan Zhao,
Wei Zhao,
Weihao Xia,
Yujiu Yang
Abstract:
Recently, several spatial-temporal memory-based methods have verified that storing intermediate frames and their masks as memory are helpful to segment target objects in videos. However, they mainly focus on better matching between the current frame and the memory frames without explicitly paying attention to the quality of the memory. Therefore, frames with poor segmentation masks are prone to be…
▽ More
Recently, several spatial-temporal memory-based methods have verified that storing intermediate frames and their masks as memory are helpful to segment target objects in videos. However, they mainly focus on better matching between the current frame and the memory frames without explicitly paying attention to the quality of the memory. Therefore, frames with poor segmentation masks are prone to be memorized, which leads to a segmentation mask error accumulation problem and further affect the segmentation performance. In addition, the linear increase of memory frames with the growth of frame number also limits the ability of the models to handle long videos. To this end, we propose a Quality-aware Dynamic Memory Network (QDMN) to evaluate the segmentation quality of each frame, allowing the memory bank to selectively store accurately segmented frames to prevent the error accumulation problem. Then, we combine the segmentation quality with temporal consistency to dynamically update the memory bank to improve the practicability of the models. Without any bells and whistles, our QDMN achieves new state-of-the-art performance on both DAVIS and YouTube-VOS benchmarks. Moreover, extensive experiments demonstrate that the proposed Quality Assessment Module (QAM) can be applied to memory-based methods as generic plugins and significantly improves performance. Our source code is available at https://github.com/workforai/QDMN.
Submitted 16 July, 2022;
originally announced July 2022.
-
Theoretical and experimental study on Noise Equivalent Power of X-ray semiconductor ultra-fast response material based on the rad-optic effect
Authors:
Xin Yan,
Tao Wang,
Gang Wang,
Dong Yao,
Yiheng Liu,
Guilong Gao,
Liwei Xin,
Fei Yin,
Jinshou Tian,
Xinlong Chang,
Kai He
Abstract:
Semiconductor material based on the rad-optic effect enables ultra-fast detection of X-rays and plays an important role in fusion diagnostics. Obtaining the accurate noise equivalent power (NEP) of the semiconductor ultrafast response material is the key to detecting X-rays. In this paper, the refractive index change mechanism of the semiconductor under X-ray irradiation was analyzed, and the quantitative relationship between the diffraction efficiency and the X-ray photon energy was established through the LT-AlGaAs diffraction imaging experiments. The impulse responses of LT-AlGaAs under 1 keV-10 keV X-ray radiation were calculated, revealing the variation of NEP density with radiated photon energy. In the case of bombarding the Al target to generate 1.5 keV X-rays, the imaging experiments of LT-AlGaAs were performed. The diffraction image of LT-AlGaAs has a linear relationship with the radiation intensity, and the NEP density of LT-AlGaAs reaches $4.80\times10^{5}$ W/cm$^2$. This study has reference significance for the development of ultra-fast X-ray imaging systems based on the rad-optic effect.
Submitted 22 June, 2022;
originally announced June 2022.
-
The smallest vertex-primitive $2$-arc-transitive digraph
Authors:
Fu-Gang Yin,
Yan-quan Feng,
Binzhou Xia
Abstract:
In 2017, Giudici, Li and the third author constructed the first known family of vertex-primitive $2$-arc-transitive digraphs of valency at least $2$. The smallest digraph in this family admits $\mathrm{PSL}_3(49)$ acting $2$-arc-transitively with vertex-stabilizer $\mathrm{A}_6$ and hence has $30758154560$ vertices. In this paper, we prove that this digraph is the vertex-primitive $2$-arc-transitive digraph of valency at least $2$ with fewest vertices.
Submitted 21 March, 2023; v1 submitted 13 June, 2022;
originally announced June 2022.
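The vertex count quoted in the abstract above can be checked by hand: the digraph's vertices correspond to the cosets of the vertex-stabilizer $\mathrm{A}_6$ in $\mathrm{PSL}_3(49)$, so the count is the ratio of the two group orders. The following is a minimal sketch of that arithmetic, assuming the standard order formula $|\mathrm{PSL}_3(q)| = q^3(q^2-1)(q^3-1)/\gcd(3,q-1)$; the function name `order_psl3` is illustrative and not taken from the paper.

```python
from math import gcd


def order_psl3(q: int) -> int:
    """Order of PSL_3(q), using q^3 (q^2 - 1)(q^3 - 1) / gcd(3, q - 1)."""
    return q**3 * (q**2 - 1) * (q**3 - 1) // gcd(3, q - 1)


ORDER_A6 = 360  # |A_6| = 6! / 2

# Number of vertices = index of the vertex-stabilizer A_6 in PSL_3(49).
vertices = order_psl3(49) // ORDER_A6
print(vertices)  # 30758154560
```

The result agrees with the $30758154560$ vertices stated in the abstract.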
-
Rethinking Bayesian Learning for Data Analysis: The Art of Prior and Inference in Sparsity-Aware Modeling
Authors:
Lei Cheng,
Feng Yin,
Sergios Theodoridis,
Sotirios Chatzis,
Tsung-Hui Chang
Abstract:
Sparse modeling for signal processing and machine learning has been a focus of scientific research for over two decades. Among others, supervised sparsity-aware learning comprises two major paths paved by: a) discriminative methods and b) generative methods. The latter, more widely known as Bayesian methods, enable uncertainty evaluation w.r.t. the performed predictions. Furthermore, they can better exploit related prior information and naturally introduce robustness into the model, due to their unique capacity to marginalize out uncertainties related to the parameter estimates. Moreover, hyper-parameters associated with the adopted priors can be learnt via the training data. To implement sparsity-aware learning, the crucial point lies in the choice of the function regularizer for discriminative methods and the choice of the prior distribution for Bayesian learning. Over the last decade or so, due to the intense research on deep learning, emphasis has been put on discriminative techniques. However, a comeback of Bayesian methods is taking place that sheds new light on the design of deep neural networks, which also establish firm links with Bayesian models and inspire new paths for unsupervised learning, such as Bayesian tensor decomposition.
The goal of this article is two-fold. First, to review, in a unified way, some recent advances in incorporating sparsity-promoting priors into three highly popular data modeling tools, namely deep neural networks, Gaussian processes, and tensor decomposition. Second, to review their associated inference techniques from different aspects, including: evidence maximization via optimization and variational inference methods. Challenges such as small data dilemma, automatic model structure search, and natural prediction uncertainty evaluation are also discussed. Typical signal processing and machine learning tasks are demonstrated.
Submitted 27 May, 2022;
originally announced May 2022.
-
PGDP5K: A Diagram Parsing Dataset for Plane Geometry Problems
Authors:
Yihan Hao,
Mingliang Zhang,
Fei Yin,
Linlin Huang
Abstract:
Diagram parsing is an important foundation for geometry problem solving, attracting increasing attention in the field of intelligent education and document image understanding. Due to the complex layout and between-primitive relationship, plane geometry diagram parsing (PGDP) is still a challenging task deserving further research and exploration. An appropriate dataset is critical for the research of PGDP. Although some datasets with rough annotations have been proposed to solve geometric problems, they are either small in scale or not publicly available. The rough annotations also make them not very useful. Thus, we propose a new large-scale geometry diagram dataset named PGDP5K and a novel annotation method. Our dataset consists of 5000 diagram samples composed of 16 shapes, covering 5 positional relations, 22 symbol types and 6 text types. Different from previous datasets, our PGDP5K dataset is labeled with more fine-grained annotations at primitive level, including primitive classes, locations and relationships. What is more, combined with the above annotations and geometric prior knowledge, it can generate intelligible geometric propositions automatically and uniquely. Experiments on the PGDP5K and IMP-Geometry3K datasets reveal that the state-of-the-art (SOTA) method achieves only 66.07% F1 value. This shows that PGDP5K presents a challenge for future research. Our dataset is available at http://www.nlpr.ia.ac.cn/databases/CASIA-PGDP5K/.
Submitted 19 May, 2022;
originally announced May 2022.
-
Plane Geometry Diagram Parsing
Authors:
Ming-Liang Zhang,
Fei Yin,
Yi-Han Hao,
Cheng-Lin Liu
Abstract:
Geometry diagram parsing plays a key role in geometry problem solving, wherein the primitive extraction and relation parsing remain challenging due to the complex layout and between-primitive relationship. In this paper, we propose a powerful diagram parser based on deep learning and graph reasoning. Specifically, a modified instance segmentation method is proposed to extract geometric primitives, and the graph neural network (GNN) is leveraged to realize relation parsing and primitive classification incorporating geometric features and prior knowledge. All the modules are integrated into an end-to-end model called PGDPNet to perform all the sub-tasks simultaneously. In addition, we build a new large-scale geometry diagram dataset named PGDP5K with primitive level annotations. Experiments on PGDP5K and an existing dataset IMP-Geometry3K show that our model outperforms state-of-the-art methods in four sub-tasks remarkably. Our code, dataset and appendix material are available at https://github.com/mingliangzhang2018/PGDP.
Submitted 19 May, 2022;
originally announced May 2022.
-
Constructing Trajectory and Predicting Estimated Time of Arrival for Long Distance Travelling Vessels: A Probability Density-based Scanning Approach
Authors:
Deqing Zhai,
Xiuju Fu,
Xiao Feng Yin,
Haiyan Xu,
Wanbing Zhang,
Ning Li
Abstract:
In this study, a probability density-based approach for constructing trajectories is proposed and validated through a typical use-case application: Estimated Time of Arrival (ETA) prediction given origin-destination pairs. The ETA prediction is based on physics and mathematical laws given by the extracted information of probability density-based trajectories constructed. The overall ETA prediction errors are about 0.106 days (i.e. 2.544 hours) on average with 0.549 days (i.e. 13.176 hours) standard deviation, and the proposed approach has an accuracy of 92.08% with 0.959 R-Squared value for overall trajectories between Singapore and Australia ports selected.
Submitted 13 May, 2022;
originally announced May 2022.
-
Multichannel Optimal Tree-Decodable Codes are Not Always Optimal Prefix Codes
Authors:
Hoover H. F. Yin,
Harry W. H. Wong,
Mehrdad Tahernia,
Russell W. F. Lai
Abstract:
The theory of multichannel prefix codes aims to generalize the classical theory of prefix codes. Although single- and two-channel prefix codes always have decoding trees, the same cannot be said when there are more than two channels. One question is of theoretical interest: Do there exist optimal tree-decodable codes that are not optimal prefix codes? Existing literature, which focused on generalizing single-channel results, covered little about non-tree-decodable prefix codes since they have no single-channel counterparts. In this work, we study the fundamental reason behind the non-tree-decodability of prefix codes. By investigating the simplest non-tree-decodable structure, we obtain a general sufficient condition on the channel alphabets for the existence of optimal tree-decodable codes that are not optimal prefix codes.
Submitted 10 May, 2022;
originally announced May 2022.
-
Predicting Berth Stay for Tanker Terminals: A Systematic and Dynamic Approach
Authors:
Deqing Zhai,
Xiuju Fu,
Xiao Feng Yin,
Haiyan Xu,
Wanbing Zhang
Abstract:
Given the trend of digitization and increasing number of maritime transport, prediction of vessel berth stay has been triggered for requirements of operation research and scheduling optimization problem in the era of maritime big data, which takes a significant part in port efficiency and maritime logistics enhancement. This study proposes a systematic and dynamic approach of predicting berth stay for tanker terminals. The approach covers three innovative aspects: 1) Data source employed is multi-faceted, including cargo operation data from tanker terminals, time-series data from automatic identification system (AIS), etc. 2) The process of berth stay is decomposed into multiple blocks according to data analysis and information extraction innovatively, and practical operation scenarios are also developed accordingly. 3) The predictive models of berth stay are developed on the basis of prior data analysis and information extraction under two methods, including regression and decomposed distribution. The models are evaluated under four dynamic scenarios with certain designated cargoes among two different terminals. The evaluation results show that the proposed approach can predict berth stay with the accuracy up to 98.81% validated by historical baselines, and also demonstrate the proposed approach has dynamic capability of predicting berth stay among the scenarios. The model may be potentially applied for short-term pilot-booking or scheduling optimizations within a reasonable time frame for advancement of port intelligence and logistics efficiency.
Submitted 18 May, 2022; v1 submitted 8 April, 2022;
originally announced April 2022.
-
Optimizing Coordinative Schedules for Tanker Terminals: An Intelligent Large Spatial-Temporal Data-Driven Approach -- Part 2
Authors:
Deqing Zhai,
Xiuju Fu,
Xiao Feng Yin,
Haiyan Xu,
Wanbing Zhang,
Ning Li
Abstract:
In this study, a novel coordinative scheduling optimization approach is proposed to enhance port efficiency by reducing weighted average turnaround time. The proposed approach is developed as a heuristic algorithm applied and investigated through different observation windows with weekly rolling horizon paradigm method. The experimental results show that the proposed approach is effective and promising on mitigating the turnaround time of vessels. The results demonstrate that largest potential savings of turnaround time (weighted average) are around 17 hours (28%) reduction on baseline of 1-week observation, 45 hours (37%) reduction on baseline of 2-week observation and 70 hours (40%) reduction on baseline of 3-week observation. Even though the experimental results are based on historical datasets, the results potentially present significant benefits if real-time applications were applied under a quadratic computational complexity.
Submitted 8 April, 2022;
originally announced April 2022.
-
Optimizing Coordinative Schedules for Tanker Terminals: An Intelligent Large Spatial-Temporal Data-Driven Approach -- Part 1
Authors:
Deqing Zhai,
Xiuju Fu,
Xiao Feng Yin,
Haiyan Xu,
Wanbing Zhang,
Ning Li
Abstract:
In this study, a novel coordinative scheduling optimization approach is proposed to enhance port efficiency by reducing average wait time and turnaround time. The proposed approach consists of enhanced particle swarm optimization (ePSO) as kernel and augmented firefly algorithm (AFA) as global optimal search. Two paradigm methods of the proposed approach are investigated, which are batch method and rolling horizon method. The experimental results show that both paradigm methods of proposed approach can effectively enhance port efficiency. The average wait time could be significantly reduced by 86.0% - 95.5%, and the average turnaround time could eventually save 38.2% - 42.4% with respect to historical benchmarks. Moreover, the paradigm method of rolling horizon could reduce to 20 mins on running time over 3-month datasets, rather than 4 hrs on batch method at corresponding maximum performance.
Submitted 8 April, 2022;
originally announced April 2022.
-
Document Dewarping with Control Points
Authors:
Guo-Wang Xie,
Fei Yin,
Xu-Yao Zhang,
Cheng-Lin Liu
Abstract:
Document images are now widely captured by handheld devices such as mobile phones. The OCR performance on these images is largely affected by geometric distortion of the document paper, diverse camera positions and complex backgrounds. In this paper, we propose a simple yet effective approach to rectify distorted document image by estimating control points and reference points. After that, we use interpolation method between control points and reference points to convert sparse mappings to backward mapping, and remap the original distorted document image to the rectified image. Furthermore, control points are controllable to facilitate interaction or subsequent adjustment. We can flexibly select post-processing methods and the number of vertices according to different application scenarios. Experiments show that our approach can rectify document images with various distortion types, and yield state-of-the-art performance on real-world dataset. This paper also provides a training dataset based on control points for document dewarping. Both the code and the dataset are released at https://github.com/gwxie/Document-Dewarping-with-Control-Points.
Submitted 20 March, 2022;
originally announced March 2022.
-
A Deep Learning Model with Radiomics Analysis Integration for Glioblastoma Post-Resection Survival Prediction
Authors:
Zongsheng Hu,
Zhenyu Yang,
Haozhao Zhang,
Eugene Vaios,
Kyle Lafata,
Fang-Fang Yin,
Chunhao Wang
Abstract:
Purpose: To develop a novel deep-learning model that integrates radiomics analysis in a multi-dimensional feature fusion workflow for glioblastoma (GBM) post-resection survival prediction. Methods: A cohort of 235 GBM patients with complete surgical resection was divided into short-term/long-term survival groups with 1-yr survival time threshold. Each patient received a pre-surgery multi-parametric MRI exam, and three tumor subregions were segmented by neuroradiologists. The developed model comprises three data source branches: in the 1st radiomics branch, 456 radiomics features (RF) were extracted from each patient; in the 2nd deep learning branch, an encoding neural network architecture was trained for survival group prediction using each single MR modality, and high-dimensional parameters of the last two network layers were extracted as deep features (DF). The extracted radiomics features and deep features were processed by a feature selection procedure to reduce dimension size of each feature space. In the 3rd branch, non-image-based patient-specific clinical features (PSCF) were collected. Finally, data sources from all three branches were fused as an integrated input for a support vector machine (SVM) execution for survival group prediction. Different strategies of model design, including 1) 2D/3D-based image analysis, and 2) different data source combinations in SVM input design, were investigated in comparison studies. Results: The model achieved 0.638 prediction accuracy when using PSCF only, which was higher than the results using RF or DF only in both 2D and 3D analysis. The joint use of RF/PSCF improved accuracy results to 0.681 in 3D analysis. The most accurate models in 2D/3D analysis reached the highest accuracy 0.745 with different combinations of RF/DF/PSCF, and the corresponding ROC AUC results were 0.69 (2D) and 0.71 (3D), respectively.
Submitted 11 March, 2022;
originally announced March 2022.
-
StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN
Authors:
Fei Yin,
Yong Zhang,
Xiaodong Cun,
Mingdeng Cao,
Yanbo Fan,
Xuan Wang,
Qingyan Bai,
Baoyuan Wu,
Jue Wang,
Yujiu Yang
Abstract:
One-shot talking face generation aims at synthesizing a high-quality talking face video from an arbitrary portrait image, driven by a video or an audio segment. One challenging quality factor is the resolution of the output video: higher resolution conveys more details. In this work, we investigate the latent feature space of a pre-trained StyleGAN and discover some excellent spatial transformation properties. Upon the observation, we explore the possibility of using a pre-trained StyleGAN to break through the resolution limit of training datasets. We propose a novel unified framework based on a pre-trained StyleGAN that enables a set of powerful functionalities, i.e., high-resolution video generation, disentangled control by driving video or audio, and flexible face editing. Our framework elevates the resolution of the synthesized talking face to 1024$\times$1024 for the first time, even though the training dataset has a lower resolution. We design a video-based motion generation module and an audio-based one, which can be plugged into the framework either individually or jointly to drive the video generation. The predicted motion is used to transform the latent features of StyleGAN for visual animation. To compensate for the transformation distortion, we propose a calibration network as well as a domain loss to refine the features. Moreover, our framework allows two types of facial editing, i.e., global editing via GAN inversion and intuitive editing based on 3D morphable models. Comprehensive experiments show superior video quality, flexible controllability, and editability over state-of-the-art methods.
Submitted 16 March, 2022; v1 submitted 8 March, 2022;
originally announced March 2022.
-
A Neural Ordinary Differential Equation Model for Visualizing Deep Neural Network Behaviors in Multi-Parametric MRI based Glioma Segmentation
Authors:
Zhenyu Yang,
Zongsheng Hu,
Hangjie Ji,
Kyle Lafata,
Scott Floyd,
Fang-Fang Yin,
Chunhao Wang
Abstract:
Purpose: To develop a neural ordinary differential equation (ODE) model for visualizing deep neural network (DNN) behavior during multi-parametric MRI (mp-MRI) based glioma segmentation as a method to enhance deep learning explainability. Methods: By hypothesizing that deep feature extraction can be modeled as a spatiotemporally continuous process, we designed a novel deep learning model, neural ODE, in which deep feature extraction was governed by an ODE without explicit expression. The dynamics of 1) MR images after interactions with DNN and 2) segmentation formation can be visualized after solving ODE. An accumulative contribution curve (ACC) was designed to quantitatively evaluate the utilization of each MRI by DNN towards the final segmentation results. The proposed neural ODE model was demonstrated using 369 glioma patients with a 4-modality mp-MRI protocol: T1, contrast-enhanced T1 (T1-Ce), T2, and FLAIR. Three neural ODE models were trained to segment enhancing tumor (ET), tumor core (TC), and whole tumor (WT). The key MR modalities with significant utilization by DNN were identified based on ACC analysis. Segmentation results by DNN using only the key MR modalities were compared to the ones using all 4 MR modalities. Results: All neural ODE models successfully illustrated image dynamics as expected. ACC analysis identified T1-Ce as the only key modality in ET and TC segmentations, while both FLAIR and T2 were key modalities in WT segmentation. Compared to the U-Net results using all 4 MR modalities, Dice coefficient of ET (0.784->0.775), TC (0.760->0.758), and WT (0.841->0.837) using the key modalities only had minimal differences without significance. Conclusion: The neural ODE model offers a new tool for optimizing the deep learning model inputs with enhanced explainability. The presented methodology can be generalized to other medical image-related deep learning applications.
Submitted 23 March, 2022; v1 submitted 1 March, 2022;
originally announced March 2022.
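The Dice coefficients reported in the abstract above (for example 0.784 versus 0.775 for ET) measure the overlap between a predicted segmentation and a reference one. As a reminder of the standard definition, $\mathrm{Dice}(A,B) = 2|A \cap B| / (|A| + |B|)$, here is a minimal sketch on toy data. The function name, the set-of-voxel-indices representation, and the convention of returning 1.0 for two empty masks are assumptions made for illustration; they are not taken from the paper.

```python
def dice_coefficient(pred: set, truth: set) -> float:
    """Dice overlap between two segmentations given as sets of voxel indices."""
    if not pred and not truth:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


# Toy 2D example: each mask has three voxels, two of which are shared.
pred_voxels = {(0, 0), (0, 1), (1, 1)}
truth_voxels = {(0, 1), (1, 1), (1, 2)}
print(dice_coefficient(pred_voxels, truth_voxels))
```

In the toy example the score is 2·2/(3+3) = 2/3; a value of 1 means identical masks and 0 means no overlap.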
-
Preconditioned TBiCOR and TCORS Algorithms for Solving the Sylvester Tensor Equation
Authors:
Guang-Xin Huang,
Qi-Xing Chen,
Feng Yin
Abstract:
In this paper, the preconditioned TBiCOR and TCORS methods are presented for solving the Sylvester tensor equation. A tensor Lanczos $\mathcal{L}$-Biorthogonalization algorithm (TLB) is derived for solving the Sylvester tensor equation. Two improved TLB methods are presented. One is the biconjugate $\mathcal{L}$-orthogonal residual algorithm in tensor form (TBiCOR), which implements the $LU$ decomposition for the triangular coefficient matrix derived by the TLB method. The other is the conjugate $\mathcal{L}$-orthogonal residual squared algorithm in tensor form (TCORS), which introduces a square operator to the residual of the TBiCOR algorithm. A preconditioner based on the nearest Kronecker product is used to accelerate the TBiCOR and TCORS algorithms, and we obtain the preconditioned TBiCOR algorithm (PTBiCOR) and preconditioned TCORS algorithm (PTCORS). The proposed algorithms are proved to be convergent within finite steps of iteration without roundoff errors. Several examples illustrate that the preconditioned TBiCOR and TCORS algorithms present excellent convergence.
Submitted 27 November, 2021;
originally announced November 2021.
-
Highly Scalable Maximum Likelihood and Conjugate Bayesian Inference for ERGMs on Graph Sets with Equivalent Vertices
Authors:
Fan Yin,
Carter T. Butts
Abstract:
The exponential family random graph modeling (ERGM) framework provides a flexible approach for the statistical analysis of networks. As ERGMs typically involve normalizing factors that are costly to compute, practical inference relies on a variety of approximations or other workarounds. Markov Chain Monte Carlo maximum likelihood (MCMC MLE) provides a powerful tool to approximate the MLE of ERGM parameters, and is feasible for typical models on single networks with as many as a few thousand nodes. MCMC-based algorithms for Bayesian analysis are more expensive, and high-quality answers are challenging to obtain on large graphs. For both strategies, extension to the pooled case - in which we observe multiple networks from a common generative process - adds further computational cost, with both time and memory scaling linearly in the number of graphs. This becomes prohibitive for large networks, or where large numbers of graph observations are available. Here, we exploit some basic properties of the discrete exponential families to develop an approach for ERGM inference in the pooled case that (where applicable) allows an arbitrarily large number of graph observations to be fit at no additional computational cost beyond preprocessing the data itself. Moreover, a variant of our approach can also be used to perform Bayesian inference under conjugate priors, again with no additional computational cost in the estimation phase. As we show, the conjugate prior is easily specified, and is well-suited to applications such as regularization. Simulation studies show that the pooled method leads to estimates with good frequentist properties, and posterior estimates under the conjugate prior are well-behaved. We demonstrate our approach with applications to pooled analysis of brain functional connectivity networks and to replicated x-ray crystal structures of hen egg-white lysozyme.
Submitted 26 October, 2021;
originally announced October 2021.
-
Identity-guided Face Generation with Multi-modal Contour Conditions
Authors:
Qingyan Bai,
Weihao Xia,
Fei Yin,
Yujiu Yang
Abstract:
Recent face generation methods have tried to synthesize faces based on the given contour condition, like a low-resolution image or sketch. However, the problem of identity ambiguity remains unsolved, which usually occurs when the contour is too vague to provide reliable identity information (e.g., when its resolution is extremely low). Thus feasible solutions of image restoration could be infinite. In this work, we propose a novel framework that takes the contour and an extra image specifying the identity as the inputs, where the contour can be of various modalities, including the low-resolution image, sketch, and semantic label map. Concretely, we propose a novel dual-encoder architecture, in which an identity encoder extracts the identity-related feature, accompanied by a main encoder to obtain the rough contour information and further fuse all the information together. The encoder output is iteratively fed into a pre-trained StyleGAN generator until getting a satisfying result. To the best of our knowledge, this is the first work that achieves identity-guided face generation conditioned on multi-modal contour images. Moreover, our method can produce photo-realistic results with 1024$\times$1024 resolution.
Submitted 2 August, 2022; v1 submitted 10 October, 2021;
originally announced October 2021.
-
Prime-valent Symmetric graphs with a quasi-semiregular automorphism
Authors:
Fu-Gang Yin,
Yan-Quan Feng,
Jin-Xin Zhou,
A-Hui Jia
Abstract:
An automorphism of a graph is called quasi-semiregular if it fixes a unique vertex of the graph and its remaining cycles have the same length. This kind of symmetry of graphs was first investigated by Kutnar, Malnič, Martínez and Marušič in 2013, as a generalization of the well-known semiregular automorphism of a graph. Symmetric graphs of valency three or four, admitting a quasi-semiregular automorphism, have been classified in two recent papers.
Let $p\geq 5$ be a prime and $Γ$ a connected symmetric graph of valency $p$ admitting a quasi-semiregular automorphism. In this paper, we first prove that either $Γ$ is a connected Cayley graph $\rm{Cay}(M,S)$ such that $M$ is a $2$-group admitting a fixed-point-free automorphism of order $p$ with $S$ as an orbit of involutions, or $Γ$ is a normal $N$-cover of a $T$-arc-transitive graph of valency $p$ admitting a quasi-semiregular automorphism, where $T$ is a non-abelian simple group and $N$ is a nilpotent group. Then in case $p=5$, we give a complete classification of such graphs $Γ$ such that either $\rm{Aut}(Γ)$ has a solvable arc-transitive subgroup or $Γ$ is $T$-arc-transitive with $T$ a non-abelian simple group. We also construct the first infinite family of symmetric graphs that have a quasi-semiregular automorphism and an insolvable full automorphism group.
Submitted 30 July, 2021;
originally announced July 2021.
-
A Radiomics-Boosted Deep-Learning Model for COVID-19 and Non-COVID-19 Pneumonia Classification Using Chest X-ray Image
Authors:
Zongsheng Hu,
Zhenyu Yang,
Kyle J. Lafata,
Fang-Fang Yin,
Chunhao Wang
Abstract:
To develop a deep-learning model that integrates radiomics analysis for enhanced performance of COVID-19 and Non-COVID-19 pneumonia detection using chest X-ray image, two deep-learning models were trained based on a pre-trained VGG-16 architecture: in the 1st model, X-ray image was the sole input; in the 2nd model, X-ray image and 2 radiomic feature maps (RFM) selected by the saliency map analysis of the 1st model were stacked as the input. Both models were developed using 812 chest X-ray images with 262/288/262 COVID-19/Non-COVID-19 pneumonia/healthy cases, and 649/163 cases were assigned as training-validation/independent test sets. In the 1st model using X-ray as the sole input, the 1) sensitivity, 2) specificity, 3) accuracy, and 4) ROC Area-Under-the-Curve of COVID-19 vs Non-COVID-19 pneumonia detection were 1) 0.90$\pm$0.07 vs 0.78$\pm$0.09, 2) 0.94$\pm$0.04 vs 0.94$\pm$0.04, 3) 0.93$\pm$0.03 vs 0.89$\pm$0.03, and 4) 0.96$\pm$0.02 vs 0.92$\pm$0.04. In the 2nd model, two RFMs, Entropy and Short-Run-Emphasize, were selected with their highest cross-correlations with the saliency maps of the 1st model. The corresponding results demonstrated significant improvements (p<0.05) of COVID-19 vs Non-COVID-19 pneumonia detection: 1) 0.95$\pm$0.04 vs 0.85$\pm$0.04, 2) 0.97$\pm$0.02 vs 0.96$\pm$0.02, 3) 0.97$\pm$0.02 vs 0.93$\pm$0.02, and 4) 0.99$\pm$0.01 vs 0.97$\pm$0.02. The reduced variations suggested a superior robustness of the 2nd model design.
Submitted 23 October, 2021; v1 submitted 19 July, 2021;
originally announced July 2021.
-
Quantification of lung function on CT images based on pulmonary radiomic filtering
Authors:
Zhenyu Yang,
Kyle J Lafata,
Xinru Chen,
James Bowsher,
Yushi Chang,
Chunhao Wang,
Fang-Fang Yin
Abstract:
Purpose: To develop a radiomics filtering technique for characterizing spatial-encoded regional pulmonary ventilation information on lung CT. Methods: The lung volume was segmented on 46 CT images, and a 3D sliding window kernel was implemented across the lung volume to capture the spatial-encoded image information. Fifty-three radiomic features were extracted within the kernel, resulting in a 4th order tensor object. As such, each voxel coordinate of the original lung was represented as a 53-dimensional feature vector, such that radiomic features could be viewed as feature maps within the lungs. To test the technique as a potential pulmonary ventilation biomarker, the radiomic feature maps were compared to paired functional images (Galligas PET or DTPA-SPECT) based on Spearman correlation (r) analysis. Results: The radiomic feature map GLRLM-based Run-Length Non-Uniformity and GLCOM-based Sum Average are found to be highly correlated with the functional imaging. The achieved r (median [range]) for the two features are 0.46 [0.05, 0.67] and 0.45 [0.21, 0.65] across 46 patients and 2 functional imaging modalities, respectively. Conclusions: The results provide evidence that local regions of sparsely encoded heterogeneous lung parenchyma on CT are associated with diminished radiotracer uptake and measured lung ventilation defects on PET/SPECT imaging. These findings demonstrate the potential of radiomics to serve as a complementary tool to the current lung quantification techniques and provide hypothesis-generating data for future studies.
Submitted 24 June, 2022; v1 submitted 24 May, 2021;
originally announced May 2021.
-
Post-Radiotherapy PET Image Outcome Prediction by Deep Learning under Biological Model Guidance: A Feasibility Study of Oropharyngeal Cancer Application
Authors:
Hangjie Ji,
Kyle Lafata,
Yvonne Mowery,
David Brizel,
Andrea L. Bertozzi,
Fang-Fang Yin,
Chunhao Wang
Abstract:
This paper develops a method of biologically guided deep learning for post-radiation FDG-PET image outcome prediction based on pre-radiation images and radiotherapy dose information. Based on the classic reaction-diffusion mechanism, a novel biological model was proposed using a partial differential equation that incorporates spatial radiation dose distribution as a patient-specific treatment information variable. A 7-layer encoder-decoder-based convolutional neural network (CNN) was designed and trained to learn the proposed biological model. As such, the model could generate post-radiation FDG-PET image outcome predictions with possible time-series transition from pre-radiotherapy image states to post-radiotherapy states. The proposed method was developed using 64 oropharyngeal patients with paired FDG-PET studies before and after 20Gy delivery (2Gy/daily fraction) by IMRT. In a two-branch deep learning execution, the proposed CNN learns specific terms in the biological model from paired FDG-PET images and spatial dose distribution as in one branch, and the biological model generates post-20Gy FDG-PET image prediction in the other branch. The proposed method successfully generated post-20Gy FDG-PET image outcome prediction with breakdown illustrations of biological model components. Time-series FDG-PET image predictions were generated to demonstrate the feasibility of disease response rendering. The developed biologically guided deep learning method achieved post-20Gy FDG-PET image outcome predictions in good agreement with ground-truth results. With break-down biological modeling components, the outcome image predictions could be used in adaptive radiotherapy decision-making to optimize personalized plans for the best outcome in the future.
Submitted 22 May, 2021;
originally announced May 2021.
-
BAR: Blockwise Adaptive Recoding for Batched Network Coding
Authors:
Hoover H. F. Yin,
Shenghao Yang,
Qiaoqiao Zhou,
Lily M. L. Yung,
Ka Hei Ng
Abstract:
Multi-hop networks have become popular network topologies in various emerging Internet of things applications. Batched network coding (BNC) is a solution to reliable communications in such networks with packet loss. By grouping packets into small batches and restricting recoding to the packets belonging to the same batch, BNC has much smaller computational and storage requirements at the intermediate nodes compared with a direct application of random linear network coding. In this paper, we propose a practical recoding scheme called blockwise adaptive recoding (BAR) which learns the latest channel knowledge from short observations so that BAR can adapt to the fluctuation of channel conditions. We focus on investigating practical concerns such as the design of efficient BAR algorithms. We also design and investigate feedback schemes for BAR under imperfect feedback systems. Our numerical evaluations show that BAR has significant throughput gain for small batch size compared with the existing baseline recoding scheme. More importantly, this gain is insensitive to inaccurate channel knowledge. This encouraging result suggests that BAR is suitable to be realized in practice as the exact channel model and its parameters could be unknown and subject to change from time to time.
Submitted 17 May, 2021;
originally announced May 2021.
-
A Unified Adaptive Recoding Framework for Batched Network Coding
Authors:
Hoover H. F. Yin,
Bin Tang,
Ka Hei Ng,
Shenghao Yang,
Xishi Wang,
Qiaoqiao Zhou
Abstract:
Batched network coding is a variation of random linear network coding which has low computational and storage costs. In order to adapt to random fluctuations in the number of erasures in individual batches, it is not optimal to recode and transmit the same number of packets for all batches. Different distributed optimization models, which are called adaptive recoding schemes, were formulated for this purpose. The key component of these optimization problems is the expected value of the rank distribution of a batch at the next network node, which is also known as the expected rank. In this paper, we put forth a unified adaptive recoding framework with an arbitrary recoding field size. We show that the expected rank functions are concave when the packet loss pattern is a stationary stochastic process, which covers but is not limited to the independent packet loss and Gilbert-Elliott packet loss models. Under this concavity assumption, we show that there always exists a solution which not only can minimize the randomness on the number of recoded packets but also can tolerate rank distribution errors due to inaccurate measurements or limited precision of the machine. We provide an algorithm to obtain such an optimal solution, and propose tuning schemes that can turn any feasible solution into a desired optimal solution.
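The expected rank and its concavity can be illustrated with a minimal sketch. It covers only the simplest case, assuming independent packet loss and a recoding field large enough that $i$ received packets of a rank-$r$ batch yield rank $\min(i, r)$; the paper itself treats arbitrary field sizes and more general loss processes. The function name `expected_rank` and its parameters are hypothetical.

```python
from math import comb


def expected_rank(rank: int, t: int, loss: float) -> float:
    """Expected rank at the next node when t recoded packets are sent.

    Assumptions (illustrative, not the paper's general model):
    - each packet is lost independently with probability `loss`;
    - the field is large, so receiving i packets gives rank min(i, rank).
    """
    keep = 1.0 - loss
    return sum(
        comb(t, i) * keep**i * loss ** (t - i) * min(i, rank)
        for i in range(t + 1)
    )


if __name__ == "__main__":
    values = [expected_rank(rank=2, t=t, loss=0.3) for t in range(8)]
    gains = [b - a for a, b in zip(values, values[1:])]
    # Concavity shows up as non-increasing marginal gains per extra packet.
    for t, gain in enumerate(gains):
        print(f"t={t}->{t + 1}: gain {gain:.4f}")
```

Under these assumptions, each extra recoded packet adds less expected rank than the one before it, and this diminishing return is what lets an optimizer trade packets between batches.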
Submitted 15 September, 2021; v1 submitted 17 May, 2021;
originally announced May 2021.
-
Intrablock Interleaving for Batched Network Coding with Blockwise Adaptive Recoding
Authors:
Hoover H. F. Yin,
Ka Hei Ng,
Allen Z. Zhong,
Raymond W. Yeung,
Shenghao Yang,
Ian Y. Y. Chan
Abstract:
Batched network coding (BNC) is a low-complexity solution to network transmission in multi-hop packet networks with packet loss. BNC encodes the source data into batches of packets. As a network coding scheme, the intermediate nodes perform recoding on the received packets belonging to the same batch instead of just forwarding them. A recoding scheme that may generate more recoded packets for batches of a higher rank is also called adaptive recoding. Meanwhile, in order to combat burst packet loss, the transmission of a block of batches can be interleaved. Stream interleaving studied in literature achieves the maximum separation among any two consecutive packets of a batch, but permutes packets across blocks and hence cannot bound the buffer size and the latency. To resolve the issue of stream interleaver, we design an intrablock interleaver for adaptive recoding that can preserve the advantages of using a block interleaver when the number of recoded packets is the same for all batches. We use potential energy in classical mechanics to measure the performance of an interleaver, and propose an algorithm to optimize the interleaver with this performance measure. Our problem formulation and algorithm for intrablock interleaving are also of independent interest.
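For orientation, the plain block interleaver that the abstract uses as its reference point can be sketched as follows. This is not the paper's intrablock interleaver: it assumes every batch in the block has the same number of recoded packets, and the names `block_interleave` and `min_separation` are hypothetical.

```python
from typing import Callable, Hashable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def block_interleave(batches: Sequence[Sequence[T]]) -> List[T]:
    """Send one packet from each batch in turn (column-major order)."""
    if not batches:
        return []
    length = len(batches[0])
    if any(len(batch) != length for batch in batches):
        raise ValueError("a plain block interleaver needs equal-size batches")
    return [batch[i] for i in range(length) for batch in batches]


def min_separation(
    order: Sequence[T], batch_of: Callable[[T], Hashable]
) -> Optional[int]:
    """Smallest gap between consecutive packets of the same batch."""
    last_seen = {}
    best = None
    for position, packet in enumerate(order):
        batch = batch_of(packet)
        if batch in last_seen:
            gap = position - last_seen[batch]
            best = gap if best is None else min(best, gap)
        last_seen[batch] = position
    return best


if __name__ == "__main__":
    block = [["a0", "a1", "a2"], ["b0", "b1", "b2"], ["c0", "c1", "c2"]]
    order = block_interleave(block)
    print(order)
    print(min_separation(order, lambda packet: packet[0]))
```

With equal-size batches, consecutive packets of one batch are separated by the number of batches in the block, which is what spreads a loss burst across batches. Adaptive recoding breaks the equal-size assumption, which is the case the paper's interleaver addresses.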
Submitted 15 September, 2021; v1 submitted 17 May, 2021;
originally announced May 2021.
-
Utility Maximization for Multihop Wireless Networks Employing BATS Codes
Authors:
Yanyan Dong,
Sheng Jin,
Yanzuo Chen,
Shenghao Yang,
Hoover H. F. Yin
Abstract:
BATS (BATched Sparse) codes are a class of efficient random linear network coding variation that has been studied for multihop wireless networks mostly in scenarios of a single communication flow. Towards sophisticated multi-flow network communications, we formulate a network utility maximization (NUM) problem that jointly optimizes the BATS code parameters of all the flows and network scheduling. The NUM problem adopts a batch-wise packet loss model that can be obtained from the network local statistics without any constraints on packet loss patterns. Moreover, the NUM problem allows a different number of recoded packets to be transmitted for different batches in a flow, which is called adaptive recoding. Due to both the probably nonconcave objective and the BATS code-related variables, the algorithms developed for the existing flow optimization problems cannot be applied directly to solve our NUM problem. We introduce a two-step algorithm to solve our NUM problem, where the first step solves the problem with nonadaptive recoding schemes, and the second step optimizes adaptive recoding hop-by-hop from upstream to downstream in each flow. We perform various numerical evaluations and simulations to verify the effectiveness and efficiency of the algorithm.
Submitted 15 September, 2021; v1 submitted 17 May, 2021;
originally announced May 2021.
-
On Multi-Channel Huffman Codes for Asymmetric-Alphabet Channels
Authors:
Hoover H. F. Yin,
Xishi Wang,
Ka Hei Ng,
Russell W. F. Lai,
Lucien K. L. Ng,
Jack P. K. Ma
Abstract:
Zero-error single-channel source coding has been studied extensively over the past decades. Its natural multi-channel generalization is however not well investigated. While the special case with multiple symmetric-alphabet channels was studied a decade ago, codes in such a setting have no advantage over single-channel codes in data compression, making them worthless in most applications. With essentially no development since the last decade, in this paper, we break the stalemate by showing that it is possible to beat single-channel source codes in terms of compression assuming asymmetric-alphabet channels. We present the multi-channel analog of several classical results in single-channel source coding, such as that a multi-channel Huffman code is an optimal tree-decodable code. We also show some evidence that finding an efficient construction of multi-channel Huffman codes may be hard. Nevertheless, we propose a suboptimal code construction whose redundancy is guaranteed to be no larger than that of an optimal single-channel source code.
Submitted 8 May, 2021;
originally announced May 2021.
-
Small-Sample Inferred Adaptive Recoding for Batched Network Coding
Authors:
Jie Wang,
Zhiyuan Jia,
Hoover H. F. Yin,
Shenghao Yang
Abstract:
Batched network coding is a low-complexity network coding solution to feedbackless multi-hop wireless packet network transmission with packet loss. The data to be transmitted is encoded into batches, each of which consists of a few coded packets. Unlike the traditional forwarding strategy, the intermediate network nodes have to perform recoding, which generates recoded packets by network coding operations restricted within the same batch. Adaptive recoding is a technique to adapt to the fluctuation of packet loss by optimizing the number of recoded packets per batch to enhance the throughput. The input rank distribution, which is a piece of information regarding the batches arriving at the node, is required to apply adaptive recoding. However, this distribution is not known in advance in practice as the incoming link's channel condition may change from time to time. On the other hand, to fully utilize the potential of adaptive recoding, we need to have a good estimation of this distribution. In other words, we need to guess this distribution from a few samples so that we can apply adaptive recoding as soon as possible. In this paper, we propose a distributionally robust optimization for adaptive recoding with a small-sample inferred prediction of the input rank distribution. We develop an algorithm to efficiently solve this optimization with the support of theoretical guarantees that our optimization's performance would constitute a confidence lower bound of the optimal throughput with high probability.
Submitted 15 June, 2021; v1 submitted 4 May, 2021;
originally announced May 2021.
-
LLM helps design and optimize photonic crystal surface emitting lasers
Authors:
Renjie Li,
Ceyao Zhang,
Sixuan Mao,
Hai Huang,
Mou Zhong,
Yiou Cui,
Xiyuan Zhou,
Feng Yin,
Zhaoyu Zhang
Abstract:
Conventional design and optimization of Photonic Crystal Surface Emitting Lasers (PCSEL) usually requires expert knowledge in semiconductor physics and optimization algorithms, which is also known as the inverse design problem. However, with the trend towards automation and depersonalization of the entire integrated circuits (IC) industry, the conventional method, with the drawback of being relatively labor-intensive and sub-optimal, warrants further refinement. This technical dilemma remained until the emergence of Large Language Models (LLMs), such as OpenAI's ChatGPT and Google's Bard. This paper explores the possibility of applying LLMs to machine learning-based design and optimization of PCSELs. Specifically, we utilize GPT3.5 and GPT4. By simply having conversations, GPT assisted us with writing Finite Difference Time Domain (FDTD) simulation code and deep reinforcement learning code to acquire the optimized PCSEL solution, spanning from the proposition of ideas to the realization of algorithms. Given that GPT will perform better when given detailed and specific questions, we break down the PCSEL design problem into a series of sub-problems and converse with GPT by posing open-ended heuristic questions rather than definitive commands. This paper shows that LLMs, such as ChatGPT, can guide the nanophotonic design and optimization processes, on both the conceptual and technical level, and we propose new human-AI co-design strategies and show their practical implications. We achieve a significant milestone for the first step towards an automated end to end nanophotonic design and production pipeline.
Submitted 11 August, 2023; v1 submitted 25 April, 2021;
originally announced April 2021.
-
Determination of the up/down-quark mass within QCD sum rules in the scalar channel
Authors:
Fang-Hui Yin,
Wen-Ya Tian,
Liang Tang,
Zhi-Hui Guo
Abstract:
In this work, we determine up/down-quark mass $m_{q=u/d}$ in the isoscalar scalar channel from both the Shifman-Vainshtein-Zakharov (SVZ) and the Monte-Carlo-based QCD sum rules. The relevant spectral function, including the contributions from the $f_0(500)$, $f_0(980)$ and $f_0(1370)$ resonances, is determined from a sophisticated $U(3)$ chiral study. Via the traditional SVZ QCD sum rules, we give the prediction to the average light-quark mass $m_q(2 \, \text{GeV})=\frac{1}{2}(m_u(2 \, \text{GeV}) + m_d(2 \, \text{GeV}))=(3.46^{+0.16}_{-0.22} \pm 0.33) \, \text{MeV}$. Meanwhile, by considering the uncertainties of the input QCD parameters and the spectral functions of the isoscalar scalar channel, we obtain $m_q (2\, \text{GeV}) = (3.44 \pm 0.14 \pm 0.32) \, \text{MeV}$ from the Monte-Carlo-based QCD sum rules. Both results are perfectly consistent with each other, and nicely agree with the Particle Data Group value within the uncertainties.
Submitted 17 September, 2021; v1 submitted 18 April, 2021;
originally announced April 2021.
-
On the Sensitivity and Stability of Model Interpretations in NLP
Authors:
Fan Yin,
Zhouxing Shi,
Cho-Jui Hsieh,
Kai-Wei Chang
Abstract:
Recent years have witnessed the emergence of a variety of post-hoc interpretations that aim to uncover how natural language processing (NLP) models make predictions. Despite the surge of new interpretation methods, it remains an open problem how to define and quantitatively measure the faithfulness of interpretations, i.e., to what extent interpretations reflect the reasoning process by a model. We propose two new criteria, sensitivity and stability, that provide complementary notions of faithfulness to the existing removal-based criteria. Our results show that the conclusion for how faithful interpretations are could vary substantially based on different notions. Motivated by the desiderata of sensitivity and stability, we introduce a new class of interpretation methods that adopt techniques from adversarial robustness. Empirical results show that our proposed methods are effective under the new criteria and overcome limitations of gradient-based methods on removal-based criteria. Besides text classification, we also apply interpretation methods and metrics to dependency parsing. Our results shed light on understanding the diverse set of interpretations.
Submitted 31 March, 2022; v1 submitted 18 April, 2021;
originally announced April 2021.
-
Dewarping Document Image By Displacement Flow Estimation with Fully Convolutional Network
Authors:
Guo-Wang Xie,
Fei Yin,
Xu-Yao Zhang,
Cheng-Lin Liu
Abstract:
As camera-based documents are increasingly used, the rectification of distorted document images becomes a need to improve the recognition performance. In this paper, we propose a novel framework for both rectifying distorted document image and removing background finely, by estimating pixel-wise displacements using a fully convolutional network (FCN). The document image is rectified by transformation according to the displacements of pixels. The FCN is trained by regressing displacements of synthesized distorted documents, and to control the smoothness of displacements, we propose a Local Smooth Constraint (LSC) in regularization. Our approach is easy to implement and consumes moderate computing resource. Experiments proved that our approach can dewarp document images effectively under various geometric distortions, and has achieved the state-of-the-art performance in terms of local details and overall effect.
Submitted 14 April, 2021;
originally announced April 2021.
-
2-Arc-transitive Cayley graphs on alternating groups
Authors:
Jiangmin Pan,
Binzhou Xia,
Fugang Yin
Abstract:
An interesting fact is that most of the known connected $2$-arc-transitive nonnormal Cayley graphs of small valency on finite simple groups are $(\mathrm{A}_{n+1},2)$-arc-transitive Cayley graphs on $\mathrm{A}_n$. This motivates the study of $2$-arc-transitive Cayley graphs on $\mathrm{A}_n$ for arbitrary valency. In this paper, we characterize the automorphism groups of such graphs. In particular, we show that for a non-complete $(G,2)$-arc-transitive Cayley graph on $\mathrm{A}_n$ with $G$ almost simple, the socle of $G$ is either $\mathrm{A}_{n+1}$ or $\mathrm{A}_{n+2}$. We also construct the first infinite family of $(\mathrm{A}_{n+2},2)$-arc-transitive Cayley graphs on $\mathrm{A}_n$.
Submitted 26 March, 2021;
originally announced March 2021.
-
Manipulate the Electronic State of Mott Iridate Superlattice through Protonation Induced Electron-Filling
Authors:
Meng Wang,
Lin Hao,
Fang Yin,
Xin Yang,
Shengchun Shen,
Nianlong Zou,
Hui Cao,
Junyi Yang,
Nianpeng Lu,
Yongshun Wu,
Jianbing Zhang,
Hua Zhou,
Jia Li,
Jian Liu,
Pu Yu
Abstract:
Spin-orbit-coupled Mott iridates show great similarity with parent compounds of superconducting cuprates, attracting extensive research interest especially for their electron-doped states. However, previous experiments are largely limited to a small doping range due to the absence of effective dopants, and therefore the electron-doped phase diagram remains elusive. Here we utilize an ionic-liquid-gating induced protonation method to achieve electron-doping into a 5d Mott-insulator built with SrIrO3/SrTiO3 superlattice, and achieve a systematic mapping of its electron-doped phase diagram with the evolution of the iridium valence state from 4+ to 3+, equivalent to doping of one electron per iridium ion. With increasing doping level, the parent Mott-insulator is first turned into a localized metallic state with gradually suppressed magnetic ordering, and then further evolved into a nonmagnetic band insulating state. This work forms an important step forward for the study of electron-doped Mott iridate systems, and the strategy of manipulating the band filling in an artificially designed superlattice structure can be readily extended into other systems with more exotic states to explore.
Submitted 25 March, 2021;
originally announced March 2021.