-
Resistance Distances in Directed Graphs: Definitions, Properties, and Applications
Authors:
Mingzhe Zhu,
Liwang Zhu,
Huan Li,
Wei Li,
Zhongzhi Zhang
Abstract:
Resistance distance has been studied extensively in the past years, with the majority of previous studies devoted to undirected networks, in spite of the fact that various realistic networks are directed. Although several generalizations of resistance distance on directed graphs have been proposed, they either have no physical interpretation or are not a metric. In this paper, we first extend the definition of resistance distance to strongly connected directed graphs based on random walks and show that the two-node resistance distance on directed graphs is a metric. Then, we introduce the Laplacian matrix for directed graphs that subsumes the Laplacian matrix of undirected graphs as a particular case and use its pseudoinverse to express the two-node resistance distance, and many other relevant quantities derived from resistance distances. Moreover, we define the resistance distance between a vertex and a vertex group on directed graphs and further define a problem of optimally selecting a group of fixed number of nodes, such that their resistance distance is minimized. Since this combinatorial optimization problem is NP-hard, we present a greedy algorithm with a proved approximation ratio, and conduct experiments on model and realistic networks to validate the performance of this approximation algorithm.
Submitted 8 February, 2023;
originally announced February 2023.
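The abstract above notes that the proposed directed Laplacian subsumes the undirected Laplacian as a particular case. As a minimal sketch of that undirected special case only (it does not reproduce the paper's directed, random-walk-based definition), the classical effective resistance can be computed by grounding one node and solving a reduced Laplacian system. The function names `laplacian`, `solve`, and `resistance_distance` are illustrative assumptions, not taken from the paper, and the graph is assumed to be connected and unweighted.

```python
from fractions import Fraction


def laplacian(n, edges):
    """Build L = D - A for an undirected, unweighted graph on nodes 0..n-1."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def resistance_distance(n, edges, i, j):
    """Classical effective resistance between i and j (undirected case only).

    Node j is grounded (its row and column are dropped from L), one unit of
    current is injected at i, and the resulting potential at i is returned.
    Assumes the graph is connected.
    """
    if i == j:
        return Fraction(0)
    L = laplacian(n, edges)
    keep = [k for k in range(n) if k != j]
    reduced = [[L[r][c] for c in keep] for r in keep]
    rhs = [Fraction(1) if k == i else Fraction(0) for k in keep]
    potentials = solve(reduced, rhs)
    return potentials[keep.index(i)]
```

On the path graph 0-1-2, `resistance_distance(3, [(0, 1), (1, 2)], 0, 2)` gives 2 (two unit resistors in series), and on a triangle the distance between any two nodes is 2/3. Exact fractions are used here so that small examples can be checked by hand; a practical implementation would more likely rely on a numerical linear algebra library.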
-
Deep Reinforcement Learning for Traffic Light Control in Intelligent Transportation Systems
Authors:
Xiao-Yang Liu,
Ming Zhu,
Sem Borst,
Anwar Walid
Abstract:
Smart traffic lights in intelligent transportation systems (ITSs) are envisioned to greatly increase traffic efficiency and reduce congestion. Deep reinforcement learning (DRL) is a promising approach to adaptively control traffic lights based on the real-time traffic situation in a road network. However, conventional methods may suffer from poor scalability. In this paper, we investigate deep reinforcement learning to control traffic lights, and both theoretical analysis and numerical experiments show that the intelligent behavior ``greenwave'' (i.e., a vehicle will see a progressive cascade of green lights, and not have to brake at any intersection) emerges naturally in a grid road network, which is proved to be the optimal policy in an avenue with multiple cross streets. As a first step, we use two DRL algorithms for the traffic light control problems in two scenarios. In a single road intersection, we verify that the deep Q-network (DQN) algorithm delivers a thresholding policy; and in a grid road network, we adopt the deep deterministic policy gradient (DDPG) algorithm. Secondly, numerical experiments show that the DQN algorithm delivers the optimal control, and the DDPG algorithm with passive observations has the capability to produce on its own a high-level intelligent behavior in a grid road network, namely, the ``greenwave'' policy emerges. We also verify the ``greenwave'' patterns in a $5 \times 10$ grid road network. Thirdly, the ``greenwave'' patterns demonstrate that DRL algorithms produce favorable solutions since the ``greenwave'' policy shown in experiment results is proved to be optimal in a specified traffic model (an avenue with multiple cross streets). The delivered policies both in a single road intersection and a grid road network demonstrate the scalability of DRL algorithms.
Submitted 5 March, 2023; v1 submitted 3 February, 2023;
originally announced February 2023.
-
High sensitivity HI image of diffuse gas and new tidal features in M51 observed by FAST
Authors:
Haiyang Yu,
Ming Zhu,
Jin-Long Xu,
Mei Ai,
Peng Jiang,
Yanbin Yang
Abstract:
We observed the classical interacting galaxy M51 with FAST and obtained a high-sensitivity HI image with column density down to 3.8 $\times$ 10$^{18}$ cm$^{-2}$. In the image we can see a diffuse extended envelope around the system and several new tidal features. We also get a deeper look at M51b's probable gas, which has an approximate velocity range of 560 to 740 km s$^{-1}$ and a flux of 7.5 Jy km s$^{-1}$. Compared to the VLA image, we observe more complete structures of the Southeast Tail, Northeast Cloud and Northwest Plume, as well as new features of the Northwest Cloud and Southwest Plume. M51's most prominent tidal feature, the Southeast Tail, looks very long and broad, along with two small detached clouds at the periphery. Due to the presence of optical and simulated counterparts, the Northwest Cloud appears to be the tail of M51a, while the Northwest Plume is more likely a tidal tail of M51b. The large mass of the Northwest Plume suggests that M51b may have been as gas-rich as M51a before the interaction. In addition, the formation process of the Northeast Cloud and Southwest Plume is obscured by the lack of optical and simulated counterparts. These novel tidal features, together with M51b's probable gas, will inspire future simulations and provide a deeper understanding of the evolution of this interacting system.
Submitted 7 February, 2023;
originally announced February 2023.
-
Discovery of an isolated dark dwarf galaxy in the nearby universe
Authors:
Jin-Long Xu,
Ming Zhu,
Naiping Yu,
Chuan-Peng Zhang,
Xiao-Lan Liu,
Mei Ai,
Peng Jiang
Abstract:
Based on a new HI survey using the Five-hundred-meter Aperture Spherical radio Telescope (FAST), combined with the Pan-STARRS1 images, we identified an isolated HI cloud without any optical counterpart, named FAST J0139+4328. The newly discovered HI cloud appears to be a typical disk galaxy since it has a double-peak shape in the global HI profile and an S-like rotation structure in the velocity-position diagram. Moreover, this disk galaxy has an extremely low absolute magnitude (M_B>-10.0 mag) and stellar mass (<6.9*10^5 Msun). Furthermore, we obtained that the HI mass of this galaxy is 8.3*10^7 Msun, and the dynamical mass to total baryonic mass ratio is 47+-27, implying that dark matter dominates over baryons in FAST J0139+4328. These findings provide observational evidence that FAST J0139+4328 is an isolated dark dwarf galaxy with a redshift of z=0.0083. This is the first time that an isolated dark galaxy has been detected in the nearby universe.
Submitted 6 February, 2023;
originally announced February 2023.
-
Efficient Gradient Approximation Method for Constrained Bilevel Optimization
Authors:
Siyuan Xu,
Minghui Zhu
Abstract:
Bilevel optimization has been developed for many machine learning tasks with large-scale and high-dimensional data. This paper considers a constrained bilevel optimization problem, where the lower-level optimization problem is convex with equality and inequality constraints and the upper-level optimization problem is non-convex. The overall objective function is non-convex and non-differentiable. To solve the problem, we develop a gradient-based approach, called gradient approximation method, which determines the descent direction by computing several representative gradients of the objective function inside a neighborhood of the current estimate. We show that the algorithm asymptotically converges to the set of Clarke stationary points, and demonstrate the efficacy of the algorithm by the experiments on hyperparameter optimization and meta-learning.
Submitted 3 February, 2023;
originally announced February 2023.
-
CheckedCBox: Type Directed Program Partitioning with Checked C for Incremental Spatial Memory Safety
Authors:
Liyi Li,
Arunkumar Bhattar,
Le Chang,
Mingwei Zhu,
Aravind Machiry
Abstract:
Spatial memory safety violation is still a major issue for C programs. Checked-C is a safe dialect of C and extends it with Checked pointer types and annotations that guarantee spatial memory safety in a backward-compatible manner, allowing the mix of checked pointers and regular (unchecked) pointer types. However, unchecked code vulnerabilities can violate the checked code's spatial safety guarantees. We present CheckedCBox, which adds a flexible, type-directed program partitioning mechanism to Checked-C, by enhancing the Checked-C type system with tainted types that enable flexible partitioning of the program into checked and unchecked regions, in a manner such that unchecked region code does not affect the spatial safety in the checked region. We formalize our type system and prove the non-crashing and non-exposure properties of a well-typed CheckedCBox program. We implemented CheckedCBox in a configurable manner, which enables us to use existing sandbox mechanisms (eg WebAssembly) to execute programs. Consequently, in doing so, CheckedCBox has prevented four known vulnerabilities by efficiently partitioning the program.
Submitted 3 February, 2023;
originally announced February 2023.
-
Cluster-CAM: Cluster-Weighted Visual Interpretation of CNNs' Decision in Image Classification
Authors:
Zhenpeng Feng,
Hongbing Ji,
Milos Dakovic,
Xiyang Cui,
Mingzhe Zhu,
Ljubisa Stankovic
Abstract:
Despite the tremendous success of convolutional neural networks (CNNs) in computer vision, the mechanism of CNNs still lacks clear interpretation. Currently, class activation mapping (CAM), a famous visualization technique to interpret a CNN's decision, has drawn increasing attention. Gradient-based CAMs are efficient, but their performance is heavily affected by gradient vanishing and exploding. In contrast, gradient-free CAMs can avoid computing gradients to produce more understandable results. However, existing gradient-free CAMs are quite time-consuming because hundreds of forward inferences per image are required. In this paper, we propose Cluster-CAM, an effective and efficient gradient-free CNN interpretation algorithm. Cluster-CAM can significantly reduce the number of forward propagations by splitting the feature maps into clusters in an unsupervised manner. Furthermore, we propose an artful strategy to forge a cognition-base map and cognition-scissors from clustered feature maps. The final salience heatmap will be computed by merging the above cognition maps. Qualitative results conspicuously show that Cluster-CAM can produce heatmaps where the highlighted regions match the human's cognition more precisely than existing CAMs. The quantitative evaluation further demonstrates the superiority of Cluster-CAM in both effectiveness and efficiency.
Submitted 3 February, 2023;
originally announced February 2023.
-
Parity-violation in bouncing cosmology
Authors:
Mian Zhu,
Yong Cai
Abstract:
We investigate the possibility of the enhancement of parity-violation signal in bouncing cosmology. Specifically, we are interested in deciding which phase should generate the most significant parity-violation signals. We find that the dominant contribution comes from the bouncing phase, while the contraction phase has a smaller contribution. Therefore, bouncing cosmology can enhance the parity-violation signals during the bouncing phase. Moreover, since the bouncing phase has the highest energy scale in bouncing cosmology, we can also probe new physics at this scale by studying the parity-violation effect.
Submitted 21 April, 2023; v1 submitted 31 January, 2023;
originally announced January 2023.
-
KG-BERTScore: Incorporating Knowledge Graph into BERTScore for Reference-Free Machine Translation Evaluation
Authors:
Zhanglin Wu,
Min Zhang,
Ming Zhu,
Yinglu Li,
Ting Zhu,
Hao Yang,
Song Peng,
Ying Qin
Abstract:
BERTScore is an effective and robust automatic metric for reference-based machine translation evaluation. In this paper, we incorporate a multilingual knowledge graph into BERTScore and propose a metric named KG-BERTScore, which linearly combines the results of BERTScore and bilingual named entity matching for reference-free machine translation evaluation. In experiments on the WMT19 QE-as-a-metric (without references) shared tasks, KG-BERTScore achieves higher overall correlation with human judgements than the current state-of-the-art metrics for reference-free machine translation evaluation. Moreover, the pre-trained multilingual model used by KG-BERTScore and the parameter for linear combination are also studied in this paper.
Submitted 30 January, 2023;
originally announced January 2023.
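The abstract above describes KG-BERTScore as a linear combination of a BERTScore result and a bilingual named-entity matching result, controlled by a combination parameter. A minimal sketch of such a combination follows. The convex form, the function name `combine_scores`, and the weight name `alpha` are assumptions made for illustration; the abstract does not state the exact formula or the value of the parameter.

```python
def combine_scores(bert_score: float, entity_match: float, alpha: float = 0.5) -> float:
    """Linearly combine two quality scores (hypothetical convex form).

    bert_score:   a BERTScore-style similarity, assumed to lie in [0, 1]
    entity_match: a named-entity matching score, assumed to lie in [0, 1]
    alpha:        weight on entity_match (assumed name and default value)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * entity_match + (1.0 - alpha) * bert_score
```

With `alpha = 0` the function returns the BERTScore-style value unchanged, and with `alpha = 1` it returns only the entity-matching score, so sweeping `alpha` is one simple way to study the effect of the combination parameter.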
-
Few-shot Face Image Translation via GAN Prior Distillation
Authors:
Ruoyu Zhao,
Mingrui Zhu,
Xiaoyu Wang,
Nannan Wang
Abstract:
Face image translation has made notable progress in recent years. However, when training on limited data, the performance of existing approaches significantly declines. Although some studies have attempted to tackle this problem, they either failed to achieve the few-shot setting (less than 10) or can only get suboptimal results. In this paper, we propose GAN Prior Distillation (GPD) to enable effective few-shot face image translation. GPD contains two models: a teacher network with GAN Prior and a student network that fulfills end-to-end translation. Specifically, we adapt the teacher network trained on large-scale data in the source domain to the target domain with only a few samples, where it can learn the target domain's knowledge. Then, we can achieve few-shot augmentation by generating source domain and target domain images simultaneously with the same latent codes. We propose an anchor-based knowledge distillation module that can fully use the difference between the training and the augmented data to distill the knowledge of the teacher network into the student network. The trained student network achieves excellent generalization performance with the absorption of additional knowledge. Qualitative and quantitative experiments demonstrate that our method achieves superior results than state-of-the-art approaches in a few-shot setting.
Submitted 28 January, 2023;
originally announced January 2023.
-
Few-shot Font Generation by Learning Style Difference and Similarity
Authors:
Xiao He,
Mingrui Zhu,
Nannan Wang,
Xinbo Gao,
Heng Yang
Abstract:
Few-shot font generation (FFG) aims to preserve the underlying global structure of the original character while generating target fonts by referring to a few samples. It has been applied to font library creation, a personalized signature, and other scenarios. Existing FFG methods explicitly disentangle content and style of reference glyphs universally or component-wisely. However, they ignore the difference between glyphs in different styles and the similarity of glyphs in the same style, which results in artifacts such as local distortions and style inconsistency. To address this issue, we propose a novel font generation approach by learning the Difference between different styles and the Similarity of the same style (DS-Font). We introduce contrastive learning to consider the positive and negative relationship between styles. Specifically, we propose a multi-layer style projector for style encoding and realize a distinctive style representation via our proposed Cluster-level Contrastive Style (CCS) loss. In addition, we design a multi-task patch discriminator, which comprehensively considers different areas of the image and ensures that each style can be distinguished independently. We conduct qualitative and quantitative evaluations comprehensively to demonstrate that our approach achieves significantly better results than state-of-the-art methods.
Submitted 24 January, 2023;
originally announced January 2023.
-
On explicit birational geometry for weak Fano varieties and polarised Calabi-Yau varieties
Authors:
Minzhe Zhu
Abstract:
Given a natural number $l$ and a weak Fano $n$-fold $X$ with $\operatorname{dim}\overline{\varphi_{-lK_X}(X)}\geq n-1$, we study the lower bound of the anti-canonical volume and the upper bound of the anti-canonical stability index. The method can also be used to give similar bounds for polarised Calabi-Yau varieties.
Submitted 18 January, 2023;
originally announced January 2023.
-
On Existence Theorems for Conditional Inferential Models
Authors:
Rongrong Zhang,
Michael Y. Zhu,
Chuanhai Liu
Abstract:
The framework of Inferential Models (IMs) has recently been developed in search of what is referred to as the holy grail of statistical theory, that is, prior-free probabilistic inference. Its method of Conditional IMs (CIMs) is a critical component in that it serves as a desirable extension of the Bayes theorem for combining information when no prior distribution is available. The general form of CIMs is defined by a system of first-order homogeneous linear partial differential equations (PDEs). When admitting simple solutions, they are referred to as regular, whereas when no regular CIMs exist, they are used as the so-called local CIMs. This paper provides conditions for regular CIMs, which are shown to be equivalent to the existence of a group-theoretical representation of the underlying statistical model. It also establishes existence theorems for CIMs, which state that under mild conditions, local CIMs always exist. Finally, the paper concludes with a simple example and a few remarks on future developments of CIMs for applications to popular but inferentially nontrivial statistical models.
Submitted 12 January, 2023;
originally announced January 2023.
-
MonoEdge: Monocular 3D Object Detection Using Local Perspectives
Authors:
Minghan Zhu,
Lingting Ge,
Panqu Wang,
Huei Peng
Abstract:
We propose a novel approach for monocular 3D object detection by leveraging local perspective effects of each object. While the global perspective effect shown as size and position variations has been exploited for monocular 3D detection extensively, the local perspective has long been overlooked. We design a local perspective module to regress a newly defined variable named keyedge-ratios as the parameterization of the local shape distortion to account for the local perspective, and derive the object depth and yaw angle from it. Theoretically, this module does not rely on the pixel-wise size or position in the image of the objects, and is therefore independent of the camera intrinsic parameters. By plugging this module into existing monocular 3D object detection frameworks, we incorporate the local perspective distortion with the global perspective effect for monocular 3D reasoning, and we demonstrate the effectiveness and superior performance over strong baseline methods on multiple datasets.
Submitted 4 January, 2023;
originally announced January 2023.
-
A deep local attention network for pre-operative lymph node metastasis prediction in pancreatic cancer via multiphase CT imaging
Authors:
Zhilin Zheng,
Xu Fang,
Jiawen Yao,
Mengmeng Zhu,
Le Lu,
Lingyun Huang,
Jing Xiao,
Yu Shi,
Hong Lu,
Jianping Lu,
Ling Zhang,
Chengwei Shao,
Yun Bian
Abstract:
Lymph node (LN) metastasis status is one of the most critical prognostic and cancer staging factors for patients with resectable pancreatic ductal adenocarcinoma (PDAC), or in general, for any type of solid malignant tumor. Preoperative prediction of LN metastasis from non-invasive CT imaging is highly desired, as it might be straightforwardly used to guide the following neoadjuvant treatment decision and surgical planning. Most studies only capture the tumor characteristics in CT imaging to implicitly infer LN metastasis, and very few works directly exploit the CT imaging information of LNs. To the best of our knowledge, this is the first work to propose a fully-automated LN segmentation and identification network to directly facilitate the LN metastasis status prediction task. Nevertheless, LN segmentation/detection is very challenging since LNs can be easily confused with other hard negative anatomic structures (e.g., vessels) from radiological images. We explore the anatomical spatial context priors of pancreatic LN locations by generating a guiding attention map from related organs and vessels to assist segmentation and infer LN status. As such, LN segmentation is impelled to focus on regions that are anatomically adjacent or plausible with respect to the specific organs and vessels. The metastasized LN identification network is trained to classify the segmented LN instances into positives or negatives by reusing the segmentation network as a pre-trained backbone and padding a new classification head. More importantly, we develop an LN metastasis status prediction network that combines the patient-wise aggregation results of LN segmentation/identification and deep imaging features extracted from the tumor region. Extensive quantitative nested five-fold cross-validation is conducted on a discovery dataset of 749 patients with PDAC.
Submitted 4 January, 2023;
originally announced January 2023.
-
High-Quality Real-Time Rendering Using Subpixel Sampling Reconstruction
Authors:
Boyu Zhang,
Hongliang Yuan,
Mingyan Zhu,
Ligang Liu,
Jue Wang
Abstract:
Generating high-quality, realistic rendering images for real-time applications generally requires tracing a few samples-per-pixel (spp) and using deep learning-based approaches to denoise the resulting low-spp images. Existing denoising methods have yet to achieve real-time performance at high resolutions due to the physically-based sampling and network inference time costs. In this paper, we propose a novel Monte Carlo sampling strategy to accelerate the sampling process and a corresponding denoiser, subpixel sampling reconstruction (SSR), to obtain high-quality images. Extensive experiments demonstrate that our method significantly outperforms previous approaches in denoising quality and reduces overall time costs, enabling real-time rendering capabilities at 2K resolution.
Submitted 25 June, 2023; v1 submitted 3 January, 2023;
originally announced January 2023.
-
FEASTS: IGM cooling triggered by tidal interactions through the diffuse HI phase around NGC 4631
Authors:
Jing Wang,
Dong Yang,
Se-Heon Oh,
Lister Staveley-Smith,
Jie Wang,
Q. Daniel Wang,
Kelley M. Hess,
Luis C. Ho,
Ligang Hou,
Yingjie Jing,
Peter Kamphuis,
Fujia Li,
Xuchen Lin,
Ziming Liu,
Li Shao,
Shun Wang,
Ming Zhu
Abstract:
We use the single-dish radio telescope FAST to map the HI in the tidally interacting NGC 4631 group with a resolution of 3.24$'$ (7 kpc), reaching a 5-$σ$ column density limit of $10^{17.9}$ cm$^{-2}$ assuming a line width of 20 km s$^{-1}$. Taking the existing interferometric HI image from the HALOGAS project of WSRT as reference, we are able to identify and characterize a significant excess of large-scale, low-density, and diffuse HI in the group. This diffuse HI extends for more than 120 kpc across, and accounts for more than one fourth of the total HI detected by FAST in and around the galaxy NGC 4631. In the region of the tidal tails, the diffuse HI has a typical column density above $10^{19.5}$ cm$^{-2}$, and is highly turbulent with a velocity dispersion around 50 km s$^{-1}$. It increases in column density with the dense HI, and tends to be associated with the kinematically ``hotter'' part of the dense HI. Through simple modeling, we find that the majority of the diffuse HI in the tail region is likely to induce cooling out of the hot IGM instead of evaporating or being radiatively ionized. Given these relations of gas in different phases, the diffuse HI may represent a condensing phase of the IGM. Active tidal interactions on-going and in the past may have produced the wide-spreading HI distribution, and triggered the gas accretion to NGC 4631 through the phase of the diffuse HI.
Submitted 2 January, 2023;
originally announced January 2023.
-
Large Language Models are Better Reasoners with Self-Verification
Authors:
Yixuan Weng,
Minjun Zhu,
Fei Xia,
Bin Li,
Shizhu He,
Shengping Liu,
Bin Sun,
Kang Liu,
Jun Zhao
Abstract:
Recently, with the chain of thought (CoT) prompting, large language models (LLMs), e.g., GPT-3, have shown strong reasoning ability in several natural language processing tasks such as arithmetic, commonsense, and logical reasoning. However, LLMs with CoT require multi-step prompting and multi-token prediction, which is highly sensitive to individual mistakes and vulnerable to error accumulation.…
▽ More
Recently, with the chain of thought (CoT) prompting, large language models (LLMs), e.g., GPT-3, have shown strong reasoning ability in several natural language processing tasks such as arithmetic, commonsense, and logical reasoning. However, LLMs with CoT require multi-step prompting and multi-token prediction, which is highly sensitive to individual mistakes and vulnerable to error accumulation. The above issues make the LLMs need the ability to verify the answers. In fact, after inferring conclusions in some thinking decision tasks, people often check them by re-verifying steps to avoid some mistakes. In this paper, we propose and prove that LLMs also have similar self-verification abilities. We take the conclusion obtained by CoT as one of the conditions for solving the original problem. By performing a backward verification of the answers that LLM deduced for itself, we can obtain interpretable answer validation scores to select the candidate answer with the highest score. Experimental results demonstrate that the proposed method can improve the reasoning performance on various arithmetic, commonsense, and logical reasoning datasets. Our code is publicly available at: https://github.com/WENGSYX/Self-Verification.
Submitted 19 October, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Information Bottleneck-Inspired Type Based Multiple Access for Remote Estimation in IoT Systems
Authors:
Meiyi Zhu,
Chunyan Feng,
Caili Guo,
Nan Jiang,
Osvaldo Simeone
Abstract:
Type-based multiple access (TBMA) is a semantics-aware multiple access protocol for remote inference. In TBMA, codewords are reused across transmitting sensors, with each codeword being assigned to a different observation value. Existing TBMA protocols are based on fixed shared codebooks and on conventional maximum-likelihood or Bayesian decoders, which require knowledge of the distributions of observations and channels. In this letter, we propose a novel design principle for TBMA based on the information bottleneck (IB). In the proposed IB-TBMA protocol, the shared codebook is jointly optimized with a decoder based on artificial neural networks (ANNs), so as to adapt to source, observations, and channel statistics based on data only. We also introduce the Compressed IB-TBMA (CIB-TBMA) protocol, which improves IB-TBMA by enabling a reduction in the number of codewords via an IB-inspired clustering phase. Numerical results demonstrate the importance of a joint design of codebook and neural decoder, and validate the benefits of codebook compression.
Submitted 5 April, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Planning Immediate Landmarks of Targets for Model-Free Skill Transfer across Agents
Authors:
Minghuan Liu,
Zhengbang Zhu,
Menghui Zhu,
Yuzheng Zhuang,
Weinan Zhang,
Jianye Hao
Abstract:
In reinforcement learning applications like robotics, agents usually need to deal with various input/output features when specified with different state/action spaces by their developers or physical restrictions. This indicates unnecessary re-training from scratch and considerable sample inefficiency, especially when agents follow similar solution steps to achieve tasks. In this paper, we aim to transfer similar high-level goal-transition knowledge to alleviate the challenge. Specifically, we propose PILoT, i.e., Planning Immediate Landmarks of Targets. PILoT utilizes the universal decoupled policy optimization to learn a goal-conditioned state planner; then, distills a goal-planner to plan immediate landmarks in a model-free style that can be shared among different agents. In our experiments, we show the power of PILoT on various transferring challenges, including few-shot transferring across action spaces and dynamics, from low-dimensional vector states to image inputs, from simple robot to complicated morphology; and we also illustrate a zero-shot transfer solution from a simple 2D navigation task to the harder Ant-Maze task.
Submitted 18 December, 2022;
originally announced December 2022.
-
Reliable extrapolation of deep neural operators informed by physics or sparse observations
Authors:
Min Zhu,
Handi Zhang,
Anran Jiao,
George Em Karniadakis,
Lu Lu
Abstract:
Deep neural operators can learn nonlinear mappings between infinite-dimensional function spaces via deep neural networks. As promising surrogate solvers of partial differential equations (PDEs) for real-time prediction, deep neural operators such as deep operator networks (DeepONets) provide a new simulation paradigm in science and engineering. Pure data-driven neural operators and deep learning models, in general, are usually limited to interpolation scenarios, where new predictions utilize inputs within the support of the training set. However, in the inference stage of real-world applications, the input may lie outside the support, i.e., extrapolation is required, which may result in large errors and unavoidable failure of deep learning models. Here, we address this challenge of extrapolation for deep neural operators. First, we systematically investigate the extrapolation behavior of DeepONets by quantifying the extrapolation complexity via the 2-Wasserstein distance between two function spaces and propose a new behavior of bias-variance trade-off for extrapolation with respect to model capacity. Subsequently, we develop a complete workflow, including extrapolation determination, and we propose five reliable learning methods that guarantee a safe prediction under extrapolation by requiring additional information -- the governing PDEs of the system or sparse new observations. The proposed methods are based on either fine-tuning a pre-trained DeepONet or multifidelity learning. We demonstrate the effectiveness of the proposed framework for various types of parametric PDEs. Our systematic comparisons provide practical guidelines for selecting a proper extrapolation method depending on the available information, desired accuracy, and required inference speed.
Submitted 12 December, 2022;
originally announced December 2022.
-
Resistance Distances in Simplicial Networks
Authors:
Mingzhe Zhu,
Wanyue Xu,
Zhongzhi Zhang,
Haibin Kan,
Guanrong Chen
Abstract:
It is well known that in many real networks, such as brain networks and scientific collaboration networks, there exist higher-order nonpairwise relations among nodes, i.e., interactions among more than two nodes at a time. This simplicial structure can be described by simplicial complexes and has an important effect on topological and dynamical properties of networks involving such group interactions. In this paper, we study analytically resistance distances in iteratively growing networks with higher-order interactions characterized by the simplicial structure that is controlled by a parameter q. We derive exact formulas for interesting quantities about resistance distances, including the Kirchhoff index, additive degree-Kirchhoff index, multiplicative degree-Kirchhoff index, as well as average resistance distance, which have found applications in various areas elsewhere. We show that the average resistance distance tends to a q-dependent constant, indicating the impact of simplicial organization on the structural robustness measured by average resistance distance.
Submitted 12 December, 2022;
originally announced December 2022.
-
Hitting Times of Random Walks on Edge Corona Product Graphs
Authors:
Mingzhe Zhu,
Wanyue Xu,
Wei Li,
Zhongzhi Zhang,
Haibin Kan
Abstract:
Graph products have been extensively applied to model complex networks with striking properties observed in real-world complex systems. In this paper, we study the hitting times for random walks on a class of graphs generated iteratively by edge corona product. We first derive recursive solutions to the eigenvalues and eigenvectors of the normalized adjacency matrix associated with the graphs. Based on these results, we further obtain interesting quantities about hitting times of random walks, providing iterative formulas for the two-node hitting time, together with closed-form expressions for Kemeny's constant, defined as a weighted average of hitting times over all node pairs, and the arithmetic mean of hitting times of all pairs of nodes.
Submitted 12 December, 2022;
originally announced December 2022.
-
All-to-key Attention for Arbitrary Style Transfer
Authors:
Mingrui Zhu,
Xiao He,
Nannan Wang,
Xiaoyu Wang,
Xinbo Gao
Abstract:
Attention-based arbitrary style transfer studies have shown promising performance in synthesizing vivid local style details. They typically use the all-to-all attention mechanism -- each position of content features is fully matched to all positions of style features. However, all-to-all attention tends to generate distorted style patterns and has quadratic complexity, limiting the effectiveness and efficiency of arbitrary style transfer. In this paper, we propose a novel all-to-key attention mechanism -- each position of content features is matched to stable key positions of style features -- that is more in line with the characteristics of style transfer. Specifically, it integrates two newly proposed attention forms: distributed and progressive attention. Distributed attention assigns attention to key style representations that depict the style distribution of local regions; Progressive attention pays attention from coarse-grained regions to fine-grained key positions. The resultant module, dubbed StyA2K, shows extraordinary performance in preserving the semantic structure and rendering consistent style patterns. Qualitative and quantitative comparisons with state-of-the-art methods demonstrate the superior performance of our approach.
Submitted 6 April, 2023; v1 submitted 8 December, 2022;
originally announced December 2022.
-
Radio continuum and OH line emission of high-z OH megamaser galaxies
Authors:
Zhongzu Wu,
Yu. V. Sotnikova,
Bo Zhang,
T. Mufakharov,
Ming Zhu,
Peng Jiang,
Yongjun Chen,
Zhiqiang Shen,
Chun Sun,
Hao Peng,
Hong Wu
Abstract:
We present a study of the arcsecond-scale radio continuum and OH line emission of a sample of known OH megamaser (OHM) galaxies with $z \geq$ 0.15 using archival Very Large Array (VLA) data, together with the results of our pilot Five-hundred-meter Aperture Spherical radio Telescope (FAST) observations of 12 of these OHM galaxies. The arcsecond-scale resolution images show that the OH emission is distributed in one compact structure and spatially associated with radio continuum emission. Furthermore, nearly all the fitted components are likely smaller than the beam size ($\sim$ 1.4"), which indicates that the broad OH line profiles of these sources originated from one masing region or that more components are distributed in sub-arcsec scales. The radio parameters, including brightness temperature, spectral index, and q-index, show no significant differences from the low-redshift OHM galaxies, which have significantly lower OH line luminosities. Because these parameters are indicators of the central power sources (AGN, starburst, or both), our results indicate that the presence of radio AGN in the nuclei may not be essential for the formation of OH emission. Over 1/3 of OHMs in this sample (6/17) show possible variable features likely caused by interstellar scintillation due to small angular sizes. We might underestimate this value because these sources are associated with this sample's highest OH line flux densities. Those with low OH line flux densities might need higher sensitivity observations to study the variabilities. These results support the compact nature of OH maser emission and a starburst origin for the OHMs in our selected sample.
Submitted 3 December, 2022;
originally announced December 2022.
-
Microlensing effects of wormholes associated to blackhole spacetimes
Authors:
Ke Gao,
Lei-Hua Liu,
Mian Zhu
Abstract:
In this paper, we investigate the microlensing effects of wormholes associated to black hole spacetimes. Specifically, we work on three typical wormholes (WH): Schwarzschild WH, Kerr WH, and RN WH, as well as their blackhole correspondences. We evaluate the deflection angle up to second order under the weak field approximation using the Gauss-Bonnet theorem. Then, we study their magnification with numerics. We find that a Kerr WH could lead to multiple peaks in the magnification with certain parameters in the prograde case, while a Kerr BH predicts one peak. Therefore, the multi-peak feature can be used to distinguish the Kerr WH from other compact objects. We also find that the magnification of RN BH will be one peak compared to RN WH, in which the magnification of RN WH is negative in some situations. For other cases, the behavior of magnification from wormholes and their corresponding blackholes is similar. Our result may shed new light on exploring compact objects through the microlensing effect.
Submitted 8 June, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
Parton distribution of intrinsic charm in two dimensional QCD
Authors:
Siwei Hu,
Yu Jia,
Zhewen Mo,
Xiaonu Xiong,
Mingliang Zhu
Abstract:
We present a detailed investigation of the intrinsic charm content in a light meson within the 't Hooft model, namely, two-dimensional QCD in the large $N_c$ limit. The intrinsic charm parton distribution function (PDF) of a light meson, which first arises at order $N_c^{-1}$, is explicitly expressed in terms of the 't Hooft wave functions of the light meson and an infinite tower of excited charmed mesons. We also derive the functional forms from the two-dimensional counterparts of the meson cloud model (MCM) and Brodsky-Hoyer-Peterson-Sakai (BHPS) model. We then make a quantitative comparison between our rigorous results and model predictions. We also study how the profile of the intrinsic charm PDF varies with charm quark mass. The average momentum fraction carried by the charm quark inside a light meson is found to decrease faster than $m_c^{-4}$ with increasing charm quark mass.
Submitted 9 February, 2023; v1 submitted 29 November, 2022;
originally announced November 2022.
-
VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Authors:
Kun Cheng,
Xiaodong Cun,
Yong Zhang,
Menghan Xia,
Fei Yin,
Mingrui Zhu,
Xuan Wang,
Jue Wang,
Nannan Wang
Abstract:
We present VideoReTalking, a new system to edit the faces of a real-world talking head video according to input audio, producing a high-quality and lip-syncing output video even with a different emotion. Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism. Given a talking-head video, we first modify the expression of each frame according to the same expression template using the expression editing network, resulting in a video with the canonical expression. This video, together with the given audio, is then fed into the lip-sync network to generate a lip-syncing video. Finally, we improve the photo-realism of the synthesized faces through an identity-aware face enhancement network and post-processing. We use learning-based approaches for all three steps and all our modules can be tackled in a sequential pipeline without any user intervention. Furthermore, our system is a generic approach that does not need to be retrained to a specific person. Evaluations on two widely-used datasets and in-the-wild examples demonstrate the superiority of our framework over other state-of-the-art methods in terms of lip-sync accuracy and visual quality.
Submitted 27 November, 2022;
originally announced November 2022.
-
Packing $1.35\cdot 10^{11}$ rectangles into a unit square
Authors:
Mingliang Zhu,
Antal Joós
Abstract:
It is known that $\sum\limits_{i=1}^{\infty} \frac{1}{i (i+1)} = 1$. In 1968, Meir and Moser asked for the smallest $ε$ such that all the rectangles of sizes $1/i \times 1/(i + 1)$ for $i = 1, 2, \ldots$, can be packed into a unit square or a rectangle of area $1 + ε$. In this paper, we show that we can pack the first $1.35\cdot10^{11}$ rectangles into the unit square and give an estimate for $ε$ from this packing.
Submitted 17 November, 2022;
originally announced November 2022.
-
UAV Assisted Data Collection for Internet of Things: A Survey
Authors:
Zhiqing Wei,
Mingyue Zhu,
Ning Zhang,
Lin Wang,
Yingying Zou,
Zeyang Meng,
Huici Wu,
Zhiyong Feng
Abstract:
Thanks to the advantages of flexible deployment and high mobility, unmanned aerial vehicles (UAVs) have been widely applied in the areas of disaster management, agricultural plant protection, environment monitoring and so on. With the development of UAV and sensor technologies, UAV assisted data collection for Internet of Things (IoT) has attracted increasing attention. In this article, the scenarios and key technologies of UAV assisted data collection are comprehensively reviewed. First, we present the system model including the network model and mathematical model of UAV assisted data collection for IoT. Then, we review the key technologies including clustering of sensors, UAV data collection mode as well as joint path planning and resource allocation. Finally, the open problems are discussed from the perspectives of efficient multiple access as well as joint sensing and data collection. This article hopefully provides some guidelines and insights for researchers in the area of UAV assisted data collection for IoT.
Submitted 17 November, 2022;
originally announced November 2022.
-
Tremendous tunneling magnetoresistance effects based on van der Waals room-temperature ferromagnet Fe$_3$GaTe$_2$ with highly spin-polarized Fermi surfaces
Authors:
Xinlu Li,
Meng Zhu,
Yaoyuan Wang,
Fanxing Zheng,
Jianting Dong,
Ye Zhou,
Long You,
Jia Zhang
Abstract:
Recently, van der Waals (vdW) magnetic heterostructures have received increasing research attention in spintronics. However, the lack of room-temperature magnetic order in vdW materials has largely impeded their development in practical spintronic devices. Inspired by the recently discovered vdW ferromagnet Fe3GaTe2, which has been shown to have magnetic order above room temperature and sizable perpendicular magnetic anisotropy, we investigate the basic electronic structure and magnetic properties of Fe3GaTe2 as well as the tunneling magnetoresistance effect in magnetic tunnel junctions (MTJs) with the structure Fe3GaTe2/Insulator/Fe3GaTe2 by using first-principles calculations. It is found that the highly spin-polarized Fermi surface of Fe3GaTe2 ensures that such magnetic tunnel junctions may have a prominent tunneling magnetoresistance effect at room temperature, even comparable to existing conventional AlOx and MgO-based MTJs. Our results suggest that Fe3GaTe2-based MTJs may be promising candidates for realizing long-awaited fully magnetic vdW spintronic devices.
Submitted 17 November, 2022;
originally announced November 2022.
-
iNavFIter-M: Matrix Formulation of Functional Iteration for Inertial Navigation Computation
Authors:
Hongyan Jiang,
Maoran Zhu,
Yanyan Fu,
Yuanxin Wu
Abstract:
The acquisition of attitude, velocity, and position is an essential task in the field of inertial navigation, achieved by integrating the measurements from inertial sensors. Recently, the ultra-precision inertial navigation computation has been tackled by the functional iteration approach (iNavFIter) that drives the non-commutativity errors almost to the computer truncation error level. This paper proposes a computationally efficient matrix formulation of the functional iteration approach, named the iNavFIter-M. The Chebyshev polynomial coefficients in two consecutive iterations are explicitly connected through the matrix formulation, in contrast to the implicit iterative relationship in the original iNavFIter. This allows a straightforward algorithmic implementation, and a number of matrix factors can be pre-calculated for more efficient computation. Numerical results demonstrate that the proposed iNavFIter-M algorithm is able to achieve the same high computation accuracy as the original iNavFIter does, at a computational cost comparable to the typical two-sample algorithm. The iNavFIter-M algorithm is also implemented on an FPGA board to demonstrate its potential in real-time applications.
Submitted 16 November, 2022;
originally announced November 2022.
-
Qafny: A Quantum-Program Verifier
Authors:
Liyi Li,
Mingwei Zhu,
Rance Cleaveland,
Alexander Nicolellis,
Yi Lee,
Le Chang,
Xiaodi Wu
Abstract:
Because of the probabilistic/nondeterministic behavior of quantum programs, it is highly advisable to verify them formally to ensure that they correctly implement their specifications. Formal verification, however, also traditionally requires significant effort. To address this challenge, we present Qafny, an automated proof system based on the program verifier Dafny and designed for verifying quantum programs. At its core, Qafny uses a type-guided quantum proof system that translates quantum operations to classical array operations modeled within a classical separation logic framework. We prove the soundness and completeness of our proof system and implement a prototype compiler that transforms Qafny programs and specifications into Dafny for automated verification purposes. We then illustrate the utility of Qafny's automated capabilities in efficiently verifying important quantum algorithms, including quantum-walk algorithms, Grover's algorithm, and Shor's algorithm.
Submitted 19 January, 2024; v1 submitted 11 November, 2022;
originally announced November 2022.
-
FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning
Authors:
Xiao-Yang Liu,
Ziyi Xia,
Jingyang Rui,
Jiechao Gao,
Hongyang Yang,
Ming Zhu,
Christina Dan Wang,
Zhaoran Wang,
Jian Guo
Abstract:
Finance is a particularly difficult playground for deep reinforcement learning. However, establishing high-quality market environments and benchmarks for financial reinforcement learning is challenging due to three major factors, namely, low signal-to-noise ratio of financial data, survivorship bias of historical data, and model overfitting in the backtesting stage. In this paper, we present an openly accessible FinRL-Meta library that has been actively maintained by the AI4Finance community. First, following a DataOps paradigm, we will provide hundreds of market environments through an automatic pipeline that collects dynamic datasets from real-world markets and processes them into gym-style market environments. Second, we reproduce popular papers as stepping stones for users to design new trading strategies. We also deploy the library on cloud platforms so that users can visualize their own results and assess the relative performance via community-wise competitions. Third, FinRL-Meta provides tens of Jupyter/Python demos organized into a curriculum and a documentation website to serve the rapidly growing community. FinRL-Meta is available at: https://github.com/AI4Finance-Foundation/FinRL-Meta
Submitted 6 November, 2022;
originally announced November 2022.
-
An Efficient FPGA-based Accelerator for Deep Forest
Authors:
Mingyu Zhu,
Jiapeng Luo,
Wendong Mao,
Zhongfeng Wang
Abstract:
Deep Forest is a prominent machine learning algorithm known for its high accuracy in forecasting. Compared with deep neural networks, Deep Forest has almost no multiplication operations and has better performance on small datasets. However, due to the deep structure and large forest quantity, it suffers from large amounts of calculation and memory consumption. In this paper, an efficient hardware accelerator is proposed for deep forest models, which is also the first work to implement Deep Forest on FPGA. Firstly, a delicate node computing unit (NCU) is designed to improve inference speed. Secondly, based on NCU, an efficient architecture and an adaptive dataflow are proposed, in order to alleviate the problem of node computing imbalance in the classification process. Moreover, an optimized storage scheme in this design also improves hardware utilization and power efficiency. The proposed design is implemented on an FPGA board, Intel Stratix V, and it is evaluated on two typical datasets, ADULT and Face Mask Detection. The experimental results show that the proposed design can achieve around 40x speedup compared to that on a 40-core high-performance x86 CPU.
△ Less
Submitted 4 November, 2022;
originally announced November 2022.
-
Uniqueness of positive solutions to elliptic equations with the critical exponential growth on the unit disc and its applications
Authors:
Lu Chen,
Guozhen Lu,
Ying Xue,
Maochun Zhu
Abstract:
In this paper, we solve the uniqueness problem of positive solutions to the following equations of exponential growth: \begin{equation*} \begin{cases} -Δu =λue^{u^2},\quad\quad & x\in B_1\subset \mathbb{R}^2,\\ u>0, & x\in B_1,\ \\ u=0,\quad\quad &x\in \partial B_1, \end{cases} \end{equation*} where $ 0<λ<λ_1(B_1)$ and $λ_1(B_1)$ denotes the first eigenvalue of the operator $-Δ$ with the Dirichlet boundary condition on the unit disk. Our method relies on delicate and difficult analysis of radial solutions to the above equation and careful asymptotic expansion of solutions near the boundary. This uniqueness result will shed some light on solving the conjecture that maximizers of the Trudinger-Moser inequality on the unit disc are unique. Furthermore, based on this uniqueness result, we develop a new strategy to establish the quantization property of elliptic equations with the critical exponential growth in the balls of hyperbolic spaces, and obtain the multiplicity and non-existence of positive critical points for the super-critical Trudinger-Moser functional. Our method for the quantization property and non-existence of the critical points avoids using the complicated blow-up analysis used in the literature. This method can also be applied to study similar problems in balls of high dimensional Euclidean space $\mathbb{R}^n$ or hyperbolic spaces, provided the uniqueness for the corresponding quasilinear elliptic equations with the critical exponential growth is established.
Submitted 30 October, 2022;
originally announced October 2022.
-
Controller-Guided Partial Label Consistency Regularization with Unlabeled Data
Authors:
Qian-Wei Wang,
Bowen Zhao,
Mingyan Zhu,
Tianxiang Li,
Zimo Liu,
Shu-Tao Xia
Abstract:
Partial label learning (PLL) learns from training examples each associated with multiple candidate labels, among which only one is valid. In recent years, benefiting from their strong capability for dealing with ambiguous supervision and the impetus of modern data augmentation methods, consistency regularization-based PLL methods have achieved a series of successes and become mainstream. However, as the partial annotation becomes insufficient, their performance drops significantly. In this paper, we leverage easily accessible unlabeled examples to facilitate partial label consistency regularization. In addition to a partial supervised loss, our method performs a controller-guided consistency regularization at both the label level and the representation level with the help of unlabeled data. To mitigate the limited capability of the initial supervised model, we use the controller to estimate the confidence of each current prediction to guide the subsequent consistency regularization. Furthermore, we dynamically adjust the confidence thresholds so that the number of samples of each class participating in consistency regularization remains roughly equal, which alleviates the class-imbalance problem. Experiments show that our method achieves satisfactory performance in more practical situations, and its modules can be applied to existing PLL methods to enhance their capabilities.
Submitted 27 February, 2024; v1 submitted 20 October, 2022;
originally announced October 2022.
-
ReasonChainQA: Text-based Complex Question Answering with Explainable Evidence Chains
Authors:
Minjun Zhu,
Yixuan Weng,
Shizhu He,
Kang Liu,
Jun Zhao
Abstract:
The ability to reason over evidence has received increasing attention in question answering (QA). Recently, natural language databases (NLDBs) conduct complex QA over knowledge bases with textual evidence rather than structured representations; this task attracts a lot of attention because of the flexibility and richness of textual evidence. However, existing text-based complex question answering datasets fail to provide an explicit reasoning process, which is important for retrieval effectiveness and reasoning interpretability. Therefore, we present a benchmark, ReasonChainQA, with explanatory and explicit evidence chains. ReasonChainQA consists of two subtasks: answer generation and evidence chain extraction. It also offers higher diversity for multi-hop questions, with varying depths, 12 reasoning types and 78 relations. To obtain high-quality textual evidences for answering complex question. Additional experiments on supervised and unsupervised retrieval indicate the significance of ReasonChainQA. The dataset and code will be made publicly available upon acceptance.
Submitted 17 October, 2022;
originally announced October 2022.
-
Classification by estimating the cumulative distribution function for small data
Authors:
Meng-Xian Zhu,
Yuan-Hai Shao
Abstract:
In this paper, we study the classification problem by estimating the conditional probability function of the given data. Unlike the traditional expected risk estimation theory on empirical data, we calculate the probability via a Fredholm equation, which leads to estimating the distribution of the data. Based on the Fredholm equation, a new expected risk estimation theory that estimates the cumulative distribution function is presented. The main characteristic of the new expected risk estimation is that it measures the risk on the distribution of the input space. The corresponding empirical risk estimation is also presented, and an $\varepsilon$-insensitive $L_{1}$ cumulative support vector machine ($\varepsilon$-$L_{1}VSVM$) is proposed by introducing an insensitive loss. It is worth mentioning that the classification models and the classification evaluation indicators based on the new mechanism differ from the traditional ones. Experimental results show the effectiveness of the proposed $\varepsilon$-$L_{1}VSVM$ and the corresponding cumulative distribution function indicator in terms of the validity and interpretability of small data classification.
Submitted 12 October, 2022; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Learning Critical Scenarios in Feedback Control Systems for Automated Driving
Authors:
Mengjia Zhu,
Alberto Bemporad,
Maximilian Kneissl,
Hasan Esen
Abstract:
Testing is essential for verifying and validating control designs, especially in safety-critical applications. In particular, the control system governing an automated driving vehicle must be proven reliable enough for its acceptance on the market. Recently, much research has focused on scenario-based methods. However, the number of possible driving scenarios to test is in principle infinite. In this paper, we formalize a learning-based optimization framework to generate corner test-cases, where we take into account the operational design domain. We examine the approach on the case of a feedback control system for automated driving, for which we suggest the design of the objective function expressing the criticality of scenarios. Numerical tests on two logical scenarios of the case study demonstrate that the approach can identify critical scenarios within a limited number of closed-loop experiments.
Submitted 8 September, 2023; v1 submitted 26 September, 2022;
originally announced September 2022.
-
Enhanced Effective Aperture Distribution Function for Characterizing Large-Scale Antenna Arrays
Authors:
Xuesong Cai,
Meifang Zhu,
Aleksei Fedorov,
Fredrik Tufvesson
Abstract:
Accurate characterization of large-scale antenna arrays is growing in importance and complexity for the fifth-generation (5G) and beyond systems, as they feature more antenna elements and require increased overall performance. The full 3D patterns of all antenna elements in the array need to be characterized because they are in general different due to construction inaccuracy, coupling, antenna array's asymmetry, etc. The effective aperture distribution function (EADF) can provide an analytic description of an antenna array based on a full-sphere measurement of the array in an anechoic chamber. However, as the array aperture increases, denser spatial samples are needed for EADF due to large distance offsets of array elements from the reference point in the anechoic chamber, leading to a prohibitive measurement time and increased complexity of EADF. In this paper, we present the EADF applied to large-scale arrays and highlight issues caused by the large array aperture. To overcome the issues, an enhanced EADF is proposed with a low complexity that is intrinsically determined by the characteristic of each array element rather than the array aperture. The enhanced EADF is validated using experimental measurements conducted at 27-30 GHz frequency band with a relatively large planar array.
Submitted 7 June, 2023; v1 submitted 23 September, 2022;
originally announced September 2022.
-
Multi-Robot-Assisted Human Crowd Evacuation using Navigation Velocity Fields
Authors:
Tongjia Zheng,
Zhenyuan Yuan,
Mollik Nayyar,
Alan R. Wagner,
Minghui Zhu,
Hai Lin
Abstract:
This work studies a robot-assisted crowd evacuation problem where we control a small group of robots to guide a large human crowd to safe locations. The challenge lies in how to model human-robot interactions and design robot controls to indirectly control a human population that significantly outnumbers the robots. To address the challenge, we treat the crowd as a continuum and formulate the evacuation objective as driving the crowd density to target locations. We propose a novel mean-field model which consists of a family of microscopic equations that explicitly model how human motions are locally guided by the robots and an associated macroscopic equation that describes how the crowd density is controlled by the navigation velocity fields generated by all robots. Then, we design density feedback controllers for the robots to dynamically adjust their states such that the generated navigation velocity fields drive the crowd density to a target density. Stability guarantees of the proposed controllers are proven. Agent-based simulations are included to evaluate the proposed evacuation algorithms.
Submitted 20 September, 2022;
originally announced September 2022.
-
VS-CAM: Vertex Semantic Class Activation Mapping to Interpret Vision Graph Neural Network
Authors:
Zhenpeng Feng,
Xiyang Cui,
Hongbing Ji,
Mingzhe Zhu,
Ljubisa Stankovic
Abstract:
Graph convolutional neural networks (GCNs) have drawn increasing attention and attained good performance in various computer vision tasks; however, a clear interpretation of GCN's inner mechanism is lacking. For standard convolutional neural networks (CNNs), class activation mapping (CAM) methods are commonly used to visualize the connection between a CNN's decision and an image region by generating a heatmap. Nonetheless, such a heatmap usually exhibits semantic chaos when these CAMs are applied to GCN directly. In this paper, we propose a novel visualization method particularly applicable to GCN, Vertex Semantic Class Activation Mapping (VS-CAM). VS-CAM includes two independent pipelines that produce a set of semantic-probe maps and a semantic-base map, respectively. Semantic-probe maps are used to detect the semantic information in the semantic-base map and aggregate it into a semantic-aware heatmap. Qualitative results show that VS-CAM can obtain heatmaps in which the highlighted regions match the objects much more precisely than with CNN-based CAM. The quantitative evaluation further demonstrates the superiority of VS-CAM.
Submitted 15 September, 2022;
originally announced September 2022.
-
Wave analysis in the complex Fourier transform domain: A new method to obtain the Green's functions of dispersive linear partial differential equations
Authors:
Minjiang Zhu
Abstract:
This paper provides a new analytical method to obtain Green's functions of linear dispersive partial differential equations. The Euler-Bernoulli beam equation and the one-dimensional heat conduction equation (dissipation equation) under impulses in space and time are solved as examples. The complex infinite-domain Green's function of the Euler-Bernoulli beam is derived. A new approach is proposed to obtain the finite-domain Green's function from the infinite-domain Green's function by reflection and transmission analysis in the complex Fourier transform domain. It is found that the solution obtained by this approach converges much better at short response times than that obtained by traditional modal analysis. In addition, by applying the geometric summation formula for matrix series, a new modal expansion solution requiring no calculation of each mode's inner product is derived, which analytically proves the wave-mode duality and simplifies the calculation. The semi-infinite-domain cases and the coupled-domain cases are also derived by the newly developed method to show its validity and simplicity. It is found that non-propagating waves also possess a wave speed, and that heat conduction can also be treated as propagating waves.
Submitted 16 September, 2022;
originally announced September 2022.
-
On Relaxed Locally Decodable Codes for Hamming and Insertion-Deletion Errors
Authors:
Alex Block,
Jeremiah Blocki,
Kuan Cheng,
Elena Grigorescu,
Xin Li,
Yu Zheng,
Minshen Zhu
Abstract:
Locally Decodable Codes (LDCs) are error-correcting codes $C:Σ^n\rightarrow Σ^m$ with super-fast decoding algorithms. They are important mathematical objects in many areas of theoretical computer science, yet the best constructions so far have codeword length $m$ that is super-polynomial in $n$, for codes with constant query complexity and constant alphabet size. In a very surprising result, Ben-Sasson et al. showed how to construct a relaxed version of LDCs (RLDCs) with constant query complexity and almost linear codeword length over the binary alphabet, and used them to obtain significantly improved constructions of Probabilistically Checkable Proofs. In this work, we study RLDCs in the standard Hamming-error setting, and introduce their variants in the insertion and deletion (Insdel) error setting. Insdel LDCs were first studied by Ostrovsky and Paskin-Cherniavsky, and are further motivated by recent advances in DNA random access bio-technologies, in which the goal is to retrieve individual files from a DNA storage database. Our first result is an exponential lower bound on the length of Hamming RLDCs making 2 queries, over the binary alphabet. This answers a question explicitly raised by Gur and Lachish. Our result exhibits a "phase-transition"-type behavior on the codeword length for constant-query Hamming RLDCs. We further define two variants of RLDCs in the Insdel-error setting, a weak and a strong version. On the one hand, we construct weak Insdel RLDCs with parameters matching those of the Hamming variants. On the other hand, we prove exponential lower bounds for strong Insdel RLDCs. These results demonstrate that, while these variants are equivalent in the Hamming setting, they are significantly different in the Insdel setting. Our results also prove a strict separation between Hamming RLDCs and Insdel RLDCs.
Submitted 18 September, 2022;
originally announced September 2022.
-
Black-box Dataset Ownership Verification via Backdoor Watermarking
Authors:
Yiming Li,
Mingyan Zhu,
Xue Yang,
Yong Jiang,
Tao Wei,
Shu-Tao Xia
Abstract:
Deep learning, especially deep neural networks (DNNs), has been widely and successfully adopted in many critical applications for its high effectiveness and efficiency. The rapid development of DNNs has benefited from the existence of some high-quality datasets (e.g., ImageNet), which allow researchers and developers to easily verify the performance of their methods. Currently, almost all existing released datasets require that they can only be adopted for academic or educational purposes rather than commercial purposes without permission. However, there is still no good way to ensure that. In this paper, we formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model, where defenders can only query the model while having no information about its parameters and training details. Based on this formulation, we propose to embed external patterns via backdoor watermarking for the ownership verification to protect them. Our method contains two main parts, including dataset watermarking and dataset verification. Specifically, we exploit poison-only backdoor attacks (e.g., BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification. We also provide some theoretical analyses of our methods. Experiments on multiple benchmark datasets of different tasks are conducted, which verify the effectiveness of our method. The code for reproducing main experiments is available at https://github.com/THUYimingLi/DVBW.
Submitted 30 March, 2023; v1 submitted 4 August, 2022;
originally announced September 2022.
-
Evolution of Galaxy Types and HI Gas in Hickson Compact Groups
Authors:
Yao Liu,
Ming Zhu
Abstract:
Compact groups have high galaxy densities and low velocity dispersions, and their group members have experienced numerous and frequent interactions during their lifetimes. They provide a unique environment to study the evolution of galaxies. We examined the galaxy types and HI contents in groups to study galaxy evolution in compact groups. We used the group crossing time as an age indicator for galaxy groups. Our sample is derived from the Hickson Compact Group catalog. We obtained group morphology data from the Hyper-Leda database and the IR classification based on Wide-Field Infrared Survey Explorer (WISE) fluxes from Zucker et al. (2016). By cross-matching the latest released ALFALFA 100% HI source catalog, supplemented by data found in the literature, we obtained 40 galaxy groups with HI data available. We confirmed that the weak correlation between HI mass fraction and group crossing time found by Ai & Zhu (2018) in SDSS groups also exists in compact groups. We also found that the group spiral galaxy fraction is correlated with the group crossing time, but the actively star-forming galaxy fraction is not. These results seem to fit the hypothesis that the sequential acquisition of neighbors from surrounding larger-scale structures has affected the morphology transition and star formation efficiency in compact groups.
Submitted 7 September, 2022;
originally announced September 2022.
-
Ionization Induced by the Ponderomotive Force in Intense and High-Frequency Laser Fields
Authors:
Mingyu Zhu,
Yuxiang Liu,
Chunli Wei,
Hongcheng Ni,
Qi Wei
Abstract:
Atomic stabilization is a universal phenomenon that occurs when atoms interact with intense and high-frequency laser fields. In this work, we systematically study the influence of the ponderomotive (PM) force, present around the laser focus, on atomic stabilization. We show that the PM force could induce tunneling and even over-barrier ionization of the otherwise stabilized atoms. Such an effect may outweigh the typical multiphoton ionization under moderate laser intensities. Our work highlights the importance of an improved treatment of atomic stabilization that includes the influence of the PM force.
Submitted 5 May, 2023; v1 submitted 25 August, 2022;
originally announced August 2022.
-
Learning Instrumental Variable from Data Fusion for Treatment Effect Estimation
Authors:
Anpeng Wu,
Kun Kuang,
Ruoxuan Xiong,
Minqing Zhu,
Yuxuan Liu,
Bo Li,
Furui Liu,
Zhihua Wang,
Fei Wu
Abstract:
The advent of the big data era has brought new opportunities and challenges for inferring treatment effects in data fusion, that is, in a mixed dataset collected from multiple sources (each source with an independent treatment assignment mechanism). Due to possibly omitted source labels and unmeasured confounders, traditional methods cannot estimate individual treatment assignment probability and infer treatment effects effectively. Therefore, we propose to reconstruct the source label and model it as a Group Instrumental Variable (GIV) to implement IV-based regression for treatment effect estimation. In this paper, we conceptualize this line of thought and develop a unified framework (Meta-EM) to (1) map the raw data into a representation space to construct Linear Mixed Models for the assigned treatment variable; (2) estimate the distribution differences and model the GIV for the different treatment assignment mechanisms; and (3) adopt an alternating training strategy to iteratively optimize the representations and the joint distribution to model the GIV for IV regression. Empirical results demonstrate the advantages of our Meta-EM compared with state-of-the-art methods.
Submitted 7 December, 2022; v1 submitted 23 August, 2022;
originally announced August 2022.
-
How a Small Amount of Data Sharing Benefits Distributed Optimization and Learning
Authors:
Mingxi Zhu,
Yinyu Ye
Abstract:
Distributed optimization algorithms have been widely used in machine learning. While those algorithms have merits in parallel processing and protecting data security, they often suffer from slow convergence. This paper focuses on how a small amount of data sharing could benefit distributed optimization and learning. Specifically, we examine higher-order optimization algorithms including the distributed multi-block alternating direction method of multipliers (ADMM) and the preconditioned conjugate gradient method (PCG). The contribution of this paper is threefold. First, in theory, we answer when and why distributed optimization algorithms are slow by identifying the worst data structure. Surprisingly, while the PCG algorithm converges slowly under a heterogeneous data structure, for distributed ADMM, data homogeneity leads to the worst performance. This result challenges the common belief that data heterogeneity hurts convergence, highlighting the need for a universal approach to altering the data structure for different algorithms. Second, in practice, we propose a meta-algorithm of data sharing, with its tailored applications in multi-block ADMM and PCG methods. By sharing only a small amount of prefixed data (e.g., 1%), our algorithms provide good-quality estimators in different machine learning tasks within far fewer iterations, while purely distributed optimization algorithms may take hundreds of times more iterations to converge. Finally, in philosophy, we argue that even minimal collaboration can have huge synergy, a concept that extends beyond the realm of optimization analysis. We hope that the discovery resulting from this paper will encourage even a small amount of data sharing among different regions to combat difficult global learning problems.
Submitted 2 January, 2024; v1 submitted 20 August, 2022;
originally announced August 2022.