-
Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback
Authors:
Sanghyeon Na,
Yonggyu Kim,
Hyunjoon Lee
Abstract:
The generation of high-quality human images through text-to-image (T2I) methods is a significant yet challenging task. Distinct from general image generation, human image synthesis must satisfy stringent criteria related to human pose, anatomy, and alignment with textual prompts, making it particularly difficult to achieve realistic results. Recent advancements in T2I generation based on diffusion models have shown promise, yet challenges remain in meeting human-specific preferences. In this paper, we introduce a novel approach tailored specifically for human image generation utilizing Direct Preference Optimization (DPO). Specifically, we introduce an efficient method for constructing a specialized DPO dataset for training human image generation models without the need for costly human feedback. We also propose a modified loss function that enhances the DPO training process by minimizing artifacts and improving image fidelity. Our method demonstrates its versatility and effectiveness in generating human images, including personalized text-to-image generation. Through comprehensive evaluations, we show that our approach significantly advances the state of human image generation, achieving superior results in terms of natural anatomies, poses, and text-image alignment.
Submitted 30 May, 2024;
originally announced May 2024.
-
Frame Quantization of Neural Networks
Authors:
Wojciech Czaja,
Sanghoon Na
Abstract:
We present a post-training quantization algorithm with error estimates relying on ideas originating from frame theory. Specifically, we use first-order Sigma-Delta ($ΣΔ$) quantization for finite unit-norm tight frames to quantize weight matrices and biases in a neural network. In our scenario, we derive an error bound between the original neural network and the quantized neural network in terms of step size and the number of frame elements. We also demonstrate how to leverage the redundancy of frames to achieve a quantized neural network with higher accuracy.
Submitted 11 April, 2024;
originally announced April 2024.
-
Efficient Denoising using Score Embedding in Score-based Diffusion Models
Authors:
Andrew S. Na,
William Gao,
Justin W. L. Wan
Abstract:
It is well known that training a denoising score-based diffusion model requires tens of thousands of epochs and a substantial amount of image data. In this paper, we propose to increase the efficiency of training score-based diffusion models. Our method allows us to decrease the number of epochs needed to train the diffusion model. We accomplish this by solving the log-density Fokker-Planck (FP) equation numerically to compute the score before training. The pre-computed score is embedded into the image to encourage faster training under the sliced Wasserstein distance. Consequently, it also allows us to decrease the number of images we need to train the neural network to learn an accurate score. We demonstrate through our numerical experiments the improved performance of our proposed method compared to standard score-based diffusion models. Our proposed method achieves a similar quality to the standard method meaningfully faster.
Submitted 9 April, 2024;
originally announced April 2024.
-
GPFL: A Gradient Projection-Based Client Selection Framework for Efficient Federated Learning
Authors:
Shijie Na,
Yuzhi Liang,
Siu-Ming Yiu
Abstract:
Federated learning client selection is crucial for determining participant clients while balancing model accuracy and communication efficiency. Existing methods have limitations in handling data heterogeneity, computational burdens, and independent client treatment. To address these challenges, we propose GPFL, which measures client value by comparing local and global descent directions. We also employ an Exploit-Explore mechanism to enhance performance. Experimental results on FEMNIST and CIFAR-10 datasets demonstrate that GPFL outperforms baselines in Non-IID scenarios, achieving over 9% improvement in FEMNIST test accuracy. Moreover, GPFL exhibits shorter computation times through pre-selection and parameter reuse in federated learning.
Submitted 26 May, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Distributed Sequential Quadratic Programming with Overlapping Graph Decomposition and Exact Augmented Lagrangian
Authors:
Runxin Ni,
Sen Na,
Sungho Shin,
Mihai Anitescu
Abstract:
In this paper, we address the challenge of solving large-scale graph-structured nonlinear programs (gsNLPs) in a scalable manner. GsNLPs are problems in which the objective and constraint functions are associated with nodes on a graph and depend on the variables of adjacent nodes. This graph-structured formulation encompasses various specific instances, such as dynamic optimization, PDE-constrained optimization, multistage stochastic optimization, and general network optimization. By leveraging the sequential quadratic programming (SQP) framework, we propose a globally convergent overlapping graph decomposition method to solve large-scale gsNLPs under standard mild regularity conditions on the graph topology. In each iteration, we perform an overlapping graph decomposition to compute an approximate Newton direction in a parallel environment. Then, we select a suitable stepsize and update the primal-dual iterate by performing a backtracking line search on an exact augmented Lagrangian merit function. Built on the exponential decay of sensitivity of gsNLPs, we show that the approximate Newton direction is a descent direction of the augmented Lagrangian, which leads to global convergence with a local linear convergence rate. In particular, global convergence is achieved for sufficiently large overlaps, and the local linear convergence rate improves exponentially in terms of the overlap size. Our results match existing state-of-the-art guarantees established for dynamic programs (which simply correspond to linear graphs). We validate the theory on a semilinear elliptic PDE-constrained problem.
Submitted 11 June, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Segment Any Cell: A SAM-based Auto-prompting Fine-tuning Framework for Nuclei Segmentation
Authors:
Saiyang Na,
Yuzhi Guo,
Feng Jiang,
Hehuan Ma,
Junzhou Huang
Abstract:
In the rapidly evolving field of AI research, foundational models like BERT and GPT have significantly advanced language and vision tasks. The advent of pretrain-prompting models such as ChatGPT and Segment Anything Model (SAM) has further revolutionized image segmentation. However, their applications in specialized areas, particularly in nuclei segmentation within medical imaging, reveal a key challenge: the generation of high-quality, informative prompts is as crucial as applying state-of-the-art (SOTA) fine-tuning techniques on foundation models. To address this, we introduce Segment Any Cell (SAC), an innovative framework that enhances SAM specifically for nuclei segmentation. SAC integrates a Low-Rank Adaptation (LoRA) within the attention layer of the Transformer to improve the fine-tuning process, outperforming existing SOTA methods. It also introduces an innovative auto-prompt generator that produces effective prompts to guide segmentation, a critical factor in handling the complexities of nuclei segmentation in biomedical imaging. Our extensive experiments demonstrate the superiority of SAC in nuclei segmentation tasks, proving its effectiveness as a tool for pathologists and researchers. Our contributions include a novel prompt generation strategy, automated adaptability for diverse segmentation tasks, the innovative application of Low-Rank Attention Adaptation in SAM, and a versatile framework for semantic segmentation challenges.
Submitted 23 January, 2024;
originally announced January 2024.
-
Stoichiometry Representation Learning with Polymorphic Crystal Structures
Authors:
Namkyeong Lee,
Heewoong Noh,
Gyoung S. Na,
Tianfan Fu,
Jimeng Sun,
Chanyoung Park
Abstract:
Despite the recent success of machine learning (ML) in materials science, its success heavily relies on the structural description of crystals, which is itself computationally demanding and occasionally unattainable. Stoichiometry descriptors can be an alternative approach, which reveals the ratio between elements involved to form a certain compound without any structural information. However, it is not trivial to learn the representations of stoichiometry due to the nature of materials science called polymorphism, i.e., a single stoichiometry can exist in multiple structural forms due to the flexibility of atomic arrangements, inducing uncertainties in representation. To this end, we propose PolySRL, which learns the probabilistic representation of stoichiometry by utilizing the readily available structural information, whose uncertainty reveals the polymorphic structures of stoichiometry. Extensive experiments on sixteen datasets demonstrate the superiority of PolySRL, and analysis of uncertainties sheds light on the applicability of PolySRL in real-world material discovery. The source code for PolySRL is available at https://github.com/Namkyeong/PolySRL_AI4Science.
Submitted 17 November, 2023;
originally announced December 2023.
-
Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer
Authors:
Namkyeong Lee,
Heewoong Noh,
Sungwon Kim,
Dongmin Hyun,
Gyoung S. Na,
Chanyoung Park
Abstract:
The density of states (DOS) is a spectral property of crystalline materials, which provides fundamental insights into various characteristics of the materials. While previous works mainly focus on obtaining high-quality representations of crystalline materials for DOS prediction, we focus on predicting the DOS from the obtained representations by reflecting the nature of DOS: DOS determines the general distribution of states as a function of energy. That is, DOS is not solely determined by the crystalline material but also by the energy levels, which has been neglected in previous works. In this paper, we propose to integrate heterogeneous information obtained from the crystalline materials and the energies via a multi-modal transformer, thereby modeling the complex relationships between the atoms in the crystalline materials and various energy levels for DOS prediction. Moreover, we propose to utilize prompts to guide the model to learn the crystal structural system-specific interactions between crystalline materials and energies. Extensive experiments on two types of DOS, i.e., Phonon DOS and Electron DOS, with various real-world scenarios demonstrate the superiority of DOSTransformer. The source code for DOSTransformer is available at https://github.com/HeewoongNoh/DOSTransformer.
Submitted 22 November, 2023; v1 submitted 24 October, 2023;
originally announced November 2023.
-
MFIM: Megapixel Facial Identity Manipulation
Authors:
Sanghyeon Na
Abstract:
Face swapping is a task that changes a facial identity of a given image to that of another person. In this work, we propose a novel face-swapping framework called Megapixel Facial Identity Manipulation (MFIM). The face-swapping model should achieve two goals. First, it should be able to generate a high-quality image. We argue that a model which is proficient in generating a megapixel image can achieve this goal. However, generating a megapixel image is generally difficult without careful model design. Therefore, our model exploits pretrained StyleGAN in the manner of GAN-inversion to effectively generate a megapixel image. Second, it should be able to effectively transform the identity of a given image. Specifically, it should be able to actively transform ID attributes (e.g., face shape and eyes) of a given image into those of another person, while preserving ID-irrelevant attributes (e.g., pose and expression). To achieve this goal, we exploit 3DMM that can capture various facial attributes. Specifically, we explicitly supervise our model to generate a face-swapped image with the desirable attributes using 3DMM. We show that our model achieves state-of-the-art performance through extensive experiments. Furthermore, we propose a new operation called ID mixing, which creates a new identity by semantically mixing the identities of several people. It allows the user to customize the new identity.
Submitted 3 August, 2023;
originally announced August 2023.
-
Compressed Sensing Radar Detectors based on Weighted LASSO
Authors:
Siqi Na,
Yoshiyuki Kabashima,
Takashi Takahashi,
Tianyao Huang,
Yimin Liu,
Xiqin Wang
Abstract:
The compressed sensing (CS) model can represent the signal recovery process of a large number of radar systems. The detection problem of such radar systems has been widely studied in the literature using the debiased least absolute shrinkage and selection operator (LASSO). While naive LASSO treats all the entries equally, there are many applications in which prior information varies depending on each entry. Weighted LASSO, in which the weights of the regularization terms are tuned depending on the entry-dependent prior, has been shown by many researchers to be more effective when such prior information is available. In the present paper, existing results obtained by methods of statistical mechanics are utilized to derive the debiased weighted LASSO estimator for randomly constructed row-orthogonal measurement matrices. Based on this estimator, we construct a detector, termed the debiased weighted LASSO detector (DWLD), for CS radar systems and prove its advantages. The threshold of this detector can be calculated from the false alarm rate, which yields better detection performance than the naive weighted LASSO detector (NWLD) under the Neyman-Pearson principle. The improvement of the detection performance brought by tuning weights is demonstrated by numerical experiments. With the same false alarm rate, the detection probability of DWLD is clearly higher than those of NWLD and the debiased (non-weighted) LASSO detector (DLD).
Submitted 29 June, 2023;
originally announced June 2023.
-
Single-shot 3D photoacoustic computed tomography with a densely packed array for transcranial functional imaging
Authors:
Rui Cao,
Yilin Luo,
**hua Xu,
Xiaofei Luo,
Ku Geng,
Yousuf Aborahama,
Manxiu Cui,
Samuel Davis,
Shuai Na,
Xin Tong,
Cindy Liu,
Karteek Sastry,
Konstantin Maslov,
Peng Hu,
Yide Zhang,
Li Lin,
Yang Zhang,
Lihong V. Wang
Abstract:
Photoacoustic computed tomography (PACT) is emerging as a new technique for functional brain imaging, primarily due to its capabilities in label-free hemodynamic imaging. Despite its potential, the transcranial application of PACT has encountered hurdles, such as acoustic attenuations and distortions by the skull and limited light penetration through the skull. To overcome these challenges, we have engineered a PACT system that features a densely packed hemispherical ultrasonic transducer array with 3072 channels, operating at a central frequency of 1 MHz. This system allows for single-shot 3D imaging at a rate equal to the laser repetition rate, such as 20 Hz. We have achieved a single-shot light penetration depth of approximately 9 cm in chicken breast tissue utilizing a 750 nm laser (withstanding 3295-fold light attenuation and still retaining an SNR of 74) and successfully performed transcranial imaging through an ex vivo human skull using a 1064 nm laser. Moreover, we have proven the capacity of our system to perform single-shot 3D PACT imaging in both tissue phantoms and human subjects. These results suggest that our PACT system is poised to unlock potential for real-time, in vivo transcranial functional imaging in humans.
Submitted 26 June, 2023;
originally announced June 2023.
-
Shift-Robust Molecular Relational Learning with Causal Substructure
Authors:
Namkyeong Lee,
Kanghoon Yoon,
Gyoung S. Na,
Sein Kim,
Chanyoung Park
Abstract:
Recently, molecular relational learning, whose goal is to predict the interaction behavior between molecular pairs, has attracted a surge of interest in molecular sciences due to its wide range of applications. In this work, we propose CMRL, which is robust to the distributional shift in molecular relational learning by detecting the core substructure that is causally related to chemical reactions. To do so, we first assume a causal relationship based on the domain knowledge of molecular sciences and construct a structural causal model (SCM) that reveals the relationship between variables. Based on the SCM, we introduce a novel conditional intervention framework whose intervention is conditioned on the paired molecule. With the conditional intervention framework, our model successfully learns from the causal substructure and alleviates the confounding effect of shortcut substructures that are spuriously correlated to chemical reactions. Extensive experiments on various tasks with real-world and synthetic datasets demonstrate the superiority of CMRL over state-of-the-art baseline models. Our code is available at https://github.com/Namkyeong/CMRL.
Submitted 20 July, 2023; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching
Authors:
Ilgee Hong,
Sen Na,
Michael W. Mahoney,
Mladen Kolar
Abstract:
We consider solving equality-constrained nonlinear, nonconvex optimization problems. This class of problems appears widely in a variety of applications in machine learning and engineering, ranging from constrained deep neural networks, to optimal control, to PDE-constrained optimization. We develop an adaptive inexact Newton method for this problem class. In each iteration, we solve the Lagrangian Newton system inexactly via a randomized iterative sketching solver, and select a suitable stepsize by performing line search on an exact augmented Lagrangian merit function. The randomized solvers have advantages over deterministic linear system solvers by significantly reducing per-iteration flops complexity and storage cost, when equipped with suitable sketching matrices. Our method adaptively controls the accuracy of the randomized solver and the penalty parameters of the exact augmented Lagrangian, to ensure that the inexact Newton direction is a descent direction of the exact augmented Lagrangian. This allows us to establish a global almost sure convergence. We also show that a unit stepsize is admissible locally, so that our method exhibits a local linear convergence. Furthermore, we prove that the linear convergence can be strengthened to superlinear convergence if we gradually sharpen the adaptive accuracy condition on the randomized solver. We demonstrate the superior performance of our method on benchmark nonlinear problems in CUTEst test set, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
Submitted 28 May, 2023;
originally announced May 2023.
-
Conditional Graph Information Bottleneck for Molecular Relational Learning
Authors:
Namkyeong Lee,
Dongmin Hyun,
Gyoung S. Na,
Sungwon Kim,
Junseok Lee,
Chanyoung Park
Abstract:
Molecular relational learning, whose goal is to learn the interaction behavior between molecular pairs, has attracted a surge of interest in molecular sciences due to its wide range of applications. Recently, graph neural networks have shown great success in molecular relational learning by modeling a molecule as a graph structure, and considering atom-level interactions between two molecules. Despite their success, existing molecular relational learning methods tend to overlook the nature of chemistry, i.e., a chemical compound is composed of multiple substructures such as functional groups that cause distinctive chemical reactions. In this work, we propose a novel relational learning framework, called CGIB, that predicts the interaction behavior between a pair of graphs by detecting core subgraphs therein. The main idea is, given a pair of graphs, to find a subgraph from a graph that contains the minimal sufficient information regarding the task at hand conditioned on the paired graph based on the principle of conditional graph information bottleneck. We argue that our proposed method mimics the nature of chemical reactions, i.e., the core substructure of a molecule varies depending on which other molecule it interacts with. Extensive experiments on various tasks with real-world datasets demonstrate the superiority of CGIB over state-of-the-art baselines. Our code is available at https://github.com/Namkyeong/CGIB.
Submitted 9 July, 2023; v1 submitted 28 April, 2023;
originally announced May 2023.
-
Predicting Density of States via Multi-modal Transformer
Authors:
Namkyeong Lee,
Heewoong Noh,
Sungwon Kim,
Dongmin Hyun,
Gyoung S. Na,
Chanyoung Park
Abstract:
The density of states (DOS) is a spectral property of materials, which provides fundamental insights on various characteristics of materials. In this paper, we propose a model to predict the DOS by reflecting the nature of DOS: DOS determines the general distribution of states as a function of energy. Specifically, we integrate the heterogeneous information obtained from the crystal structure and the energies via multi-modal transformer, thereby modeling the complex relationships between the atoms in the crystal structure, and various energy levels. Extensive experiments on two types of DOS, i.e., Phonon DOS and Electron DOS, with various real-world scenarios demonstrate the superiority of DOSTransformer. The source code for DOSTransformer is available at https://github.com/HeewoongNoh/DOSTransformer.
Submitted 10 April, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Fully Stochastic Trust-Region Sequential Quadratic Programming for Equality-Constrained Optimization Problems
Authors:
Yuchen Fang,
Sen Na,
Michael W. Mahoney,
Mladen Kolar
Abstract:
We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints. We consider a fully stochastic setting, where at each step a single sample is generated to estimate the objective gradient. The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices (i.e., Hessians without modification) in SQP subproblems. As a trust-region method for constrained optimization, our algorithm must address an infeasibility issue -- the linearized equality constraints and trust-region constraints may lead to infeasible SQP subproblems. In this regard, we propose an adaptive relaxation technique to compute the trial step, consisting of a normal step and a tangential step. To control the lengths of these two steps while ensuring a scale-invariant property, we adaptively decompose the trust-region radius into two segments, based on the proportions of the rescaled feasibility and optimality residuals to the rescaled full KKT residual. The normal step has a closed form, while the tangential step is obtained by solving a trust-region subproblem, to which a solution ensuring the Cauchy reduction is sufficient for our study. We establish a global almost sure convergence guarantee for TR-StoSQP, and illustrate its empirical performance on both a subset of problems in the CUTEst test set and constrained logistic regression problems using data from the LIBSVM collection.
Submitted 28 January, 2024; v1 submitted 29 November, 2022;
originally announced November 2022.
-
Near-Optimal Performance of Stochastic Predictive Control
Authors:
Sungho Shin,
Sen Na,
Mihai Anitescu
Abstract:
This article presents a performance analysis for stochastic predictive control (SPC) in linear systems with quadratic performance index and additive and multiplicative uncertainties. Under a finite support assumption, the problem can be cast as a finite-dimensional quadratic program, but the problem becomes quickly intractable as the problem size grows exponentially in the horizon length. SPC aims to compute approximate solutions by solving a sequence of problems with truncated prediction horizons and committing the solution in a receding-horizon fashion. While this approach is widely used in practice, its performance relative to the optimal solution is not well understood. This article reports for the first time a rigorous performance guarantee of SPC: Under stabilizability and detectability conditions, the dynamic regret of SPC is exponentially small in the prediction horizon length, allowing SPC to achieve near-optimal performance at a substantially reduced computational expense.
Submitted 8 May, 2023; v1 submitted 16 October, 2022;
originally announced October 2022.
-
Compressed sensing radar detectors under the row-orthogonal design model: a statistical mechanics perspective
Authors:
Siqi Na,
Tianyao Huang,
Yimin Liu,
Takashi Takahashi,
Yoshiyuki Kabashima,
Xiqin Wang
Abstract:
The compressed sensing (CS) model of complex-valued data can represent the signal recovery process of many types of radar systems, especially when the measurement matrix is row-orthogonal. Detection based on the debiased least absolute shrinkage and selection operator (LASSO) under the Gaussian random design model, i.e., where the elements of the measurement matrix are drawn from a Gaussian distribution, has been studied in the literature. However, we find that these approaches are not suitable for row-orthogonal measurement matrices, which are of more practical relevance. Using statistical mechanics approaches, we provide derivations of more accurate test statistics and thresholds (or p-values) under the row-orthogonal design model, and theoretically analyze the detection performance of the present detector. Such a detector can analytically provide the threshold according to a given false alarm rate, which is not possible with the conventional CS detector, and the detection performance is proved to be better than that of the traditional LASSO detector. Compared with other debiased LASSO based detectors, simulation results indicate that the proposed approach can achieve a more accurate probability of false alarm when the measurement matrix is row-orthogonal, leading to better detection performance under the Neyman-Pearson principle.
Submitted 2 October, 2022; v1 submitted 30 September, 2022;
originally announced September 2022.
-
Transcranial photoacoustic computed tomography of human brain function
Authors:
Yang Zhang,
Shuai Na,
Karteekeya Sastry,
Jonathan J. Russin,
Peng Hu,
Li Lin,
Xin Tong,
Kay B. Jann,
Danny J. Wang,
Charles Y. Liu,
Lihong V. Wang
Abstract:
Herein we report the first in-human transcranial imaging of brain function using photoacoustic computed tomography. Functional responses to benchmark motor tasks were imaged on both the skull-less and the skull-intact hemispheres of a hemicraniectomy patient. The observed brain responses in these preliminary results demonstrate the potential of photoacoustic computed tomography for achieving transcranial functional imaging.
Submitted 1 June, 2022;
originally announced June 2022.
-
Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming
Authors:
Sen Na,
Michael W. Mahoney
Abstract:
We consider online statistical inference of constrained stochastic nonlinear optimization problems. We apply the Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems, which can be regarded as applying second-order Newton's method to the Karush-Kuhn-Tucker (KKT) conditions. In each iteration, the StoSQP method computes the Newton direction by solving a quadratic program, and then selects a proper adaptive stepsize $\barα_t$ to update the primal-dual iterate. To reduce dominant computational cost of the method, we inexactly solve the quadratic program in each iteration by employing an iterative sketching solver. Notably, the approximation error of the sketching solver need not vanish as iterations proceed, meaning that the per-iteration computational cost does not blow up. For the above StoSQP method, we show that under mild assumptions, the rescaled primal-dual sequence $1/\sqrt{\barα_t}\cdot (x_t - x^\star, λ_t - λ^\star)$ converges to a mean-zero Gaussian distribution with a nontrivial covariance matrix depending on the underlying sketching distribution. To perform inference in practice, we also analyze a plug-in covariance matrix estimator. We illustrate the asymptotic normality result of the method both on benchmark nonlinear problems in CUTEst test set and on linearly/nonlinearly constrained regression problems.
Submitted 13 April, 2024; v1 submitted 26 May, 2022;
originally announced May 2022.
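The Newton-on-KKT view mentioned in the abstract above can be written out for an equality-constrained problem $\min_x f(x)$ subject to $c(x) = 0$. The block below is a textbook sketch, not the paper's exact formulation: the symbols $B_t$ (an approximation of the Lagrangian Hessian), $G_t$ (the constraint Jacobian at $x_t$), and $\bar{g}_t$ (a stochastic estimate of $\nabla f(x_t)$) are notation assumed here; only the stepsize, $x_t$, and $λ_t$ appear in the abstract.

```latex
% Assumed notation: Lagrangian L(x, \lambda) = f(x) + \lambda^\top c(x)
\[
\begin{aligned}
\begin{pmatrix} B_t & G_t^\top \\ G_t & 0 \end{pmatrix}
\begin{pmatrix} \Delta x_t \\ \Delta \lambda_t \end{pmatrix}
&= -\begin{pmatrix} \bar{g}_t + G_t^\top \lambda_t \\ c(x_t) \end{pmatrix}, \\
(x_{t+1}, \lambda_{t+1}) &= (x_t, \lambda_t) + \bar{\alpha}_t \, (\Delta x_t, \Delta \lambda_t).
\end{aligned}
\]
```

The first block row is the stationarity condition of the quadratic program $\min_{\Delta x} \tfrac{1}{2}\Delta x^\top B_t \Delta x + \bar{g}_t^\top \Delta x$ subject to $c(x_t) + G_t \Delta x = 0$, and the second block row is the linearized constraint. A sketching solver, as described in the abstract, would solve this linear system only approximately.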
-
Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence
Authors:
Sen Na,
Michał Dereziński,
Michael W. Mahoney
Abstract:
We consider minimizing a smooth and strongly convex objective function using a stochastic Newton method. At each iteration, the algorithm is given an oracle access to a stochastic estimate of the Hessian matrix. The oracle model includes popular algorithms such as Subsampled Newton and Newton Sketch. Despite using second-order information, these existing methods do not exhibit superlinear convergence, unless the stochastic noise is gradually reduced to zero during the iteration, which would lead to a computational blow-up in the per-iteration cost. We propose to address this limitation with Hessian averaging: instead of using the most recent Hessian estimate, our algorithm maintains an average of all the past estimates. This reduces the stochastic noise while avoiding the computational blow-up. We show that this scheme exhibits local $Q$-superlinear convergence with a non-asymptotic rate of $(Υ\sqrt{\log (t)/t}\,)^{t}$, where $Υ$ is proportional to the level of stochastic noise in the Hessian oracle. A potential drawback of this (uniform averaging) approach is that the averaged estimates contain Hessian information from the global phase of the method, i.e., before the iterates converge to a local neighborhood. This leads to a distortion that may substantially delay the superlinear convergence until long after the local neighborhood is reached. To address this drawback, we study a number of weighted averaging schemes that assign larger weights to recent Hessians, so that the superlinear convergence arises sooner, albeit with a slightly slower rate. Remarkably, we show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage, and still exhibits a superlinear convergence rate nearly (up to a logarithmic factor) matching that of uniform Hessian averaging.
Submitted 28 November, 2022; v1 submitted 20 April, 2022;
originally announced April 2022.
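The uniform Hessian averaging idea from the abstract above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-dimensional test function $f(x) = \tfrac{1}{2} x^\top A x + \sum_i \log\cosh(x_i)$, the exact gradient, the bounded uniform noise in the hypothetical `noisy_hessian` oracle, and all function names.

```python
import math
import random

A = ((2.0, 0.5), (0.5, 1.0))  # fixed symmetric positive-definite matrix (assumed test problem)

def gradient(x):
    # Exact gradient of f(x) = 0.5 * x^T A x + sum_i log(cosh(x_i)); the minimizer is x = 0.
    return [A[i][0] * x[0] + A[i][1] * x[1] + math.tanh(x[i]) for i in range(2)]

def noisy_hessian(x, rng, noise=0.1):
    # Hypothetical oracle: exact Hessian plus a bounded symmetric perturbation.
    h00 = A[0][0] + 1.0 - math.tanh(x[0]) ** 2
    h11 = A[1][1] + 1.0 - math.tanh(x[1]) ** 2
    e00, e01, e11 = (rng.uniform(-noise, noise) for _ in range(3))
    return [[h00 + e00, A[0][1] + e01], [A[1][0] + e01, h11 + e11]]

def solve_2x2(h, g):
    # Solve h * step = g for a 2x2 matrix h by Cramer's rule.
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return [(h[1][1] * g[0] - h[0][1] * g[1]) / det,
            (h[0][0] * g[1] - h[1][0] * g[0]) / det]

def newton_with_hessian_averaging(x0, steps=30, average=True, seed=0):
    rng = random.Random(seed)
    x = list(x0)
    h_sum = [[0.0, 0.0], [0.0, 0.0]]  # running sum of all Hessian estimates so far
    for t in range(1, steps + 1):
        h_hat = noisy_hessian(x, rng)
        for i in range(2):
            for j in range(2):
                h_sum[i][j] += h_hat[i][j]
        # Uniform averaging uses the mean of all past estimates instead of the latest one.
        h_used = [[v / t for v in row] for row in h_sum] if average else h_hat
        step = solve_2x2(h_used, gradient(x))
        x = [x[0] - step[0], x[1] - step[1]]
    return x

x_avg = newton_with_hessian_averaging([0.5, -0.5], average=True)
x_last = newton_with_hessian_averaging([0.5, -0.5], average=False)
print("error with averaging:   ", math.hypot(*x_avg))
print("error without averaging:", math.hypot(*x_last))
```

Setting `average=False` gives the baseline that uses only the most recent estimate, so the two printed errors can be compared directly. The weighted averaging schemes discussed in the abstract would replace the uniform mean `h_sum / t` with a weighted one.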
-
Feature Structure Distillation with Centered Kernel Alignment in BERT Transferring
Authors:
Hee-Jun Jung,
Doyeon Kim,
Seung-Hoon Na,
Kangil Kim
Abstract:
Knowledge distillation is an approach to transfer information on representations from a teacher to a student by reducing their difference. A challenge of this approach is that it reduces the flexibility of the student's representations, inducing inaccurate learning of the teacher's knowledge. To resolve this in transferring, we investigate distillation of structures of representations of three types: intra-feature, local inter-feature, and global inter-feature structures. To transfer them, we introduce feature structure distillation methods based on the Centered Kernel Alignment, which assigns a consistent value to similar feature structures and reveals more informative relations. In particular, a memory-augmented transfer method with clustering is implemented for the global structures. The methods are empirically analyzed on the nine language understanding tasks of the GLUE dataset with Bidirectional Encoder Representations from Transformers (BERT), which is a representative neural language model. In the results, the proposed methods effectively transfer the three types of structures and improve performance compared to state-of-the-art distillation methods. The code for the methods is available at https://github.com/maroo-sky/FSD.
Submitted 27 February, 2023; v1 submitted 1 April, 2022;
originally announced April 2022.
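The claim that Centered Kernel Alignment assigns a consistent value to similar feature structures can be illustrated with the standard linear-kernel form of CKA. This is a minimal sketch of the similarity measure only, not the distillation losses of the paper; the function names and the toy `teacher`, `student`, and `unrelated` feature matrices are assumptions made for illustration.

```python
import math

def center_columns(rows):
    # Subtract each column's mean so every feature is zero-mean across examples.
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def gram(rows):
    # Linear-kernel Gram matrix: entry (i, j) is the dot product of examples i and j.
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]

def frobenius_inner(k, l):
    # Sum of elementwise products of two matrices of the same shape.
    return sum(a * b for rk, rl in zip(k, l) for a, b in zip(rk, rl))

def linear_cka(x, y):
    # x and y hold one row per example; the feature dimensions may differ.
    kx = gram(center_columns(x))
    ky = gram(center_columns(y))
    return frobenius_inner(kx, ky) / math.sqrt(
        frobenius_inner(kx, kx) * frobenius_inner(ky, ky))

teacher = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
# Rotate every teacher feature vector by 90 degrees and scale it by 3.
student = [[-3.0 * b, 3.0 * a] for a, b in teacher]
unrelated = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

print(linear_cka(teacher, student))    # same structure up to rotation and scale
print(linear_cka(teacher, unrelated))  # different structure
```

Because linear CKA depends only on the centered Gram matrices, the rotated and rescaled `student` features score as equivalent to the `teacher` features, while a feature-wise distance between the two matrices would not be zero.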
-
A single Long Short-Term Memory network for enhancing the prediction of path-dependent plasticity with material heterogeneity and anisotropy
Authors:
Ehsan Motevali Haghighi,
SeonHong Na
Abstract:
This study presents the applicability of conventional deep recurrent neural networks (RNN) to predict path-dependent plasticity associated with material heterogeneity and anisotropy. Although the architecture of RNN possesses inductive biases toward information over time, it is still challenging to learn the path-dependent material behavior as a function of the loading path considering the change from elastic to elastoplastic regimes. Our attempt is to develop a simple machine-learning-based model that can replicate elastoplastic behaviors considering material heterogeneity and anisotropy. The basic Long-Short Term Memory Unit (LSTM) is adopted for the modeling of plasticity in the two-dimensional space by enhancing the inductive bias toward the past information through manipulating input variables. Our results find that a single LSTM based model can capture the J2 plasticity responses under both monotonic and arbitrary loading paths provided the material heterogeneity. The proposed neural network architecture is then used to model elastoplastic responses of a two-dimensional transversely anisotropic material associated with computational homogenization (FE2). It is also found that a single LSTM model can be used to accurately and effectively capture the path-dependent responses of heterogeneous and anisotropic microstructures under arbitrary mechanical loading conditions.
Submitted 4 April, 2022; v1 submitted 28 March, 2022;
originally announced April 2022.
-
Federated Reinforcement Learning for Collective Navigation of Robotic Swarms
Authors:
Seongin Na,
Tomáš Rouček,
Jiří Ulrich,
Jan Pikman,
Tomáš Krajník,
Barry Lennox,
Farshad Arvin
Abstract:
The recent advancement of Deep Reinforcement Learning (DRL) contributed to robotics by allowing automatic controller design. The automatic controller design is a crucial approach for designing swarm robotic systems, which require more complex controllers than a single robot system to lead a desired collective behaviour. Although the DRL-based controller design method showed its effectiveness, the reliance on the central training server is a critical problem in real-world environments where robot-server communication is unstable or limited. We propose a novel Federated Learning (FL) based DRL training strategy (FLDDPG) for use in swarm robotic applications. Through the comparison with baseline strategies under a limited communication bandwidth scenario, it is shown that the FLDDPG method resulted in higher robustness and generalisation ability into a different environment and real robots, while the baseline strategies suffer from the limitation of communication bandwidth. This result suggests that the proposed method can benefit swarm robotic systems operating in environments with limited communication bandwidth, e.g., in high-radiation, underwater, or subterranean environments.
Submitted 11 September, 2022; v1 submitted 2 February, 2022;
originally announced February 2022.
-
Inequality Constrained Stochastic Nonlinear Optimization via Active-Set Sequential Quadratic Programming
Authors:
Sen Na,
Mihai Anitescu,
Mladen Kolar
Abstract:
We study nonlinear optimization problems with a stochastic objective and deterministic equality and inequality constraints, which emerge in numerous applications including finance, manufacturing, power systems and, recently, deep neural networks. We propose an active-set stochastic sequential quadratic programming (StoSQP) algorithm that utilizes a differentiable exact augmented Lagrangian as the merit function. The algorithm adaptively selects the penalty parameters of the augmented Lagrangian and performs a stochastic line search to decide the stepsize. The global convergence is established: for any initialization, the KKT residuals converge to zero almost surely. Our algorithm and analysis further develop the prior work of Na et al., (2022). Specifically, we allow nonlinear inequality constraints without requiring the strict complementary condition; refine some of the designs in Na et al., (2022) such as the feasibility error condition and the monotonically increasing sample size; strengthen the global convergence guarantee; and improve the sample complexity on the objective Hessian. We demonstrate the performance of the designed algorithm on a subset of nonlinear problems collected in CUTEst test set and on constrained logistic regression problems.
Submitted 30 January, 2023; v1 submitted 23 September, 2021;
originally announced September 2021.
-
Meta-Path-based Fake News Detection Leveraging Multi-level Social Context Information
Authors:
Jian Cui,
Kwanwoo Kim,
Seung Ho Na,
Seungwon Shin
Abstract:
Fake news, false or misleading information presented as news, has a significant impact on many aspects of society, such as in politics or healthcare domains. Due to the deceiving nature of fake news, applying Natural Language Processing (NLP) techniques to the news content alone is insufficient. The multi-level social context information (news publishers and engaged users in social media) and temporal information of user engagement are important information in fake news detection. The proper usage of this information, however, introduces three chronic difficulties: 1) multi-level social context information is hard to be used without information loss, 2) temporal information is hard to be used along with multi-level social context information, 3) news representation with multi-level social context and temporal information is hard to be learned in an end-to-end manner. To overcome all three difficulties, we propose a novel fake news detection framework, Hetero-SCAN. We use Meta-Path to extract meaningful multi-level social context information without loss. Meta-Path, a composite relation connecting two node types, is proposed to capture the semantics in the heterogeneous graph. We then propose Meta-Path instance encoding and aggregation methods to capture the temporal information of user engagement and produce news representation end-to-end. According to our experiment, Hetero-SCAN yields significant performance improvement over state-of-the-art fake news detection methods.
Submitted 16 November, 2021; v1 submitted 13 September, 2021;
originally announced September 2021.
-
A Fast Temporal Decomposition Procedure for Long-horizon Nonlinear Dynamic Programming
Authors:
Sen Na,
Mihai Anitescu,
Mladen Kolar
Abstract:
We propose a fast temporal decomposition procedure for solving long-horizon nonlinear dynamic programs. The core of the procedure is sequential quadratic programming (SQP) that utilizes a differentiable exact augmented Lagrangian as the merit function. Within each SQP iteration, we approximately solve the Newton system using an overlapping temporal decomposition strategy. We show that the approximate search direction is still a descent direction of the augmented Lagrangian, provided the overlap size and penalty parameters are suitably chosen, which allows us to establish the global convergence. Moreover, we show that a unit stepsize is accepted locally for the approximate search direction, and further establish a uniform, local linear convergence over stages. This local convergence rate matches the rate of the recent Schwarz scheme by Na et al., 2022. However, the Schwarz scheme has to solve nonlinear subproblems to optimality in each iteration, while we only perform a single Newton step instead. Numerical experiments validate our theories and demonstrate the superiority of our method.
Submitted 17 April, 2023; v1 submitted 24 July, 2021;
originally announced July 2021.
-
Optical calibration of the SNO+ detector in the water phase with deployed sources
Authors:
SNO+ Collaboration,
M. R. Anderson,
S. Andringa,
M. Askins,
D. J. Auty,
F. Barão,
N. Barros,
R. Bayes,
E. W. Beier,
A. Bialek,
S. D. Biller,
E. Blucher,
M. Boulay,
E. Caden,
E. J. Callaghan,
J. Caravaca,
M. Chen,
O. Chkvorets,
B. Cleveland,
D. Cookman,
J. Corning,
M. A. Cox,
C. Deluce,
M. M. Depatie
, et al. (98 additional authors not shown)
Abstract:
SNO+ is a large-scale liquid scintillator experiment with the primary goal of searching for neutrinoless double beta decay, and is located approximately 2 km underground in SNOLAB, Sudbury, Canada. The detector acquired data for two years as a pure water Cherenkov detector, starting in May 2017. During this period, the optical properties of the detector were measured in situ using a deployed light diffusing sphere, with the goal of improving the detector model and the energy response systematic uncertainties. The measured parameters included the water attenuation coefficients, effective attenuation coefficients for the acrylic vessel, and the angular response of the photomultiplier tubes and their surrounding light concentrators, all across different wavelengths. The calibrated detector model was validated using a deployed tagged gamma source, which showed a 0.6% variation in energy scale across the primary target volume.
Submitted 4 October, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.
-
The SNO+ Experiment
Authors:
SNO+ Collaboration,
V. Albanese,
R. Alves,
M. R. Anderson,
S. Andringa,
L. Anselmo,
E. Arushanova,
S. Asahi,
M. Askins,
D. J. Auty,
A. R. Back,
S. Back,
F. Barão,
Z. Barnard,
A. Barr,
N. Barros,
D. Bartlett,
R. Bayes,
C. Beaudoin,
E. W. Beier,
G. Berardi,
A. Bialek,
S. D. Biller,
E. Blucher
, et al. (229 additional authors not shown)
Abstract:
The SNO+ experiment is located 2 km underground at SNOLAB in Sudbury, Canada. A low background search for neutrinoless double beta ($0νββ$) decay will be conducted using 780 tonnes of liquid scintillator loaded with 3.9 tonnes of natural tellurium, corresponding to 1.3 tonnes of $^{130}$Te. This paper provides a general overview of the SNO+ experiment, including detector design, construction of process plants, commissioning efforts, electronics upgrades, data acquisition systems, and calibration techniques. The SNO+ collaboration is reusing the acrylic vessel, PMT array, and electronics of the SNO detector, having made a number of experimental upgrades and essential adaptations for use with the liquid scintillator. With low backgrounds and a low energy threshold, the SNO+ collaboration will also pursue a rich physics program beyond the search for $0νββ$ decay, including studies of geo- and reactor antineutrinos, supernova and solar neutrinos, and exotic physics such as the search for invisible nucleon decay. The SNO+ approach to the search for $0νββ$ decay is scalable: a future phase with high $^{130}$Te-loading is envisioned to probe an effective Majorana mass in the inverted mass ordering region.
Submitted 25 August, 2021; v1 submitted 23 April, 2021;
originally announced April 2021.
-
An Adaptive Stochastic Sequential Quadratic Programming with Differentiable Exact Augmented Lagrangians
Authors:
Sen Na,
Mihai Anitescu,
Mladen Kolar
Abstract:
We consider solving nonlinear optimization problems with a stochastic objective and deterministic equality constraints. We assume for the objective that its evaluation, gradient, and Hessian are inaccessible, while one can compute their stochastic estimates by, for example, subsampling. We propose a stochastic algorithm based on sequential quadratic programming (SQP) that uses a differentiable exact augmented Lagrangian as the merit function. To motivate our algorithm design, we first revisit and simplify an old SQP method \citep{Lucidi1990Recursive} developed for solving deterministic problems, which serves as the skeleton of our stochastic algorithm. Based on the simplified deterministic algorithm, we then propose a non-adaptive SQP for dealing with stochastic objective, where the gradient and Hessian are replaced by stochastic estimates but the stepsizes are deterministic and prespecified. Finally, we incorporate a recent stochastic line search procedure \citep{Paquette2020Stochastic} into the non-adaptive stochastic SQP to adaptively select the random stepsizes, which leads to an adaptive stochastic SQP. The global "almost sure" convergence for both non-adaptive and adaptive SQP methods is established. Numerical experiments on nonlinear problems in CUTEst test set demonstrate the superiority of the adaptive algorithm.
Submitted 6 June, 2022; v1 submitted 10 February, 2021;
originally announced February 2021.
-
First order transition in trigonal structure ${\textbf{Ca}}{\textbf{Mn}}_{2}{\textbf{P}}_{2}$
Authors:
Y. J. Li,
F. **,
Z. Y. Mi,
J. Guo,
W. Wu,
D. S. Wu,
S. H. Na,
C. Mu,
X. B. Zhou,
Z. Li,
K. Liu,
L. L. Sun,
Q. M. Zhang,
T. Xiang,
G. Li,
J. L. Luo
Abstract:
We report structural and physical properties of single-crystalline ${\mathrm{Ca}}{\mathrm{Mn}}_{2}{\mathrm{P}}_{2}$. The X-ray diffraction (XRD) results show that ${\mathrm{Ca}}{\mathrm{Mn}}_{2}{\mathrm{P}}_{2}$ adopts the trigonal ${\mathrm{Ca}}{\mathrm{Al}}_{2}{\mathrm{Si}}_{2}$-type structure. Temperature-dependent electrical resistivity $ρ(T)$ measurements indicate an insulating ground state for ${\mathrm{Ca}}{\mathrm{Mn}}_{2}{\mathrm{P}}_{2}$ with activation energies of 40 meV and 0.64 meV for two distinct regions, respectively. Magnetization measurements show no apparent magnetic phase transition below 400 K. Unlike other ${\mathrm{A}}{\mathrm{Mn}}_{2}{\mathrm{Pn}}_{2}$ (A = Ca, Sr, and Ba, and Pn = P, As, and Sb) compounds with the same structure, heat capacity $C_{\mathrm{p}}(T)$ and $ρ(T)$ reveal that ${\mathrm{Ca}}{\mathrm{Mn}}_{2}{\mathrm{P}}_{2}$ has a first-order transition at $T$ = 69.5 K, and the transition temperature shifts to higher temperature with increasing pressure. The emergence of many new Raman modes below the transition clearly suggests a change in symmetry accompanying the transition. The combination of the structural, transport, thermal and magnetic measurements points to an unusual origin of the transition.
Submitted 3 December, 2020;
originally announced December 2020.
-
Development, characterisation, and deployment of the SNO+ liquid scintillator
Authors:
SNO+ Collaboration,
M. R. Anderson,
S. Andringa,
L. Anselmo,
E. Arushanova,
S. Asahi,
M. Askins,
D. J. Auty,
A. R. Back,
Z. Barnard,
N. Barros,
D. Bartlett,
F. Barão,
R. Bayes,
E. W. Beier,
A. Bialek,
S. D. Biller,
E. Blucher,
R. Bonventre,
M. Boulay,
D. Braid,
E. Caden,
E. J. Callaghan,
J. Caravaca
, et al. (201 additional authors not shown)
Abstract:
A liquid scintillator consisting of linear alkylbenzene as the solvent and 2,5-diphenyloxazole as the fluor was developed for the SNO+ experiment. This mixture was chosen as it is compatible with acrylic and has a competitive light yield to pre-existing liquid scintillators while conferring other advantages including longer attenuation lengths, superior safety characteristics, chemical simplicity, ease of handling, and logistical availability. Its properties have been extensively characterized and are presented here. This liquid scintillator is now used in several neutrino physics experiments in addition to SNO+.
Submitted 21 February, 2021; v1 submitted 25 November, 2020;
originally announced November 2020.
-
Facial UV Map Completion for Pose-invariant Face Recognition: A Novel Adversarial Approach based on Coupled Attention Residual UNets
Authors:
In Seop Na,
Chung Tran,
Dung Nguyen,
Sang Dinh
Abstract:
Pose-invariant face recognition refers to the problem of identifying or verifying a person by analyzing face images captured from different poses. This problem is challenging due to the large variation of pose, illumination and facial expression. A promising approach to deal with pose variation is to complete the incomplete UV maps extracted from in-the-wild faces, then attach the completed UV map to a fitted 3D mesh and finally generate different 2D faces of arbitrary poses. The synthesized faces increase the pose variation for training deep face recognition models and reduce the pose discrepancy during the testing phase. In this paper, we propose a novel generative model called Attention ResCUNet-GAN to improve the UV map completion. We enhance the original UV-GAN by using a couple of U-Nets. In particular, the skip connections within each U-Net are boosted by attention gates. Meanwhile, the features from the two U-Nets are fused with trainable scalar weights. The experiments on popular benchmarks, including the Multi-PIE, LFW, CPLFW and CFP datasets, show that the proposed method yields superior performance compared to other existing methods.
Submitted 2 November, 2020;
originally announced November 2020.
-
Learning Visual Context by Comparison
Authors:
Minchul Kim,
Jongchan Park,
Seil Na,
Chang Min Park,
Donggeun Yoo
Abstract:
Finding diseases from an X-ray image is an important yet highly challenging task. Current methods for solving this task exploit various characteristics of the chest X-ray image, but one of the most important characteristics is still missing: the necessity of comparison between related regions in an image. In this paper, we present Attend-and-Compare Module (ACM) for capturing the difference between an object of interest and its corresponding context. We show that explicit difference modeling can be very helpful in tasks that require direct comparison between locations from afar. This module can be plugged into existing deep learning models. For evaluation, we apply our module to three chest X-ray recognition tasks and COCO object detection & segmentation tasks and observe consistent improvements across tasks. The code is available at https://github.com/mk-minchul/attend-and-compare.
Submitted 15 July, 2020;
originally announced July 2020.
-
AEGCN: An Autoencoder-Constrained Graph Convolutional Network
Authors:
Mingyuan Ma,
Sen Na,
Hongyu Wang
Abstract:
We propose a novel neural network architecture, called autoencoder-constrained graph convolutional network, to solve the node classification task on graph domains. As suggested by its name, the core of this model is a convolutional network operating directly on graphs, whose hidden layers are constrained by an autoencoder. Compared with vanilla graph convolutional networks, the autoencoder step is added to reduce the information loss caused by Laplacian smoothing. We consider applying our model to both homogeneous graphs and heterogeneous graphs. For homogeneous graphs, the autoencoder approximates the adjacency matrix of the input graph by taking hidden layer representations as encoder and another one-layer graph convolutional network as decoder. For heterogeneous graphs, since there are multiple adjacency matrices corresponding to different types of edges, the autoencoder approximates the feature matrix of the input graph instead, and changes the encoder to a specially designed multi-channel pre-processing network with two layers. In both cases, the error incurred in the autoencoder approximation enters the loss function as a penalty term. In extensive experiments on citation networks and other heterogeneous graphs, we demonstrate that adding autoencoder constraints significantly improves the performance of graph convolutional networks. Further, we observe that our technique can also be applied to graph attention networks to improve their performance. This reveals the wide applicability of the proposed autoencoder technique.
Submitted 10 February, 2021; v1 submitted 3 July, 2020;
originally announced July 2020.
-
Convergence Analysis of Accelerated Stochastic Gradient Descent under the Growth Condition
Authors:
You-Lin Chen,
Sen Na,
Mladen Kolar
Abstract:
We study the convergence of accelerated stochastic gradient descent for strongly convex objectives under the growth condition, which states that the variance of stochastic gradient is bounded by a multiplicative part that grows with the full gradient, and a constant additive part. Through the lens of the growth condition, we investigate four widely used accelerated methods: Nesterov's accelerated method (NAM), robust momentum method (RMM), accelerated dual averaging method (DAM+), and implicit DAM+ (iDAM+). While these methods are known to improve the convergence rate of SGD under the condition that the stochastic gradient has bounded variance, it is not well understood how their convergence rates are affected by the multiplicative noise. In this paper, we show that these methods all converge to a neighborhood of the optimum with accelerated convergence rates (compared to SGD) even under the growth condition. In particular, NAM, RMM, iDAM+ enjoy acceleration only with a mild multiplicative noise, while DAM+ enjoys acceleration even with a large multiplicative noise. Furthermore, we propose a generic tail-averaged scheme that allows the accelerated rates of DAM+ and iDAM+ to nearly attain the theoretical lower bound (up to a logarithmic factor in the variance term). We conduct numerical experiments to support our theoretical conclusions.
Submitted 30 October, 2023; v1 submitted 11 June, 2020;
originally announced June 2020.
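For readers unfamiliar with the growth condition named in the abstract above, one common way to write it is sketched below. The symbols are assumptions introduced here for illustration ($g(x)$ for the stochastic gradient, $\nabla f(x)$ for the full gradient, $\rho \ge 0$ for the multiplicative constant, $\sigma^2 \ge 0$ for the additive constant); the paper's own notation and exact form may differ.

```latex
% Growth condition (illustrative form; notation assumed, not taken from the paper):
% the variance of the stochastic gradient g(x) is bounded by a part that
% scales with the full gradient plus a constant part.
\mathbb{E}\,\bigl\| g(x) - \nabla f(x) \bigr\|^{2}
  \;\le\; \rho \,\bigl\| \nabla f(x) \bigr\|^{2} + \sigma^{2}
```

Setting $\rho = 0$ recovers the usual bounded-variance assumption, which is the setting the abstract contrasts against.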
-
On the Convergence of Overlapping Schwarz Decomposition for Nonlinear Optimal Control
Authors:
Sen Na,
Sungho Shin,
Mihai Anitescu,
Victor M. Zavala
Abstract:
We study the convergence properties of an overlapping Schwarz decomposition algorithm for solving nonlinear optimal control problems (OCPs). The algorithm decomposes the time domain into a set of overlapping subdomains, and solves all subproblems defined over subdomains in parallel. The convergence is attained by updating primal-dual information at the boundaries of overlapping subdomains. We show that the algorithm exhibits local linear convergence, and that the convergence rate improves exponentially with the overlap size. We also establish global convergence results for a general quadratic program, which enables the application of the Schwarz scheme inside second-order optimization algorithms (e.g., sequential quadratic programming). The theoretical foundation of our convergence analysis is a sensitivity result of nonlinear OCPs, which we call "exponential decay of sensitivity" (EDS). Intuitively, EDS states that the impact of perturbations at domain boundaries (i.e., initial and terminal time) on the solution decays exponentially as one moves into the domain. Here, we expand a previous analysis available in the literature by showing that EDS holds for both primal and dual solutions of nonlinear OCPs, under uniform second-order sufficient condition, controllability condition, and boundedness condition. We conduct experiments with a quadrotor motion planning problem and a PDE control problem to validate our theory, and show that the approach is significantly more efficient than ADMM and as efficient as the centralized solver Ipopt.
Submitted 14 March, 2022; v1 submitted 13 May, 2020;
originally announced May 2020.
-
Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees
Authors:
Sen Na,
Yuwei Luo,
Zhuoran Yang,
Zhaoran Wang,
Mladen Kolar
Abstract:
Graph representation learning is a ubiquitous task in machine learning where the goal is to embed each vertex into a low-dimensional vector space. We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution. The bipartite graph is assumed to be generated by a semiparametric exponential family distribution, whose parametric component is given by the proximity of outputs of two one-layer neural networks, while nonparametric (nuisance) component is the base measure. Neural networks take high-dimensional features as inputs and output embedding vectors. In this setting, the representation learning problem is equivalent to recovering the weight matrices. The main challenges of estimation arise from the nonlinearity of activation functions and the nonparametric nuisance component of the distribution. To overcome these challenges, we propose a pseudo-likelihood objective based on the rank-order decomposition technique and focus on its local geometry. We show that the proposed objective is strongly convex in a neighborhood around the ground truth, so that a gradient descent-based method achieves linear convergence rate. Moreover, we prove that the sample complexity of the problem is linear in dimensions (up to logarithmic factors), which is consistent with parametric Gaussian models. However, our estimator is robust to any model misspecification within the exponential family, which is validated in extensive experiments.
Submitted 2 March, 2020;
originally announced March 2020.
-
Measurement of neutron-proton capture in the SNO+ water phase
Authors:
The SNO+ Collaboration,
M. R. Anderson,
S. Andringa,
M. Askins,
D. J. Auty,
N. Barros,
F. Barão,
R. Bayes,
E. W. Beier,
A. Bialek,
S. D. Biller,
E. Blucher,
R. Bonventre,
M. Boulay,
E. Caden,
E. J. Callaghan,
J. Caravaca,
D. Chauhan,
M. Chen,
O. Chkvorets,
B. Cleveland,
M. A. Cox,
M. M. Depatie,
J. Dittmer
, et al. (108 additional authors not shown)
Abstract:
The SNO+ experiment collected data as a low-threshold water Cherenkov detector from September 2017 to July 2019. Measurements of the 2.2-MeV $γ$ produced by neutron capture on hydrogen have been made using an Am-Be calibration source, for which a large fraction of emitted neutrons are produced simultaneously with a 4.4-MeV $γ$. Analysis of the delayed coincidence between the 4.4-MeV $γ$ and the 2.2-MeV capture $γ$ revealed a neutron detection efficiency that is centered around 50% and varies at the level of 1% across the inner region of the detector, which to our knowledge is the highest efficiency achieved among pure water Cherenkov detectors. In addition, the neutron capture time constant was measured and converted to a thermal neutron-proton capture cross section of $336.3^{+1.2}_{-1.5}$ mb.
Submitted 13 July, 2020; v1 submitted 24 February, 2020;
originally announced February 2020.
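The conversion between a neutron capture time constant and a thermal capture cross section mentioned in the abstract above can be illustrated with a rough sketch. It assumes the textbook relation tau = 1 / (n_H * sigma * v) for capture on hydrogen only, the conventional 2200 m/s thermal neutron speed, and an assumed water density; none of these constants or function names come from the paper, and the collaboration's actual analysis may include corrections omitted here.

```python
# Rough sketch under stated assumptions; not the collaboration's analysis.
AVOGADRO = 6.02214076e23    # 1/mol (exact SI value)
WATER_MOLAR_MASS = 18.015   # g/mol
WATER_DENSITY = 0.9997      # g/cm^3, assumed value for cold water
THERMAL_SPEED = 2.2e5       # cm/s, conventional 2200 m/s thermal neutron speed
MB_TO_CM2 = 1e-27           # 1 millibarn expressed in cm^2


def hydrogen_number_density(density=WATER_DENSITY):
    """Hydrogen nuclei per cm^3 in water (two per H2O molecule)."""
    return 2.0 * density * AVOGADRO / WATER_MOLAR_MASS


def cross_section_mb(tau_s, density=WATER_DENSITY):
    """Capture cross section in mb from a capture time constant in seconds."""
    n_h = hydrogen_number_density(density)
    return 1.0 / (n_h * THERMAL_SPEED * tau_s) / MB_TO_CM2


def capture_time_s(sigma_mb, density=WATER_DENSITY):
    """Capture time constant in seconds from a cross section in mb."""
    n_h = hydrogen_number_density(density)
    return 1.0 / (n_h * THERMAL_SPEED * sigma_mb * MB_TO_CM2)


if __name__ == "__main__":
    tau = capture_time_s(336.3)  # central value quoted in the abstract
    print(f"capture time constant: {tau * 1e6:.1f} microseconds")
```

Under these assumptions the quoted cross section corresponds to a capture time of roughly two hundred microseconds; the result is sensitive to the assumed density and ignores capture on oxygen.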
-
Superconvergence of Online Optimization for Model Predictive Control
Authors:
Sen Na,
Mihai Anitescu
Abstract:
We develop a one-Newton-step-per-horizon, online, lag-$L$, model predictive control (MPC) algorithm for solving discrete-time, equality-constrained, nonlinear dynamic programs. Based on recent sensitivity analysis results for the target problem class, we prove that the approach exhibits a behavior that we call superconvergence; that is, the tracking error with respect to the full horizon solution is not only stable for successive horizon shifts, but also decreases with increasing shift order to a minimum value that decays exponentially in the length of the receding horizon. The key analytical step is the decomposition of the one-step error recursion of our algorithm into algorithmic error and perturbation error. We show that the perturbation error decays exponentially with the lag between two consecutive receding horizons, while the algorithmic error, determined by Newton's method, achieves quadratic convergence instead. Overall this approach induces our local exponential convergence result in terms of the receding horizon length for suitable values of $L$. Numerical experiments validate our theoretical findings.
Submitted 5 February, 2022; v1 submitted 10 January, 2020;
originally announced January 2020.
-
Exponential Decay in the Sensitivity Analysis of Nonlinear Dynamic Programming
Authors:
Sen Na,
Mihai Anitescu
Abstract:
In this paper, we study the sensitivity of discrete-time dynamic programs with nonlinear dynamics and objective to perturbations in the initial conditions and reference parameters. Under uniform controllability and boundedness assumptions for the problem data, we prove that the directional derivative of the optimal state and control at time $k$, $\boldsymbol{x}^*_{k}$ and $\boldsymbol{u}^*_{k}$, with respect to the reference signal at time $i$, $\boldsymbol{d}_{i}$, will have exponential decay in terms of $|k-i|$ with a decay rate $ρ$ independent of the temporal horizon length. The key technical step is to prove that a version of the convexification approach proposed by Verschueren et al. can be applied to the KKT conditions and results in a convex quadratic program with uniformly bounded data. In turn, Riccati techniques can be further employed to obtain the sensitivity result, borne from the observation that the directional derivatives are solutions of quadratic programs with structure similar to the KKT conditions themselves. We validate our findings with numerical experiments on a small nonlinear, nonconvex, dynamic program.
Submitted 13 December, 2019;
originally announced December 2019.
-
Estimating Differential Latent Variable Graphical Models with Applications to Brain Connectivity
Authors:
Sen Na,
Mladen Kolar,
Oluwasanmi Koyejo
Abstract:
Differential graphical models are designed to represent the difference between the conditional dependence structures of two groups, thus are of particular interest for scientific investigation. Motivated by modern applications, this manuscript considers an extended setting where each group is generated by a latent variable Gaussian graphical model. Due to the existence of latent factors, the differential network is decomposed into sparse and low-rank components, both of which are symmetric indefinite matrices. We estimate these two components simultaneously using a two-stage procedure: (i) an initialization stage, which computes a simple, consistent estimator, and (ii) a convergence stage, implemented using a projected alternating gradient descent algorithm applied to a nonconvex objective, initialized using the output of the first stage. We prove that given the initialization, the estimator converges linearly with a nontrivial, minimax optimal statistical error. Experiments on synthetic and real data illustrate that the proposed nonconvex procedure outperforms existing methods.
Submitted 13 May, 2020; v1 submitted 12 September, 2019;
originally announced September 2019.
-
Discovery of Natural Language Concepts in Individual Units of CNNs
Authors:
Seil Na,
Yo Joong Choe,
Dong-Hyun Lee,
Gunhee Kim
Abstract:
Although deep convolutional networks have achieved improved performance in many natural language tasks, they have been treated as black boxes because they are difficult to interpret. Especially, little is known about how they represent language in their intermediate layers. In an attempt to understand the representations of deep convolutional networks trained on language tasks, we show that individual units are selectively responsive to specific morphemes, words, and phrases, rather than responding to arbitrary and uninterpretable patterns. In order to quantitatively analyze such an intriguing phenomenon, we propose a concept alignment method based on how units respond to the replicated text. We conduct analyses with different architectures on multiple datasets for classification and translation tasks and provide new insights into how deep models understand natural language.
Submitted 28 February, 2019; v1 submitted 18 February, 2019;
originally announced February 2019.
-
MISO: Mutual Information Loss with Stochastic Style Representations for Multimodal Image-to-Image Translation
Authors:
Sanghyeon Na,
Seungjoo Yoo,
Jaegul Choo
Abstract:
Unpaired multimodal image-to-image translation is a task of translating a given image in a source domain into diverse images in the target domain, overcoming the limitation of one-to-one mapping. Existing multimodal translation models are mainly based on the disentangled representations with an image reconstruction loss. We propose two approaches to improve multimodal translation quality. First, we use a content representation from the source domain conditioned on a style representation from the target domain. Second, rather than using a typical image reconstruction loss, we design MILO (Mutual Information LOss), a new stochastically-defined loss function based on information theory. This loss function directly reflects the interpretation of latent variables as a random variable. We show that our proposed model Mutual Information with StOchastic Style Representation (MISO) achieves state-of-the-art performance through extensive experiments on various real-world datasets.
Submitted 11 February, 2019;
originally announced February 2019.
-
Search for invisible modes of nucleon decay in water with the SNO+ detector
Authors:
SNO+ Collaboration,
M. Anderson,
S. Andringa,
E. Arushanova,
S. Asahi,
M. Askins,
D. J. Auty,
A. R. Back,
Z. Barnard,
N. Barros,
D. Bartlett,
F. Barão,
R. Bayes,
E. W. Beier,
A. Bialek,
S. D. Biller,
E. Blucher,
R. Bonventre,
M. Boulay,
D. Braid,
E. Caden,
E. J. Callaghan,
J. Caravaca,
J. Carvalho
, et al. (173 additional authors not shown)
Abstract:
This paper reports results from a search for nucleon decay through 'invisible' modes, where no visible energy is directly deposited during the decay itself, during the initial water phase of SNO+. However, such decays within the oxygen nucleus would produce an excited daughter that would subsequently de-excite, often emitting detectable gamma rays. A search for such gamma rays yields limits of $2.5 \times 10^{29}$ y at 90% Bayesian credibility level (with a prior uniform in rate) for the partial lifetime of the neutron, and $3.6 \times 10^{29}$ y for the partial lifetime of the proton, the latter a 70% improvement on the previous limit from SNO. We also present partial lifetime limits for invisible dinucleon modes of $1.3\times 10^{28}$ y for $nn$, $2.6\times 10^{28}$ y for $pn$ and $4.7\times 10^{28}$ y for $pp$, an improvement over existing limits by close to three orders of magnitude for the latter two.
Submitted 13 December, 2018;
originally announced December 2018.
-
Measurement of the $^8$B Solar Neutrino Flux in SNO+ with Very Low Backgrounds
Authors:
The SNO+ Collaboration,
M. Anderson,
S. Andringa,
S. Asahi,
M. Askins,
D. J. Auty,
N. Barros,
D. Bartlett,
F. Barão,
R. Bayes,
E. W. Beier,
A. Bialek,
S. D. Biller,
E. Blucher,
R. Bonventre,
M. Boulay,
E. Caden,
E. J. Callaghan,
J. Caravaca,
D. Chauhan,
M. Chen,
O. Chkvorets,
B. Cleveland,
C. Connors
, et al. (98 additional authors not shown)
Abstract:
A measurement of the $^8$B solar neutrino flux has been made using a 69.2 kt-day dataset acquired with the SNO+ detector during its water commissioning phase. At energies above 6 MeV the dataset is an extremely pure sample of solar neutrino elastic scattering events, owing primarily to the detector's deep location, allowing an accurate measurement with relatively little exposure. In that energy region the best fit background rate is $0.25^{+0.09}_{-0.07}$ events/kt-day, significantly lower than the measured solar neutrino event rate in that energy range, which is $1.03^{+0.13}_{-0.12}$ events/kt-day. Also using data below this threshold, down to 5 MeV, fits of the solar neutrino event direction yielded an observed flux of $2.53^{+0.31}_{-0.28}$(stat.)$^{+0.13}_{-0.10}$(syst.)$\times10^6$ cm$^{-2}$s$^{-1}$, assuming no neutrino oscillations. This rate is consistent with matter enhanced neutrino oscillations and measurements from other experiments.
Submitted 11 January, 2019; v1 submitted 8 December, 2018;
originally announced December 2018.
-
The Graph-Based Behavior-Aware Recommendation for Interactive News
Authors:
Mingyuan Ma,
Sen Na,
Hongyu Wang,
Congzhou Chen,
Jin Xu
Abstract:
Interactive news recommendation has been launched and attracted much attention recently. In this scenario, user's behavior evolves from single click behavior to multiple behaviors including like, comment, share etc. However, most of the existing methods still use single click behavior as the unique criterion of judging user's preferences. Further, although heterogeneous graphs have been applied in different areas, a proper way to construct a heterogeneous graph for interactive news data with an appropriate learning mechanism on it is still desired. To address the above concerns, we propose a graph-based behavior-aware network, which simultaneously considers six different types of behaviors as well as user's demand on the news diversity. We have three main steps. First, we build an interaction behavior graph for multi-level and multi-category data. Second, we apply DeepWalk on the behavior graph to obtain entity semantics, then build a graph-based convolutional neural network called G-CNN to learn news representations, and an attention-based LSTM to learn behavior sequence representations. Third, we introduce core and coritivity features for the behavior graph, which measure the concentration degree of user's interests. These features affect the trade-off between accuracy and diversity of our personalized recommendation system. Taking these features into account, our system finally achieves recommending news to different users at their different levels of concentration degrees.
Submitted 20 May, 2021; v1 submitted 30 November, 2018;
originally announced December 2018.
-
High-dimensional Index Volatility Models via Stein's Identity
Authors:
Sen Na,
Mladen Kolar
Abstract:
We study the estimation of the parametric components of single and multiple index volatility models. Using the first- and second-order Stein's identities, we develop methods that are applicable for the estimation of the variance index in the high-dimensional setting requiring finite moment condition, which allows for heavy-tailed data. Our approach complements the existing literature in the low-dimensional setting, while relaxing the conditions on estimation, and provides a novel approach in the high-dimensional setting. We prove that the statistical rate of convergence of our variance index estimators consists of a parametric rate and a nonparametric rate, where the latter appears from the estimation of the mean link function. However, under standard assumptions, the parametric rate dominates the rate of convergence and our results match the minimax optimal rate for the mean index estimation. Simulation results illustrate finite sample properties of our methodology and back our theoretical conclusions.
Submitted 25 May, 2020; v1 submitted 26 November, 2018;
originally announced November 2018.
-
Measurement of the average shape of longitudinal profiles of cosmic-ray air showers at the Pierre Auger Observatory
Authors:
The Pierre Auger Collaboration,
A. Aab,
P. Abreu,
M. Aglietta,
I. F. M. Albuquerque,
J. M. Albury,
I. Allekotte,
A. Almela,
J. Alvarez Castillo,
J. Alvarez-Muñiz,
G. A. Anastasi,
L. Anchordoqui,
B. Andrada,
S. Andringa,
C. Aramo,
H. Asorey,
P. Assis,
G. Avila,
A. M. Badescu,
A. Bakalova,
A. Balaceanu,
F. Barbato,
R. J. Barreira Luz,
S. Baur,
K. H. Becker
, et al. (363 additional authors not shown)
Abstract:
The profile of the longitudinal development of showers produced by ultra-high energy cosmic rays carries information related to the interaction properties of the primary particles with atmospheric nuclei. In this work, we present the first measurement of the average shower profile in traversed atmospheric depth at the Pierre Auger Observatory. The shapes of profiles are well reproduced by the Gaisser-Hillas parametrization within the range studied, for $E > 10^{17.8}$ eV. A detailed analysis of the systematic uncertainties is performed using 10 years of data and a full detector simulation. The average shape is quantified using two variables related to the width and asymmetry of the profile, and the results are compared with predictions of hadronic interaction models for different primary particles.
Submitted 16 May, 2019; v1 submitted 12 November, 2018;
originally announced November 2018.
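As a point of reference for the Gaisser-Hillas parametrization named in the abstract above, the following is a minimal sketch of the classic four-parameter form of that function. The function name, argument names, and example values are illustrative assumptions; the paper quantifies the profile with two shape variables related to width and asymmetry, and that reparametrization is not reproduced here.

```python
import math


def gaisser_hillas(x, n_max, x_max, x0, lam):
    """Classic Gaisser-Hillas longitudinal profile.

    x     : traversed atmospheric depth (g/cm^2)
    n_max : profile value at the shower maximum
    x_max : depth of the shower maximum (g/cm^2)
    x0    : shape parameter with units of depth (may be negative)
    lam   : shape parameter with units of depth (g/cm^2)
    """
    if x <= x0:
        return 0.0  # the profile is defined only for depths beyond x0
    base = (x - x0) / (x_max - x0)
    power = (x_max - x0) / lam
    return n_max * base ** power * math.exp((x_max - x) / lam)


if __name__ == "__main__":
    # Illustrative parameter values only; not taken from the paper.
    for depth in (600.0, 750.0, 900.0):
        value = gaisser_hillas(depth, 1.0, 750.0, -100.0, 60.0)
        print(depth, round(value, 3))
```

The profile peaks at `x_max` by construction and is asymmetric about it, which is why a width and an asymmetry variable are natural summaries of its shape.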
-
Inception-Residual Block based Neural Network for Thermal Image Denoising
Authors:
Seongmin Hwang,
Gwanghyun Yu,
Huy Toan Nguyen,
Nazeer Shahid,
Doseong Sin,
Jinyoung Kim,
Seungyou Na
Abstract:
Thermal cameras produce noisy images due to their limited thermal resolution, especially for scenes with a low temperature difference. To deal with this noise problem, this paper proposes a novel neural network architecture with repeatable denoising inception-residual blocks (DnIRB) for noise learning. Each DnIRB has two sub-blocks with different receptive fields and one shortcut connection to prevent the vanishing gradient problem. The proposed approach is tested on thermal images. The experimental results indicate that the proposed approach shows the best SQNR performance and reasonable processing time compared with state-of-the-art denoising methods.
Submitted 19 November, 2018; v1 submitted 31 October, 2018;
originally announced October 2018.