-
FastImpute: A Baseline for Open-source, Reference-Free Genotype Imputation Methods -- A Case Study in PRS313
Authors:
Aaron Ge,
Jeya Balasubramanian,
Xueyao Wu,
Peter Kraft,
Jonas S. Almeida
Abstract:
Genotype imputation enhances genetic data by predicting missing SNPs using reference haplotype information. Traditional methods leverage linkage disequilibrium (LD) to infer untyped SNP genotypes, relying on the similarity of LD structures between genotyped target sets and fully sequenced reference panels. Recently, reference-free deep learning-based methods have emerged, offering a promising alternative by predicting missing genotypes without external databases, thereby enhancing privacy and accessibility. However, these methods often produce models with tens of millions of parameters, leading to challenges such as the need for substantial computational resources to train and inefficiency for client-side deployment. Our study addresses these limitations by introducing a baseline for a novel genotype imputation pipeline that supports client-side imputation models generalizable across any genotyping chip and genomic region. This approach enhances patient privacy by performing imputation directly on edge devices. As a case study, we focus on PRS313, a polygenic risk score comprising 313 SNPs used for breast cancer risk prediction. Utilizing consumer genetic panels such as 23andMe, our model democratizes access to personalized genetic insights by allowing 23andMe users to obtain their PRS313 score. We demonstrate that simple linear regression can significantly improve the accuracy of PRS313 scores when calculated using SNPs imputed from consumer gene panels, such as 23andMe. Our linear regression model achieved an R^2 of 0.86, compared to 0.33 without imputation and 0.28 with simple imputation (substituting missing SNPs with the minor allele frequency). These findings suggest that popular SNP analysis libraries could benefit from integrating linear regression models for genotype imputation, providing a viable and lightweight alternative to reference-based imputation.
Submitted 12 July, 2024;
originally announced July 2024.
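The contrast drawn in the FastImpute abstract, between regression-based imputation and filling in a constant allele frequency, can be illustrated with a toy sketch. This is not the authors' pipeline: it assumes a single typed SNP as the only predictor, dosages coded 0/1/2, and made-up training values. The names `fit_imputer`, `impute`, and `frequency_fill` are hypothetical.

```python
def fit_imputer(typed, untyped):
    """Ordinary least squares for untyped ~ intercept + slope * typed.

    Assumption: one typed SNP predicts one untyped SNP; a real model
    would use many typed SNPs as predictors.
    """
    n = len(typed)
    mean_x = sum(typed) / n
    mean_y = sum(untyped) / n
    sxx = sum((x - mean_x) ** 2 for x in typed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(typed, untyped))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute(typed, model):
    """Predict the untyped dosage for each sample from its typed dosage."""
    slope, intercept = model
    return [intercept + slope * g for g in typed]


def frequency_fill(untyped_train, n_samples):
    """Baseline: give every sample the same value, the training allele frequency."""
    freq = sum(untyped_train) / (2 * len(untyped_train))
    return [freq] * n_samples


# Made-up dosages (copies of one allele), perfectly correlated for the toy case.
typed_train = [0, 1, 2, 1, 0, 2]
untyped_train = [0, 1, 2, 1, 0, 2]

model = fit_imputer(typed_train, untyped_train)
imputed = impute([0, 2, 1], model)      # varies per sample
baseline = frequency_fill(untyped_train, 3)  # identical for every sample
```

The regression output differs from sample to sample, while the baseline cannot, which is one intuition for why a score built from regression-imputed SNPs can track the true score more closely.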
-
KeldyshQFT: A C++ codebase for real-frequency multiloop functional renormalization group and parquet computations of the single-impurity Anderson model
Authors:
Nepomuk Ritz,
Anxiang Ge,
Elias Walter,
Santiago Aguirre,
Jan von Delft,
Fabian B. Kugler
Abstract:
We provide a detailed exposition of our computational framework designed for the accurate calculation of real-frequency dynamical correlation functions of the single-impurity Anderson model (AM) in the regime of weak to intermediate coupling. Using quantum field theory within the Keldysh formalism to directly access the self-energy and dynamical susceptibilities in real frequencies, as detailed in our recent publication (https://doi.org/10.1103/PhysRevB.109.115128), the primary computational challenge is the full three-dimensional real-frequency dependence of the four-point vertex. Our codebase provides a fully MPI+OpenMP parallelized implementation of the functional renormalization group (fRG) and the self-consistent parquet equations within the parquet approximation. It leverages vectorization to handle the additional complexity imposed by the Keldysh formalism, using optimized data structures and highly performant integration routines. Going beyond the results shown in the previous publication, the code includes functionality to perform fRG calculations in the multiloop framework, at arbitrary loop order, including self-consistent self-energy iterations. Moreover, implementations of various regulators, such as hybridization, interaction, frequency, and temperature are supplied.
Submitted 31 May, 2024;
originally announced May 2024.
-
Fibonometry and Beyond
Authors:
Nikhil Byrapuram,
Adam Ge,
Selena Ge,
Tanya Khovanova,
Sylvia Zia Lee,
Rajarshi Mandal,
Gordon Redwine,
Soham Samanta,
Daniel Wu,
Danyang Xu,
Ray Zhao
Abstract:
In 2013, Conway and Ryba wrote a fascinating paper called Fibonometry. The paper, as one might guess, is about the connection between Fibonacci numbers and trigonometry. We were fascinated by this paper and looked at how we could generalize it. We discovered that we weren't the first. In this paper, we describe our journey and summarize the results.
Submitted 19 May, 2024;
originally announced May 2024.
-
Two-stage Progressive Residual Dense Attention Network for Image Denoising
Authors:
Wencong Wu,
An Ge,
Guannan Lv,
Yuelong Xia,
Yungang Zhang,
Wen Xiong
Abstract:
Deep convolutional neural networks (CNNs) for image denoising can effectively exploit rich hierarchical features and have achieved great success. However, many deep CNN-based denoising models equally utilize the hierarchical features of noisy images without paying attention to the more important and useful features, leading to relatively low performance. To address the issue, we design a new Two-stage Progressive Residual Dense Attention Network (TSP-RDANet) for image denoising, which divides the whole process of denoising into two sub-tasks to remove noise progressively. Two different attention mechanism-based denoising networks are designed for the two sequential sub-tasks: the residual dense attention module (RDAM) is designed for the first stage, and the hybrid dilated residual dense attention module (HDRDAM) is proposed for the second stage. The proposed attention modules are able to learn appropriate local features through dense connection between different convolutional layers, and the irrelevant features can also be suppressed. The two sub-networks are then connected by a long skip connection to retain the shallow feature to enhance the denoising performance. The experiments on seven benchmark datasets have verified that compared with many state-of-the-art methods, the proposed TSP-RDANet can obtain favorable results both on synthetic and real noisy image denoising. The code of our TSP-RDANet is available at https://github.com/WenCongWu/TSP-RDANet.
Submitted 5 January, 2024;
originally announced January 2024.
-
Analytic continuation of multipoint correlation functions
Authors:
Anxiang Ge,
Johannes Halbinger,
Seung-Sup B. Lee,
Jan von Delft,
Fabian B. Kugler
Abstract:
Conceptually, the Matsubara formalism (MF), using imaginary frequencies, and the Keldysh formalism (KF), formulated in real frequencies, give equivalent results for systems in thermal equilibrium. The MF has less complexity and is thus more convenient than the KF. However, computing dynamical observables in the MF requires the analytic continuation from imaginary to real frequencies. The analytic continuation is well-known for two-point correlation functions (having one frequency argument), but, for multipoint correlators, a straightforward recipe for deducing all Keldysh components from the MF correlator had not been formulated yet. Recently, a representation of MF and KF correlators in terms of formalism-independent partial spectral functions and formalism-specific kernels was introduced by Kugler, Lee, and von Delft [Phys. Rev. X 11, 041006 (2021)]. We use this representation to formally elucidate the connection between both formalisms. We show how a multipoint MF correlator can be analytically continued to recover all partial spectral functions and yield all Keldysh components of its KF counterpart. The procedure is illustrated for various correlators of the Hubbard atom.
Submitted 6 May, 2024; v1 submitted 19 November, 2023;
originally announced November 2023.
-
Maximum Number of Quads
Authors:
Nikhil Byrapuram,
Hwiseo Choi,
Adam Ge,
Selena Ge,
Tanya Khovanova,
Sylvia Zia Lee,
Evin Liang,
Rajarshi Mandal,
Aika Oki,
Daniel Wu,
Michael Yang
Abstract:
We study the maximum number of quads among $\ell$ cards from an EvenQuads deck of size $2^n$. This corresponds to enumerating quadruples of integers in the range $[0,\ell-1]$ such that their bitwise XOR is zero. In this paper, we conjecture a formula that calculates the maximum number of quads among $\ell$ cards.
Submitted 14 October, 2023;
originally announced October 2023.
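The XOR description in the abstract above can be checked by brute force for small $\ell$. The sketch below counts 4-element subsets of the integers $0, \dots, \ell-1$ whose bitwise XOR is zero. It only enumerates this one card set and does not reproduce the conjectured formula; the function name `count_quads` is hypothetical.

```python
from functools import reduce
from itertools import combinations
from operator import xor


def count_quads(cards):
    """Count 4-element subsets of `cards` whose bitwise XOR is zero."""
    cards = list(cards)
    return sum(1 for four in combinations(cards, 4) if reduce(xor, four) == 0)


# Number of quads among the cards 0..l-1 for a few small l.
small_counts = {l: count_quads(range(l)) for l in (4, 7, 8)}
# -> {4: 1, 7: 7, 8: 14}
```

The enumeration takes on the order of $\ell^4$ steps, so it is only practical for small card sets.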
-
EvenQuads Game and Error-Correcting Codes
Authors:
Nikhil Byrapuram,
Hwiseo Choi,
Adam Ge,
Selena Ge,
Tanya Khovanova,
Sylvia Zia Lee,
Evin Liang,
Rajarshi Mandal,
Aika Oki,
Daniel Wu,
Michael Yang
Abstract:
EvenQuads is a new card game that is a generalization of the SET game, where each card is characterized by three attributes, each taking four possible values. Four cards form a quad when, for each attribute, the values are the same, all different, or half and half. Given $\ell$ cards from the deck of EvenQuads, we can build an error-correcting linear binary code of length $\ell$ and Hamming distance 4. The quads correspond to codewords of weight 4. Error-correcting codes help us calculate the possible number of quads when given up to 8 cards. We also estimate the number of cards that do not contain quads for decks of different sizes. In addition, we discuss properties of error-correcting codes built on semimagic, magic, and strongly magic quad squares.
Submitted 2 October, 2023;
originally announced October 2023.
-
MatsubaraFunctions.jl: An equilibrium Green's function library in the Julia programming language
Authors:
Dominik Kiese,
Anxiang Ge,
Nepomuk Ritz,
Jan von Delft,
Nils Wentzell
Abstract:
The Matsubara Green's function formalism stands as a powerful technique for computing the thermodynamic characteristics of interacting quantum many-particle systems at finite temperatures. In this manuscript, our focus centers on introducing MatsubaraFunctions.jl, a Julia library that implements data structures for generalized n-point Green's functions on Matsubara frequency grids. The package's architecture prioritizes user-friendliness without compromising the development of efficient solvers for quantum field theories in equilibrium. Following a comprehensive introduction of the fundamental types, we delve into a thorough examination of key facets of the interface. This encompasses avenues for accessing Green's functions, techniques for extrapolation and interpolation, as well as the incorporation of symmetries and a variety of parallelization strategies. Examples of increasing complexity serve to demonstrate the practical utility of the library, supplemented by discussions on strategies for sidestepping impediments to optimal performance.
Submitted 28 November, 2023; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Quad Squares
Authors:
Nikhil Byrapuram,
Hwiseo Choi,
Adam Ge,
Selena Ge,
Tanya Khovanova,
Sylvia Zia Lee,
Evin Liang,
Rajarshi Mandal,
Aika Oki,
Daniel Wu,
Michael Yang
Abstract:
We study 4-by-4 squares formed by cards from the EvenQuads deck. EvenQuads is a card game with 64 cards where cards have 3 attributes with 4 values in each attribute. A quad is four cards with all attributes the same, all different, or half and half. We define Latin quad squares as squares where the cards in each row and column have different values for each attribute. We define semimagic quad squares as squares where each row and column form a quad. For magic quad squares, we add a requirement that the diagonals have to form a quad. We also define strongly magic quad squares. We analyze types of semimagic and strongly magic quad squares. We also calculate the number of semimagic, magic, and strongly magic quad squares for quad decks of any size. These squares can be described in terms of integers. Four integers form a quad when their bitwise XOR is zero.
Submitted 14 August, 2023;
originally announced August 2023.
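The semimagic and magic definitions in the abstract above translate directly into checks on a 4-by-4 grid of integers. The sketch below assumes cards are encoded as integers 0 to 63 and that a quad is four distinct integers with bitwise XOR zero. The function names and the example square are illustrative and not taken from the paper; strongly magic squares are not covered, since the abstract does not define them.

```python
def is_quad(four):
    """Four distinct integers form a quad when their bitwise XOR is zero."""
    a, b, c, d = four
    return len({a, b, c, d}) == 4 and a ^ b ^ c ^ d == 0


def is_semimagic(square):
    """Every row and every column of the 4-by-4 square forms a quad."""
    rows = [list(r) for r in square]
    cols = [[square[i][j] for i in range(4)] for j in range(4)]
    return all(is_quad(line) for line in rows + cols)


def is_magic(square):
    """Semimagic, and both diagonals also form quads."""
    main = [square[i][i] for i in range(4)]
    anti = [square[i][3 - i] for i in range(4)]
    return is_semimagic(square) and is_quad(main) and is_quad(anti)


# Illustrative square: entry (i, j) is 4*i + j.
example = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]

# Swapping two entries in the first column breaks the first two rows.
broken = [
    [4, 1, 2, 3],
    [0, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
```

Here `example` passes both checks, while `broken` fails `is_semimagic` because its first row XORs to 4.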
-
mSigSDK -- private, at scale, computation of mutation signatures
Authors:
Aaron Ge,
Yasmmin Côrtes Martins,
Tongwu Zhang,
Kailing Chen,
Maria Teresa Landi,
Brian Park,
Jeya Balasubramanian,
Jonas S Almeida
Abstract:
In our previous work, we demonstrated that it is feasible to perform analysis on mutation signature data without the need for downloads or installations and analyze individual patient data at scale without compromising privacy. Building on this foundation, we developed a Software Development Kit (SDK) called mSigSDK to facilitate the orchestration of distributed data processing workflows and graphic visualization of mutational signature analysis results. We strictly adhered to modern web computing standards, particularly the modularization standards set by the ECMAScript ES6 framework (JavaScript modules). Our approach allows for computation to be entirely performed by secure delegation to the computational resources of the user's own machine (in-browser), without any downloads or installations. The mSigSDK was developed primarily as a companion library to the mSig Portal resource of the National Cancer Institute Division of Cancer Epidemiology and Genetics (NIH/NCI/DCEG), with a focus on its FAIR extensibility as components of other researchers' computational constructs. Anticipated extensions include the programmatic operation of other mutation signature API ecosystems such as SIGNAL and COSMIC, advancing towards a data commons for mutational signature research (Grossman et al., 2016).
Submitted 19 January, 2024; v1 submitted 5 August, 2023;
originally announced August 2023.
-
Real-frequency quantum field theory applied to the single-impurity Anderson model
Authors:
Anxiang Ge,
Nepomuk Ritz,
Elias Walter,
Santiago Aguirre,
Jan von Delft,
Fabian B. Kugler
Abstract:
A major challenge in the field of correlated electrons is the computation of dynamical correlation functions. For comparisons with experiment, one is interested in their real-frequency dependence. This is difficult to compute, as imaginary-frequency data from the Matsubara formalism require analytic continuation, a numerically ill-posed problem. Here, we apply quantum field theory to the single-impurity Anderson model (AM), using the Keldysh instead of the Matsubara formalism with direct access to the self-energy and dynamical susceptibilities on the real-frequency axis. We present results from the functional renormalization group (fRG) at one-loop level and from solving the self-consistent parquet equations in the parquet approximation. In contrast to previous Keldysh fRG works, we employ a parametrization of the four-point vertex which captures its full dependence on three real-frequency arguments. We compare our results to benchmark data obtained with the numerical renormalization group and to second-order perturbation theory. We find that capturing the full frequency dependence of the four-point vertex significantly improves the fRG results compared to previous implementations, and that solving the parquet equations yields the best agreement with the NRG benchmark data, but is only feasible up to moderate interaction strengths. Our methodical advances pave the way for treating more complicated models in the future.
Submitted 8 April, 2024; v1 submitted 20 July, 2023;
originally announced July 2023.
-
Card Games Unveiled: Exploring the Underlying Linear Algebra
Authors:
Nikhil Byrapuram,
Hwiseo Choi,
Adam Ge,
Selena Ge,
Tanya Khovanova,
Sylvia Zia Lee,
Evin Liang,
Rajarshi Mandal,
Aika Oki,
Daniel Wu,
Michael Yang
Abstract:
We discuss four famous card games that can help learn linear algebra. The games are: SET, Socks, Spot it!, and EvenQuads. We describe the game in the language of vector, affine, and projective spaces. We also show how these games are connected to each other. A separate section is devoted to playing Socks with the EvenQuads deck and vice versa.
Submitted 1 September, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
A FAIR platform for reproducing mutational signature detection on tumor sequencing data
Authors:
Aaron Ge,
Tongwu Zhang,
Clara Bodelon,
Montserrat Garcia-Closas,
Jonas Almeida,
Jeya Balasubramanian
Abstract:
This paper presents a portable, privacy-preserving, in-browser platform for the reproducible assessment of mutational signature detection methods from sparse sequencing data generated by targeted gene panels. The platform aims to address the reproducibility challenges in mutational signature research by adhering to the FAIR principles, making it findable, accessible, interoperable, and reusable. Our approach focuses on the detection of specific mutational signatures, such as SBS3, which have been linked to specific mutagenic processes. The platform relies on publicly available data, simulation, downsampling techniques, and machine learning algorithms to generate training data and labels and to train and evaluate models. The key achievement of our platform is its transparency, reusability, and privacy preservation, enabling researchers and clinicians to analyze mutational signatures with the guarantee that no data circulates outside the client machine.
Submitted 2 June, 2023;
originally announced June 2023.
-
Generalizing the Wythoff Array and other Fibonacci Facts to Tribonacci Numbers
Authors:
Eric Chen,
Adam Ge,
Andrew Kalashnikov,
Tanya Khovanova,
Ella Kim,
Evin Liang,
Mira Lubashev,
Matthew Qian,
Rohith Raghavan,
Benjamin Taycher,
Samuel Wang
Abstract:
In this paper, we generalize a lot of facts from John Conway and Alex Ryba's paper, \textit{The extra Fibonacci series and the Empire State Building}, where we replace the Fibonacci sequence with the Tribonacci sequence. We study the Tribonacci array, which we also call \textit{the Trithoff array} to emphasize the connection to the Wythoff array. We describe 13 new sequences.
Submitted 2 November, 2022;
originally announced November 2022.
-
Multiloop flow equations for single-boson exchange fRG
Authors:
Marcel Gievers,
Elias Walter,
Anxiang Ge,
Jan von Delft,
Fabian B. Kugler
Abstract:
The recently introduced single-boson exchange (SBE) decomposition of the four-point vertex of interacting fermionic many-body systems is a conceptually and computationally appealing parametrization of the vertex. It relies on the notion of reducibility of vertex diagrams with respect to the bare interaction $U$, instead of a classification based on two-particle reducibility within the widely-used parquet decomposition. Here, we re-derive the SBE decomposition in a generalized framework (suitable for extensions to, e.g., inhomogeneous systems or real-frequency treatments) following from the parquet equations. We then derive multiloop functional renormalization group (mfRG) flow equations for the ingredients of this SBE decomposition, both in the parquet approximation, where the fully two-particle irreducible vertex is treated as an input, and in the more restrictive SBE approximation, where this role is taken by the fully $U$-irreducible vertex. Moreover, we give mfRG flow equations for the popular parametrization of the vertex in terms of asymptotic classes of the two-particle reducible vertices. Since the parquet and SBE decompositions are closely related, their mfRG flow equations are very similar in structure.
Submitted 7 July, 2022; v1 submitted 13 January, 2022;
originally announced January 2022.
-
Spectral Element Method for the Elastic/Acoustic Waveguide Problem in Anisotropic Metamaterials
Authors:
An Qi Ge,
Ming Wei Zhuang,
Jie Liu,
Qing Huo Liu
Abstract:
In order to simulate elastic wave propagation in a complex structure with inhomogeneous media, we often need to obtain the propagating eigenmodes of an elastic waveguide. As the waveguide is assumed uniform in one direction, the original 3-D problem can be converted into a so-called 2.5-D problem by using the Fourier transform in that direction. However, the introduction of elastic metamaterials (EMM) broadens the horizon of this subject, and new features are required in EMM waveguides that cannot be obtained by most traditional waveguide solvers. In this work, a spectral element method (SEM) is developed to simulate the elastic/acoustic waveguide problem in anisotropic media with anisotropic mass density and/or negative index parameters. To the best of our knowledge, the SEM has not been introduced previously for such a waveguide problem. For waveguides with anisotropic density that cannot be solved by the FEM in most commercial software packages, we design an anisotropic density EMM waveguide with our SEM solver to demonstrate some intriguing phenomena. The spectral element results are verified by several numerical examples through comparison with the traditional finite element method (FEM) to show its significant advantages in terms of accuracy and computational efficiency.
Submitted 17 August, 2021;
originally announced August 2021.
-
Cleaning Noisy and Heterogeneous Metadata for Record Linking Across Scholarly Big Datasets
Authors:
Athar Sefid,
Jian Wu,
Allen C. Ge,
Jing Zhao,
Lu Liu,
Cornelia Caragea,
Prasenjit Mitra,
C. Lee Giles
Abstract:
Automatically extracted metadata from scholarly documents in PDF formats is usually noisy and heterogeneous, often containing incomplete fields and erroneous values. One common way of cleaning metadata is to use a bibliographic reference dataset. The challenge is to match records between corpora with high precision. The existing solution which is based on information retrieval and string similarity on titles works well only if the titles are cleaned. We introduce a system designed to match scholarly document entities with noisy metadata against a reference dataset. The blocking function uses the classic BM25 algorithm to find the matching candidates from the reference data that has been indexed by ElasticSearch. The core components use supervised methods which combine features extracted from all available metadata fields. The system also leverages available citation information to match entities. The combination of metadata and citation achieves high accuracy that significantly outperforms the baseline method on the same test dataset. We apply this system to match the database of CiteSeerX against Web of Science, PubMed, and DBLP. This method will be deployed in the CiteSeerX system to clean metadata and link records to other scholarly big datasets.
Submitted 20 June, 2019;
originally announced June 2019.
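The blocking step mentioned in the abstract above, which uses BM25 to shortlist candidate matches from a reference dataset, can be sketched in plain Python. This is an in-memory illustration of Okapi BM25 scoring under stated assumptions, not the authors' ElasticSearch setup: tokenization is a lowercase whitespace split, `k1=1.2` and `b=0.75` are commonly used defaults, and the reference titles are made up. The names `bm25_scores` and `block` are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def block(query, docs, top_k=2):
    """Return indices of the top_k highest-scoring reference records."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


# Made-up reference titles standing in for an indexed bibliographic dataset.
reference = [
    "cleaning noisy metadata for record linking",
    "spectral element method for waveguides",
    "analytic continuation of correlation functions",
]

# A noisy extracted title with missing words still retrieves record 0.
candidates = block("noisy metadata record linking", reference, top_k=1)
```

In a pipeline like the one described, the shortlisted candidates would then go to the supervised matcher, so the blocking step only needs to keep the true match in the shortlist rather than rank it first.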