-
Spectroscopy-Guided Discovery of Three-Dimensional Structures of Disordered Materials with Diffusion Models
Authors:
Hyuna Kwon,
Tim Hsu,
Wenyu Sun,
Wonseok Jeong,
Fikret Aydin,
James Chapman,
Xiao Chen,
Matthew R. Carbone,
Deyu Lu,
Fei Zhou,
Tuan Anh Pham
Abstract:
The ability to rapidly develop materials with desired properties has a transformative impact on a broad range of emerging technologies. In this work, we introduce a new framework based on the diffusion model, a recent generative machine learning method to predict 3D structures of disordered materials from a target property. For demonstration, we apply the model to identify the atomic structures of amorphous carbons ($a$-C) as a representative material system from the target X-ray absorption near edge structure (XANES) spectra--a common experimental technique to probe atomic structures of materials. We show that conditional generation guided by XANES spectra reproduces key features of the target structures. Furthermore, we show that our model can steer the generative process to tailor atomic arrangements for a specific XANES spectrum. Finally, our generative model exhibits a remarkable scale-agnostic property, thereby enabling generation of realistic, large-scale structures through learning from a small-scale dataset (i.e., with small unit cells). Our work represents a significant stride in bridging the gap between materials characterization and atomic structure determination; in addition, it can be leveraged for materials discovery in exploring various material properties as targeted.
Submitted 9 December, 2023;
originally announced December 2023.
-
Stratified-NMF for Heterogeneous Data
Authors:
James Chapman,
Yotam Yaniv,
Deanna Needell
Abstract:
Non-negative matrix factorization (NMF) is an important technique for obtaining low dimensional representations of datasets. However, classical NMF does not take into account data that is collected at different times or in different locations, which may exhibit heterogeneity. We resolve this problem by solving a modified NMF objective, Stratified-NMF, that simultaneously learns strata-dependent statistics and a shared topics matrix. We develop multiplicative update rules for this novel objective and prove convergence of the objective. Then, we experiment on synthetic data to demonstrate the efficiency and accuracy of the method. Lastly, we apply our method to three real world datasets and empirically investigate their learned features.
Submitted 16 November, 2023;
originally announced November 2023.
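The Stratified-NMF abstract above mentions multiplicative update rules without stating them. As a point of reference only, here is a minimal sketch of the classical (unstratified) Lee-Seung multiplicative updates for the Frobenius objective ||V - WH||². It is not the paper's stratified objective. The function names, the random initialisation, and the `eps` guard are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_loss(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def scale(m, num, den, eps):
    """Element-wise m * num / (den + eps); eps avoids division by zero."""
    return [[x * n / (d + eps) for x, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(m, num, den)]


def nmf_step(v, w, h, eps=1e-12):
    """One round of classical multiplicative updates (H first, then W)."""
    wt = transpose(w)
    h = scale(h, matmul(wt, v), matmul(matmul(wt, w), h), eps)
    ht = transpose(h)
    w = scale(w, matmul(v, ht), matmul(w, matmul(h, ht)), eps)
    return w, h


def nmf(v, rank, iters=200, seed=0):
    """Factor a non-negative matrix V (list of rows) as W @ H."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive start: multiplicative updates cannot leave zero.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    losses = [frobenius_loss(v, w, h)]
    for _ in range(iters):
        w, h = nmf_step(v, w, h)
        losses.append(frobenius_loss(v, w, h))
    return w, h, losses


if __name__ == "__main__":
    V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.5, 0.0, 1.0], [3.0, 1.0, 2.0]]
    W, H, losses = nmf(V, rank=2, iters=100)
    print("initial loss:", losses[0], "final loss:", losses[-1])
```

Because each update multiplies by a ratio of non-negative quantities, the factors stay non-negative without projection, and the loss does not increase from one step to the next.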
-
Unconstrained Stochastic CCA: Unifying Multiview and Self-Supervised Learning
Authors:
James Chapman,
Lennie Wells,
Ana Lawry Aguila
Abstract:
The Canonical Correlation Analysis (CCA) family of methods is foundational in multiview learning. Regularised linear CCA methods can be seen to generalise Partial Least Squares (PLS) and be unified with a Generalized Eigenvalue Problem (GEP) framework. However, classical algorithms for these linear methods are computationally infeasible for large-scale data. Extensions to Deep CCA show great promise, but current training procedures are slow and complicated. First we propose a novel unconstrained objective that characterizes the top subspace of GEPs. Our core contribution is a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA, simply obtained by applying stochastic gradient descent (SGD) to the corresponding CCA objectives. Our algorithms show far faster convergence and recover higher correlations than the previous state-of-the-art on all standard CCA and Deep CCA benchmarks. These improvements allow us to perform a first-of-its-kind PLS analysis of an extremely large biomedical dataset from the UK Biobank, with over 33,000 individuals and 500,000 features. Finally, we apply our algorithms to match the performance of 'CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10 and CIFAR-100 with minimal hyper-parameter tuning, and also present theory to clarify the links between these methods and classical CCA, laying the groundwork for future insights.
Submitted 1 May, 2024; v1 submitted 2 October, 2023;
originally announced October 2023.
-
Algebraic Reasoning About Timeliness
Authors:
Seyed Hossein Haeri,
Peter W. Thompson,
Peter Van Roy,
Magne Haveraaen,
Neil J. Davies,
Mikhail Barash,
Kevin Hammond,
James Chapman
Abstract:
Designing distributed systems to have predictable performance under high load is difficult because of resource exhaustion, non-linearity, and stochastic behaviour. Timeliness, i.e., delivering results within defined time bounds, is a central aspect of predictable performance. In this paper, we focus on timeliness using the DELTA-Q Systems Development paradigm (DELTA-QSD, developed by PNSol), which computes timeliness by modelling systems observationally using so-called outcome expressions. An outcome expression is a compositional definition of a system's observed behaviour in terms of its basic operations. Given the behaviour of the basic operations, DELTA-QSD efficiently computes the stochastic behaviour of the whole system including its timeliness.
This paper formally proves useful algebraic properties of outcome expressions w.r.t. timeliness. We prove the different algebraic structures the set of outcome expressions forms with the different DELTA-QSD operators and demonstrate why those operators do not form richer structures. We prove or disprove the set of all possible distributivity results on outcome expressions. On our way to disproving 8 of those distributivity results, we develop a technique called properisation, which gives rise to the first body of maths for improper random variables. Finally, we also prove 14 equivalences that have been used in the past in the practice of DELTA-QSD.
An immediate benefit is rewrite rules that can be used for design exploration under established timeliness equivalence. This work is part of an ongoing project to disseminate and build tool support for DELTA-QSD. The ability to rewrite outcome expressions is essential for efficient tool support.
Submitted 21 August, 2023;
originally announced August 2023.
-
Novel Batch Active Learning Approach and Its Application to Synthetic Aperture Radar Datasets
Authors:
James Chapman,
Bohan Chen,
Zheng Tan,
Jeff Calder,
Kevin Miller,
Andrea L. Bertozzi
Abstract:
Active learning improves the performance of machine learning methods by judiciously selecting a limited number of unlabeled data points to query for labels, with the aim of maximally improving the underlying classifier's performance. Recent gains have been made using sequential active learning for synthetic aperture radar (SAR) data (arXiv:2204.00005). In each iteration, sequential active learning selects a query set of size one while batch active learning selects a query set of multiple datapoints. While batch active learning methods exhibit greater efficiency, the challenge lies in maintaining model accuracy relative to sequential active learning methods. We developed a novel, two-part approach for batch active learning: Dijkstra's Annulus Core-Set (DAC) for core-set generation and LocalMax for batch sampling. The batch active learning process that combines DAC and LocalMax achieves nearly the same accuracy as sequential active learning but is more efficient, proportional to the batch size. As an application, a pipeline is built based on transfer learning feature embedding, graph learning, DAC, and LocalMax to classify the FUSAR-Ship and OpenSARShip datasets. Our pipeline outperforms the state-of-the-art CNN-based methods.
Submitted 19 July, 2023;
originally announced July 2023.
-
CNN-based fully automatic mitral valve extraction using CT images and existence probability maps
Authors:
Yukiteru Masuda,
Ryo Ishikawa,
Toru Tanaka,
Gakuto Aoyama,
Keitaro Kawashima,
James V. Chapman,
Masahiko Asami,
Michael Huy Cuong Pham,
Klaus Fuglsang Kofoed,
Takuya Sakaguchi,
Kiyohide Satoh
Abstract:
Accurate extraction of mitral valve shape from clinical tomographic images acquired in patients has proven useful for planning surgical and interventional mitral valve treatments. However, manual extraction of the mitral valve shape is laborious, and the existing automatic extraction methods have not been sufficiently accurate. In this paper, we propose a fully automated method of extracting mitral valve shape from computed tomography (CT) images for all phases of the cardiac cycle. This method extracts the mitral valve shape based on DenseNet using both the original CT image and the existence probability maps of the mitral valve area inferred by U-Net as input. A total of 1585 CT images from 204 patients with various cardiac diseases including mitral regurgitation (MR) were collected and manually annotated for the mitral valve region. The proposed method was trained and evaluated by 10-fold cross validation using the collected data and was compared with the method without the existence probability maps. The mean shape extraction error of the proposed method is 0.88 mm, which is an improvement of 0.32 mm compared with the method without the existence probability maps.
Submitted 18 May, 2023; v1 submitted 30 April, 2023;
originally announced May 2023.
-
Multi-modal Variational Autoencoders for normative modelling across multiple imaging modalities
Authors:
Ana Lawry Aguila,
James Chapman,
Andre Altmann
Abstract:
One of the challenges of studying common neurological disorders is disease heterogeneity including differences in causes, neuroimaging characteristics, comorbidities, or genetic variation. Normative modelling has become a popular method for studying such cohorts where the 'normal' behaviour of a physiological system is modelled and can be used at subject level to detect deviations relating to disease pathology. For many heterogeneous diseases, we expect to observe abnormalities across a range of neuroimaging and biological variables. However, thus far, normative models have largely been developed for studying a single imaging modality. We aim to develop a multi-modal normative modelling framework where abnormality is aggregated across variables of multiple modalities and is better able to detect deviations than uni-modal baselines. We propose two multi-modal VAE normative models to detect subject level deviations across T1 and DTI data. Our proposed models were better able to detect diseased individuals, capture disease severity, and correlate with patient cognition than baseline approaches. We also propose a multivariate latent deviation metric, measuring deviations from the joint latent space, which outperformed feature-based metrics.
Submitted 2 October, 2023; v1 submitted 16 March, 2023;
originally announced March 2023.
-
Score-based denoising for atomic structure identification
Authors:
Tim Hsu,
Babak Sadigh,
Nicolas Bertin,
Cheol Woo Park,
James Chapman,
Vasily Bulatov,
Fei Zhou
Abstract:
We propose an effective method for removing thermal vibrations that complicate the task of analyzing complex dynamics in atomistic simulation of condensed matter. Our method iteratively subtracts thermal noises or perturbations in atomic positions using a denoising score function trained on synthetically noised but otherwise perfect crystal lattices. The resulting denoised structures clearly reveal underlying crystal order while retaining disorder associated with crystal defects. Purely geometric, agnostic to interatomic potentials, and trained without inputs from explicit simulations, our denoiser can be applied to simulation data generated from vastly different interatomic interactions. The denoiser is shown to improve existing classification methods such as common neighbor analysis and polyhedral template matching, reaching perfect classification accuracy on a recent benchmark dataset of thermally perturbed structures up to the melting point. Demonstrated here in a wide variety of atomistic simulation contexts, the denoiser is general, robust, and readily extendable to delineate order from disorder in structurally and chemically complex materials.
Submitted 3 May, 2023; v1 submitted 5 December, 2022;
originally announced December 2022.
-
A Generalized EigenGame with Extensions to Multiview Representation Learning
Authors:
James Chapman,
Ana Lawry Aguila,
Lennie Wells
Abstract:
Generalized Eigenvalue Problems (GEPs) encompass a range of interesting dimensionality reduction methods. Development of efficient stochastic approaches to these problems would allow them to scale to larger datasets. Canonical Correlation Analysis (CCA) is one example of a GEP for dimensionality reduction which has found extensive use in problems with two or more views of the data. Deep learning extensions of CCA require large mini-batch sizes, and therefore large memory consumption, in the stochastic setting to achieve good performance and this has limited its application in practice. Inspired by the Generalized Hebbian Algorithm, we develop an approach to solving stochastic GEPs in which all constraints are softly enforced by Lagrange multipliers. Then by considering the integral of this Lagrangian function, its pseudo-utility, and inspired by recent formulations of Principal Components Analysis and GEPs as games with differentiable utilities, we develop a game-theory inspired approach to solving GEPs. We show that our approaches share much of the theoretical grounding of the previous Hebbian and game theoretic approaches for the linear case but our method permits extension to general function approximators like neural networks for certain GEPs for dimensionality reduction including CCA which means our method can be used for deep multiview representation learning. We demonstrate the effectiveness of our method for solving GEPs in the stochastic setting using canonical multiview datasets and demonstrate state-of-the-art performance for optimizing Deep CCA.
Submitted 9 January, 2023; v1 submitted 21 November, 2022;
originally announced November 2022.
-
Latency Reduction for Mobile Backhaul by Pipelining LTE and DOCSIS
Authors:
Jennifer Andreoli-Fang,
John T Chapman
Abstract:
The small cell market has been growing. To backhaul wireless traffic from small cells, the mobile network operators (MNOs) are looking into economically viable solutions, specifically the hybrid fiber coaxial networks (HFC), in addition to the traditional choice of fiber. When the latencies from both the wireless and the HFC networks are added together, it can result in noticeable end-to-end system latency, particularly under network congestion. If the two networks could somehow coordinate with each other, it would be possible to decrease the total system latency and increase system performance. In this paper, we propose a method to improve upstream user-to-mobile core latency by coordinating the LTE and HFC scheduling. The method reduces the impact on system latency from the HFC network's request-grant-data loop, which is the main contributor of backhaul upstream latency. Through simulation, we show that coordinated scheduling improves overall system latency.
Submitted 16 November, 2022;
originally announced November 2022.
-
Low Latency Techniques for Mobile Backhaul over DOCSIS
Authors:
John T Chapman,
Jennifer Andreoli-Fang,
Michel Chauvin,
Elias Chavarria Reyes,
Zheng Lu,
Dantong Liu,
Joey Padden,
Alon Bernstein
Abstract:
The mobile network operators (MNOs) are looking into economically viable backhaul solutions as alternatives to fiber, specifically the hybrid fiber coaxial networks (HFC). When the latencies from both the wireless and the HFC networks are added together, the result is a noticeable end-to-end system latency, particularly under network congestion. In order to decrease total system latency, we proposed in previous papers a method to improve upstream user-to-mobile core latency by coordinating the LTE and HFC scheduling. In this paper, we implement and optimize the proposed method on a custom LTE and DOCSIS end-to-end system testbed. The testbed uses the OpenAirInterface (OAI) platform for the LTE network, along with Cisco's broadband router cBR-8 that is currently deployed in the HFC networks around the world. Our results show a backhaul latency improvement under all traffic load conditions.
Submitted 15 November, 2022;
originally announced November 2022.
-
Mobile-Aware Scheduling for Low Latency Backhaul over DOCSIS
Authors:
Jennifer Andreoli-Fang,
John T Chapman
Abstract:
In this paper, we discuss latency reduction techniques for mobile backhaul over Data Over Cable Service Interface Specifications (DOCSIS) networks. When the latencies from both the wireless and the DOCSIS networks are added together, it can result in noticeable end-to-end system latency, particularly under network congestion. Previously, we proposed a method to improve upstream user-to-mobile core latency by coordinating the LTE and DOCSIS scheduling. The method reduces the impact on system latency from the DOCSIS network's request-grant-data loop, which is the main contributor of backhaul upstream latency. Since the method reduces latency on the DOCSIS data path, it will therefore improve performance of latency sensitive applications, particularly if TCP is used as the transport protocol, especially when the link is congested. In this paper, we investigate the effect of HARQ failure on system performance. Through simulation, we show that despite the uncertainty introduced by the LTE protocol, coordinated scheduling improves overall system latency.
Submitted 16 November, 2022; v1 submitted 15 November, 2022;
originally announced November 2022.
-
Macroeconomic Predictions using Payments Data and Machine Learning
Authors:
James T. E. Chapman,
Ajit Desai
Abstract:
Predicting the economy's short-term dynamics -- a vital input to economic agents' decision-making process -- often uses lagged indicators in linear models. This is typically sufficient during normal times but could prove inadequate during crisis periods. This paper aims to demonstrate that non-traditional and timely data such as retail and wholesale payments, with the aid of nonlinear machine learning approaches, can provide policymakers with sophisticated models to accurately estimate key macroeconomic indicators in near real-time. Moreover, we provide a set of econometric tools to mitigate overfitting and interpretability challenges in machine learning models to improve their effectiveness for policy use. Our models with payments data, nonlinear methods, and tailored cross-validation approaches help improve macroeconomic nowcasting accuracy up to 40% -- with higher gains during the COVID-19 period. We observe that the contribution of payments data for economic predictions is small and linear during low and normal growth periods. However, the payments data contribution is large, asymmetrical, and nonlinear during strong negative or positive growth periods.
Submitted 2 September, 2022;
originally announced September 2022.
-
Efficient, Interpretable Graph Neural Network Representation for Angle-dependent Properties and its Application to Optical Spectroscopy
Authors:
Tim Hsu,
Tuan Anh Pham,
Nathan Keilbart,
Stephen Weitzner,
James Chapman,
Penghao Xiao,
S. Roger Qiu,
Xiao Chen,
Brandon C. Wood
Abstract:
Graph neural networks are attractive for learning properties of atomic structures thanks to their intuitive graph encoding of atoms and bonds. However, conventional encoding does not include angular information, which is critical for describing atomic arrangements in disordered systems. In this work, we extend the recently proposed ALIGNN encoding, which incorporates bond angles, to also include dihedral angles (ALIGNN-d). This simple extension leads to a memory-efficient graph representation that captures the complete geometry of atomic structures. ALIGNN-d is applied to predict the infrared optical response of dynamically disordered Cu(II) aqua complexes, leveraging the intrinsic interpretability to elucidate the relative contributions of individual structural components. Bond and dihedral angles are found to be critical contributors to the fine structure of the absorption response, with distortions representing transitions between more common geometries exhibiting the strongest absorption intensity. Future directions for further development of ALIGNN-d are discussed.
Submitted 15 February, 2022; v1 submitted 23 September, 2021;
originally announced September 2021.
-
A Type and Scope Safe Universe of Syntaxes with Binding: Their Semantics and Proofs
Authors:
Guillaume Allais,
Robert Atkey,
James Chapman,
Conor McBride,
James McKinna
Abstract:
Almost every programming language's syntax includes a notion of binder and corresponding bound occurrences, along with the accompanying notions of $α$-equivalence, capture-avoiding substitution, typing contexts, runtime environments, and so on. In the past, implementing and reasoning about programming languages required careful handling to maintain the correct behaviour of bound variables. Modern programming languages include features that enable constraints like scope safety to be expressed in types. Nevertheless, the programmer is still forced to write the same boilerplate over again for each new implementation of a scope safe operation (e.g., renaming, substitution, desugaring, printing, etc.), and then again for correctness proofs.
We present an expressive universe of syntaxes with binding and demonstrate how to (1) implement scope safe traversals once and for all by generic programming; and (2) how to derive properties of these traversals by generic proving. Our universe description, generic traversals and proofs, and our examples have all been formalised in Agda and are available in the accompanying material available online at https://github.com/gallais/generic-syntax.
Submitted 12 October, 2021; v1 submitted 29 January, 2020;
originally announced January 2020.
-
More or Less? Predict the Social Influence of Malicious URLs on Social Media
Authors:
Chun-Ming Lai,
Xiaoyun Wang,
Jon W. Chapman,
Yu-Cheng Lin,
Yu-Chang Ho,
S. Felix Wu,
Patrick McDaniel,
Hasan Cam
Abstract:
Users of Online Social Networks (OSNs) interact with each other more than ever. In the context of a public discussion group, people receive, read, and write comments in response to articles and postings. In the absence of access control mechanisms, OSNs are a great environment for attackers to influence others, from spreading phishing URLs, to posting fake news. Moreover, OSN user behavior can be predicted by social science concepts which include conformity and the bandwagon effect. In this paper, we show how social recommendation systems affect the occurrence of malicious URLs on Facebook. We exploit temporal features to build a prediction framework, having greater than 75% accuracy, to predict whether the following group users' behavior will increase or not. Included in this work, we demarcate classes of URLs, including those malicious URLs classified as creating critical damage, as well as those of a lesser nature which only inflict light damage such as aggressive commercial advertisements and spam content. It is our hope that the data and analyses in this paper provide a better understanding of OSN user reactions to different categories of malicious URLs, thereby providing a way to mitigate the influence of these malicious URL attacks.
Submitted 7 December, 2018;
originally announced December 2018.
-
Multi-View Community Detection in Facebook Public Pages
Authors:
Zhige Xin,
Chun-Ming Lai,
Jon W. Chapman,
George Barnett,
S. Felix Wu
Abstract:
Community detection in social networks is widely studied because of its importance in uncovering how people connect and interact. However, little attention has been given to community structure in Facebook public pages. In this study, we investigate the community detection problem in Facebook newsgroup pages. In particular, to deal with the diversity of user activities, we apply multi-view clustering to integrate different views, for example, likes on posts and likes on comments. In this study, we explore the community structure in not only a given single page but across multiple pages. The results show that our method can effectively reduce isolates and improve the quality of community structure.
Submitted 6 December, 2018; v1 submitted 23 September, 2018;
originally announced September 2018.
-
Meraculous2: fast accurate short-read assembly of large polymorphic genomes
Authors:
Jarrod A. Chapman,
Isaac Y. Ho,
Eugene Goltsman,
Daniel S. Rokhsar
Abstract:
We present Meraculous2, an update to the Meraculous short-read assembler that includes (1) handling of allelic variation using "bubble" structures within the de Bruijn graph, (2) improved gap closing, and (3) an improved scaffolding algorithm that produces more complete assemblies without compromising scaffolding accuracy. The speed and bandwidth efficiency of the new parallel implementation have also been substantially improved, allowing the assembly of a human genome to be accomplished in 24 hours on the JGI/NERSC Genepool system. To highlight the features of Meraculous2 we present here the assembly of the diploid human genome NA12878, and compare it with previously published assemblies of the same data using other algorithms. The Meraculous2 assemblies are shown to have better completeness, contiguity, and accuracy than other published assemblies for these data. Practical considerations including pre-assembly analyses of polymorphism and repetitiveness are described.
Submitted 7 November, 2017; v1 submitted 2 August, 2016;
originally announced August 2016.
-
Monads need not be endofunctors
Authors:
Thorsten Altenkirch,
James Chapman,
Tarmo Uustalu
Abstract:
We introduce a generalization of monads, called relative monads, allowing for underlying functors between different categories. Examples include finite-dimensional vector spaces, untyped and typed lambda-calculus syntax and indexed containers. We show that the Kleisli and Eilenberg-Moore constructions carry over to relative monads and are related to relative adjunctions. Under reasonable assumptions, relative monads are monoids in the functor category concerned and extend to monads, giving rise to a coreflection between relative monads and monads. Arrows are also an instance of relative monads.
Submitted 4 March, 2015; v1 submitted 22 December, 2014;
originally announced December 2014.
-
When is a container a comonad?
Authors:
Danel Ahman,
James Chapman,
Tarmo Uustalu
Abstract:
Abbott, Altenkirch, Ghani and others have taught us that many parameterized datatypes (set functors) can be usefully analyzed via container representations in terms of a set of shapes and a set of positions in each shape. This paper builds on the observation that datatypes often carry additional structure that containers alone do not account for. We introduce directed containers to capture the common situation where every position in a data-structure determines another data-structure, informally, the sub-data-structure rooted by that position. Some natural examples are non-empty lists and node-labelled trees, and data-structures with a designated position (zippers). While containers denote set functors via a fully-faithful functor, directed containers interpret fully-faithfully into comonads. But more is true: every comonad whose underlying functor is a container is represented by a directed container. In fact, directed containers are the same as containers that are comonads. We also describe some constructions of directed containers. We have formalized our development in the dependently typed programming language Agda.
Submitted 2 September, 2014; v1 submitted 25 August, 2014;
originally announced August 2014.
-
Normalization by Evaluation in the Delay Monad: A Case Study for Coinduction via Copatterns and Sized Types
Authors:
Andreas Abel,
James Chapman
Abstract:
In this paper, we present an Agda formalization of a normalizer for simply-typed lambda terms. The normalizer consists of two coinductively defined functions in the delay monad: One is a standard evaluator of lambda terms to closures, the other a type-directed reifier from values to eta-long beta-normal forms. Their composition, normalization-by-evaluation, is shown to be a total function a posteriori, using a standard logical-relations argument.
The successful formalization serves as a proof-of-concept for coinductive programming and reasoning using sized types and copatterns, a new and presently experimental feature of Agda.
Submitted 8 June, 2014;
originally announced June 2014.
-
Proceedings Fourth Workshop on Mathematically Structured Functional Programming
Authors:
James Chapman,
Paul Blain Levy
Abstract:
This volume contains the proceedings of the Fourth Workshop on Mathematically Structured Functional Programming (MSFP 2012), taking place on 25 March, 2012 in Tallinn, Estonia, as a satellite event of the European Joint Conferences on Theory and Practice of Software, ETAPS 2012.
MSFP is devoted to the derivation of functionality from structure. It highlights concepts from algebra, semantics and type theory as they are increasingly reflected in programming practice, especially functional programming. The workshop consists of two invited presentations and eight contributed papers on a range of topics at that interface.
Submitted 10 February, 2012;
originally announced February 2012.