Search | arXiv e-print repository

Spectroscopy-Guided Discovery of Three-Dimensional Structures of Disordered Materials with Diffusion Models

Authors: Hyuna Kwon, Tim Hsu, Wenyu Sun, Wonseok Jeong, Fikret Aydin, James Chapman, Xiao Chen, Matthew R. Carbone, Deyu Lu, Fei Zhou, Tuan Anh Pham

Abstract: The ability to rapidly develop materials with desired properties has a transformative impact on a broad range of emerging technologies. In this work, we introduce a new framework based on the diffusion model, a recent generative machine learning method to predict 3D structures of disordered materials from a target property. For demonstration, we apply the model to identify the atomic structures of amorphous carbons ($a$-C) as a representative material system from the target X-ray absorption near edge structure (XANES) spectra--a common experimental technique to probe atomic structures of materials. We show that conditional generation guided by XANES spectra reproduces key features of the target structures. Furthermore, we show that our model can steer the generative process to tailor atomic arrangements for a specific XANES spectrum. Finally, our generative model exhibits a remarkable scale-agnostic property, thereby enabling generation of realistic, large-scale structures through learning from a small-scale dataset (i.e., with small unit cells). Our work represents a significant stride in bridging the gap between materials characterization and atomic structure determination; in addition, it can be leveraged for materials discovery in exploring various material properties as targeted.

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2311.10789 [pdf, other]

Stratified-NMF for Heterogeneous Data

Authors: James Chapman, Yotam Yaniv, Deanna Needell

Abstract: Non-negative matrix factorization (NMF) is an important technique for obtaining low dimensional representations of datasets. However, classical NMF does not take into account data that is collected at different times or in different locations, which may exhibit heterogeneity. We resolve this problem by solving a modified NMF objective, Stratified-NMF, that simultaneously learns strata-dependent statistics and a shared topics matrix. We develop multiplicative update rules for this novel objective and prove convergence of the objective. Then, we experiment on synthetic data to demonstrate the efficiency and accuracy of the method. Lastly, we apply our method to three real world datasets and empirically investigate their learned features.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 5 pages. Will appear in IEEE Asilomar Conference on Signals, Systems, and Computers 2023

ACM Class: G.1.6; I.5.3; I.5.4

arXiv:2310.01012 [pdf, other]

Unconstrained Stochastic CCA: Unifying Multiview and Self-Supervised Learning

Authors: James Chapman, Lennie Wells, Ana Lawry Aguila

Abstract: The Canonical Correlation Analysis (CCA) family of methods is foundational in multiview learning. Regularised linear CCA methods can be seen to generalise Partial Least Squares (PLS) and be unified with a Generalized Eigenvalue Problem (GEP) framework. However, classical algorithms for these linear methods are computationally infeasible for large-scale data. Extensions to Deep CCA show great promise, but current training procedures are slow and complicated. First we propose a novel unconstrained objective that characterizes the top subspace of GEPs. Our core contribution is a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA, simply obtained by applying stochastic gradient descent (SGD) to the corresponding CCA objectives. Our algorithms show far faster convergence and recover higher correlations than the previous state-of-the-art on all standard CCA and Deep CCA benchmarks. These improvements allow us to perform a first-of-its-kind PLS analysis of an extremely large biomedical dataset from the UK Biobank, with over 33,000 individuals and 500,000 features. Finally, we apply our algorithms to match the performance of 'CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10 and CIFAR-100 with minimal hyper-parameter tuning, and also present theory to clarify the links between these methods and classical CCA, laying the groundwork for future insights.

Submitted 1 May, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

arXiv:2308.10654 [pdf, other]

doi 10.4204/EPTCS.383.3

Algebraic Reasoning About Timeliness

Authors: Seyed Hossein Haeri, Peter W. Thompson, Peter Van Roy, Magne Haveraaen, Neil J. Davies, Mikhail Barash, Kevin Hammond, James Chapman

Abstract: Designing distributed systems to have predictable performance under high load is difficult because of resource exhaustion, non-linearity, and stochastic behaviour. Timeliness, i.e., delivering results within defined time bounds, is a central aspect of predictable performance. In this paper, we focus on timeliness using the DELTA-Q Systems Development paradigm (DELTA-QSD, developed by PNSol), which computes timeliness by modelling systems observationally using so-called outcome expressions. An outcome expression is a compositional definition of a system's observed behaviour in terms of its basic operations. Given the behaviour of the basic operations, DELTA-QSD efficiently computes the stochastic behaviour of the whole system including its timeliness. This paper formally proves useful algebraic properties of outcome expressions w.r.t. timeliness. We prove the different algebraic structures the set of outcome expressions form with the different DELTA-QSD operators and demonstrate why those operators do not form richer structures. We prove or disprove the set of all possible distributivity results on outcome expressions. On our way for disproving 8 of those distributivity results, we develop a technique called properisation, which gives rise to the first body of maths for improper random variables. Finally, we also prove 14 equivalences that have been used in the past in the practice of DELTA-QSD. An immediate benefit is rewrite rules that can be used for design exploration under established timeliness equivalence.
This work is part of an ongoing project to disseminate and build tool support for DELTA-QSD. The ability to rewrite outcome expressions is essential for efficient tool support.

Submitted 21 August, 2023; originally announced August 2023.

Comments: In Proceedings ICE 2023, arXiv:2308.08920

ACM Class: B.8.2; C.4; D.2.4; D.2.8; F.3.2; F.3.1; F.4.1; F.4.3; I.1.1

Journal ref: EPTCS 383, 2023, pp. 35-54

arXiv:2307.10495 [pdf, other]

doi 10.1117/12.2662393

Novel Batch Active Learning Approach and Its Application to Synthetic Aperture Radar Datasets

Authors: James Chapman, Bohan Chen, Zheng Tan, Jeff Calder, Kevin Miller, Andrea L. Bertozzi

Abstract: Active learning improves the performance of machine learning methods by judiciously selecting a limited number of unlabeled data points to query for labels, with the aim of maximally improving the underlying classifier's performance. Recent gains have been made using sequential active learning for synthetic aperture radar (SAR) data arXiv:2204.00005. In each iteration, sequential active learning selects a query set of size one while batch active learning selects a query set of multiple datapoints. While batch active learning methods exhibit greater efficiency, the challenge lies in maintaining model accuracy relative to sequential active learning methods. We developed a novel, two-part approach for batch active learning: Dijkstra's Annulus Core-Set (DAC) for core-set generation and LocalMax for batch sampling. The batch active learning process that combines DAC and LocalMax achieves nearly identical accuracy as sequential active learning but is more efficient, proportional to the batch size. As an application, a pipeline is built based on transfer learning feature embedding, graph learning, DAC, and LocalMax to classify the FUSAR-Ship and OpenSARShip datasets. Our pipeline outperforms the state-of-the-art CNN-based methods.

Submitted 19 July, 2023; originally announced July 2023.

Comments: 16 pages, 7 figures, Preprint

ACM Class: I.2.6; I.2.10; I.4.0; I.4.9

Journal ref: Proc. SPIE. Algorithms for Synthetic Aperture Radar Imagery XXX (Vol. 12520, pp. 96-111). 13 June 2023

arXiv:2305.00627 [pdf]

CNN-based fully automatic mitral valve extraction using CT images and existence probability maps

Authors: Yukiteru Masuda, Ryo Ishikawa, Toru Tanaka, Gakuto Aoyama, Keitaro Kawashima, James V. Chapman, Masahiko Asami, Michael Huy Cuong Pham, Klaus Fuglsang Kofoed, Takuya Sakaguchi, Kiyohide Satoh

Abstract: Accurate extraction of mitral valve shape from clinical tomographic images acquired in patients has proven useful for planning surgical and interventional mitral valve treatments. However, manual extraction of the mitral valve shape is laborious, and the existing automatic extraction methods have not been sufficiently accurate. In this paper, we propose a fully automated method of extracting mitral valve shape from computed tomography (CT) images for all phases of the cardiac cycle. This method extracts the mitral valve shape based on DenseNet using both the original CT image and the existence probability maps of the mitral valve area inferred by U-Net as input. A total of 1585 CT images from 204 patients with various cardiac diseases including mitral regurgitation (MR) were collected and manually annotated for mitral valve region. The proposed method was trained and evaluated by 10-fold cross validation using the collected data and was compared with the method without the existence probability maps. The mean shape extraction error of the proposed method is 0.88 mm, which is an improvement of 0.32 mm compared with the method without the existence probability maps.

Submitted 18 May, 2023; v1 submitted 30 April, 2023; originally announced May 2023.

Comments: 15 pages, 6 figures, 3 tables. Changed title, corrected typo

arXiv:2303.12706 [pdf, other]

Multi-modal Variational Autoencoders for normative modelling across multiple imaging modalities

Authors: Ana Lawry Aguila, James Chapman, Andre Altmann

Abstract: One of the challenges of studying common neurological disorders is disease heterogeneity including differences in causes, neuroimaging characteristics, comorbidities, or genetic variation. Normative modelling has become a popular method for studying such cohorts where the 'normal' behaviour of a physiological system is modelled and can be used at subject level to detect deviations relating to disease pathology. For many heterogeneous diseases, we expect to observe abnormalities across a range of neuroimaging and biological variables. However, thus far, normative models have largely been developed for studying a single imaging modality. We aim to develop a multi-modal normative modelling framework where abnormality is aggregated across variables of multiple modalities and is better able to detect deviations than uni-modal baselines. We propose two multi-modal VAE normative models to detect subject level deviations across T1 and DTI data. Our proposed models were better able to detect diseased individuals, capture disease severity, and correlate with patient cognition than baseline approaches. We also propose a multivariate latent deviation metric, measuring deviations from the joint latent space, which outperformed feature-based metrics.

Submitted 2 October, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

arXiv:2212.02421 [pdf, other]

Score-based denoising for atomic structure identification

Authors: Tim Hsu, Babak Sadigh, Nicolas Bertin, Cheol Woo Park, James Chapman, Vasily Bulatov, Fei Zhou

Abstract: We propose an effective method for removing thermal vibrations that complicate the task of analyzing complex dynamics in atomistic simulation of condensed matter. Our method iteratively subtracts thermal noises or perturbations in atomic positions using a denoising score function trained on synthetically noised but otherwise perfect crystal lattices. The resulting denoised structures clearly reveal underlying crystal order while retaining disorder associated with crystal defects. Purely geometric, agnostic to interatomic potentials, and trained without inputs from explicit simulations, our denoiser can be applied to simulation data generated from vastly different interatomic interactions. The denoiser is shown to improve existing classification methods such as common neighbor analysis and polyhedral template matching, reaching perfect classification accuracy on a recent benchmark dataset of thermally perturbed structures up to the melting point. Demonstrated here in a wide variety of atomistic simulation contexts, the denoiser is general, robust, and readily extendable to delineate order from disorder in structurally and chemically complex materials.

Submitted 3 May, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

arXiv:2211.11323 [pdf, other]

A Generalized EigenGame with Extensions to Multiview Representation Learning

Authors: James Chapman, Ana Lawry Aguila, Lennie Wells

Abstract: Generalized Eigenvalue Problems (GEPs) encompass a range of interesting dimensionality reduction methods. Development of efficient stochastic approaches to these problems would allow them to scale to larger datasets. Canonical Correlation Analysis (CCA) is one example of a GEP for dimensionality reduction which has found extensive use in problems with two or more views of the data. Deep learning extensions of CCA require large mini-batch sizes, and therefore large memory consumption, in the stochastic setting to achieve good performance and this has limited its application in practice. Inspired by the Generalized Hebbian Algorithm, we develop an approach to solving stochastic GEPs in which all constraints are softly enforced by Lagrange multipliers. Then by considering the integral of this Lagrangian function, its pseudo-utility, and inspired by recent formulations of Principal Components Analysis and GEPs as games with differentiable utilities, we develop a game-theory inspired approach to solving GEPs. We show that our approaches share much of the theoretical grounding of the previous Hebbian and game theoretic approaches for the linear case but our method permits extension to general function approximators like neural networks for certain GEPs for dimensionality reduction including CCA which means our method can be used for deep multiview representation learning.
We demonstrate the effectiveness of our method for solving GEPs in the stochastic setting using canonical multiview datasets and demonstrate state-of-the-art performance for optimizing Deep CCA.

Submitted 9 January, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

arXiv:2211.09025 [pdf]

doi 10.1109/GLOCOM.2017.8254974

Latency Reduction for Mobile Backhaul by Pipelining LTE and DOCSIS

Authors: Jennifer Andreoli-Fang, John T Chapman

Abstract: The small cell market has been growing. To backhaul wireless traffic from small cells, the mobile network operators (MNOs) are looking into economically viable solutions, specifically the hybrid fiber coaxial networks (HFC), in addition to the traditional choice of fiber. When the latencies from both the wireless and the HFC networks are added together, it can result in noticeable end-to-end system latency, particularly under network congestion. If the two networks could somehow coordinate with each other, it would be possible to decrease the total system latency and increase system performance. In this paper, we propose a method to improve upstream user-to-mobile core latency by coordinating the LTE and HFC scheduling. The method reduces the impact on system latency from the HFC network's request-grant-data loop, which is the main contributor of backhaul upstream latency. Through simulation, we show that coordinated scheduling improves overall system latency.

Submitted 16 November, 2022; originally announced November 2022.

Comments: IEEE Global Communications Conference (GLOBECOM), 2017. arXiv admin note: substantial text overlap with arXiv:2211.08292; text overlap with arXiv:2211.08298

arXiv:2211.08298 [pdf]

doi 10.1109/WCNC.2018.8376975

Low Latency Techniques for Mobile Backhaul over DOCSIS

Authors: John T Chapman, Jennifer Andreoli-Fang, Michel Chauvin, Elias Chavarria Reyes, Zheng Lu, Dantong Liu, Joey Padden, Alon Bernstein

Abstract: The mobile network operators (MNOs) are looking into economically viable backhaul solutions as alternatives to fiber, specifically the hybrid fiber coaxial networks (HFC). When the latencies from both the wireless and the HFC networks are added together, the result is a noticeable end-to-end system latency, particularly under network congestion. In order to decrease total system latency, we proposed a method to improve upstream user-to-mobile core latency by coordinating the LTE and HFC scheduling in previous papers. In this paper, we implement and optimize the proposed method on a custom LTE and DOCSIS end-to-end system testbed. The testbed uses the OpenAirInterface (OAI) platform for the LTE network, along with Cisco's broadband router cBR-8 that is currently deployed in the HFC networks around the world. Our results show a backhaul latency improvement under all traffic load conditions.

Submitted 15 November, 2022; originally announced November 2022.

Comments: IEEE Wireless Communications and Networking Conference (WCNC), 2018. arXiv admin note: text overlap with arXiv:2211.08292

arXiv:2211.08292 [pdf]

doi 10.1109/PIMRC.2017.8292173

Mobile-Aware Scheduling for Low Latency Backhaul over DOCSIS

Authors: Jennifer Andreoli-Fang, John T Chapman

Abstract: In this paper, we discuss latency reduction techniques for mobile backhaul over Data Over Cable Service Interface Specifications (DOCSIS) networks. When the latencies from both the wireless and the DOCSIS networks are added together, it can result in noticeable end-to-end system latency, particularly under network congestion. Previously, we proposed a method to improve upstream user-to-mobile core latency by coordinating the LTE and DOCSIS scheduling. The method reduces the impact on system latency from the DOCSIS network's request-grant-data loop, which is the main contributor of backhaul upstream latency. Since the method reduces latency on the DOCSIS data path, it will therefore improve performance of latency sensitive applications, particularly if TCP is used as the transport protocol, especially when the link is congested. In this paper, we investigate the effect of HARQ failure on system performance. Through simulation, we show that despite the uncertainty introduced by the LTE protocol, coordinated scheduling improves overall system latency.

Submitted 16 November, 2022; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), 2017

arXiv:2209.00948 [pdf, other]

doi 10.3390/forecast5040036

Macroeconomic Predictions using Payments Data and Machine Learning

Authors: James T. E. Chapman, Ajit Desai

Abstract: Predicting the economy's short-term dynamics -- a vital input to economic agents' decision-making process -- often uses lagged indicators in linear models. This is typically sufficient during normal times but could prove inadequate during crisis periods. This paper aims to demonstrate that non-traditional and timely data such as retail and wholesale payments, with the aid of nonlinear machine learning approaches, can provide policymakers with sophisticated models to accurately estimate key macroeconomic indicators in near real-time. Moreover, we provide a set of econometric tools to mitigate overfitting and interpretability challenges in machine learning models to improve their effectiveness for policy use. Our models with payments data, nonlinear methods, and tailored cross-validation approaches help improve macroeconomic nowcasting accuracy up to 40% -- with higher gains during the COVID-19 period. We observe that the contribution of payments data for economic predictions is small and linear during low and normal growth periods. However, the payments data contribution is large, asymmetrical, and nonlinear during strong negative or positive growth periods.

Submitted 2 September, 2022; originally announced September 2022.

Report number: 2023, 5(4)

Journal ref: Forecasting, 2023

arXiv:2109.11576 [pdf, other]

Efficient, Interpretable Graph Neural Network Representation for Angle-dependent Properties and its Application to Optical Spectroscopy

Authors: Tim Hsu, Tuan Anh Pham, Nathan Keilbart, Stephen Weitzner, James Chapman, Penghao Xiao, S. Roger Qiu, Xiao Chen, Brandon C. Wood

Abstract: Graph neural networks are attractive for learning properties of atomic structures thanks to their intuitive graph encoding of atoms and bonds. However, conventional encoding does not include angular information, which is critical for describing atomic arrangements in disordered systems. In this work, we extend the recently proposed ALIGNN encoding, which incorporates bond angles, to also include dihedral angles (ALIGNN-d). This simple extension leads to a memory-efficient graph representation that captures the complete geometry of atomic structures. ALIGNN-d is applied to predict the infrared optical response of dynamically disordered Cu(II) aqua complexes, leveraging the intrinsic interpretability to elucidate the relative contributions of individual structural components. Bond and dihedral angles are found to be critical contributors to the fine structure of the absorption response, with distortions representing transitions between more common geometries exhibiting the strongest absorption intensity. Future directions for further development of ALIGNN-d are discussed.

Submitted 15 February, 2022; v1 submitted 23 September, 2021; originally announced September 2021.

arXiv:2001.11001 [pdf, ps, other]

A Type and Scope Safe Universe of Syntaxes with Binding: Their Semantics and Proofs

Authors: Guillaume Allais, Robert Atkey, James Chapman, Conor McBride, James McKinna

Abstract: Almost every programming language's syntax includes a notion of binder and corresponding bound occurrences, along with the accompanying notions of $α$-equivalence, capture-avoiding substitution, typing contexts, runtime environments, and so on. In the past, implementing and reasoning about programming languages required careful handling to maintain the correct behaviour of bound variables. Modern programming languages include features that enable constraints like scope safety to be expressed in types. Nevertheless, the programmer is still forced to write the same boilerplate over again for each new implementation of a scope safe operation (e.g., renaming, substitution, desugaring, printing, etc.), and then again for correctness proofs. We present an expressive universe of syntaxes with binding and demonstrate how to (1) implement scope safe traversals once and for all by generic programming; and (2) how to derive properties of these traversals by generic proving. Our universe description, generic traversals and proofs, and our examples have all been formalised in Agda and are available in the accompanying material available online at https://github.com/gallais/generic-syntax.

Submitted 12 October, 2021; v1 submitted 29 January, 2020; originally announced January 2020.

Comments: Extended version of the ICFP 18 paper

ACM Class: F.3.2

arXiv:1812.02978 [pdf, other]

More or Less? Predict the Social Influence of Malicious URLs on Social Media

Authors: Chun-Ming Lai, Xiaoyun Wang, Jon W. Chapman, Yu-Cheng Lin, Yu-Chang Ho, S. Felix Wu, Patrick McDaniel, Hasan Cam

Abstract: Users of Online Social Networks (OSNs) interact with each other more than ever. In the context of a public discussion group, people receive, read, and write comments in response to articles and postings. In the absence of access control mechanisms, OSNs are a great environment for attackers to influence others, from spreading phishing URLs, to posting fake news. Moreover, OSN user behavior can be predicted by social science concepts which include conformity and the bandwagon effect. In this paper, we show how social recommendation systems affect the occurrence of malicious URLs on Facebook. We exploit temporal features to build a prediction framework, having greater than 75% accuracy, to predict whether the following group users' behavior will increase or not. Included in this work, we demarcate classes of URLs, including those malicious URLs classified as creating critical damage, as well as those of a lesser nature which only inflict light damage such as aggressive commercial advertisements and spam content. It is our hope that the data and analyses in this paper provide a better understanding of OSN user reactions to different categories of malicious URLs, thereby providing a way to mitigate the influence of these malicious URL attacks.

Submitted 7 December, 2018; originally announced December 2018.

Comments: 10 pages, 6 figures

arXiv:1809.08658 [pdf, other]

Multi-View Community Detection in Facebook Public Pages

Authors: Zhige Xin, Chun-Ming Lai, Jon W. Chapman, George Barnett, S. Felix Wu

Abstract: Community detection in social networks is widely studied because of its importance in uncovering how people connect and interact. However, little attention has been given to community structure in Facebook public pages. In this study, we investigate the community detection problem in Facebook newsgroup pages. In particular, to deal with the diversity of user activities, we apply multi-view clustering to integrate different views, for example, likes on posts and likes on comments. In this study, we explore the community structure in not only a given single page but across multiple pages. The results show that our method can effectively reduce isolates and improve the quality of community structure.

Submitted 6 December, 2018; v1 submitted 23 September, 2018; originally announced September 2018.

arXiv:1608.01031 [pdf]

Meraculous2: fast accurate short-read assembly of large polymorphic genomes

Authors: Jarrod A. Chapman, Isaac Y. Ho, Eugene Goltsman, Daniel S. Rokhsar

Abstract: We present Meraculous2, an update to the Meraculous short-read assembler that includes (1) handling of allelic variation using "bubble" structures within the de Bruijn graph, (2) improved gap closing, and (3) an improved scaffolding algorithm that produces more complete assemblies without compromising scaffolding accuracy. The speed and bandwidth efficiency of the new parallel implementation have also been substantially improved, allowing the assembly of a human genome to be accomplished in 24 hours on the JGI/NERSC Genepool system. To highlight the features of Meraculous2 we present here the assembly of the diploid human genome NA12878, and compare it with previously published assemblies of the same data using other algorithms. The Meraculous2 assemblies are shown to have better completeness, contiguity, and accuracy than other published assemblies for these data. Practical considerations including pre-assembly analyses of polymorphism and repetitiveness are described.

Submitted 7 November, 2017; v1 submitted 2 August, 2016; originally announced August 2016.

Comments: Supplementary notes included with the manuscript

arXiv:1412.7148 [pdf, ps, other]

doi 10.2168/LMCS-11(1:3)2015

Monads need not be endofunctors

Authors: Thorsten Altenkirch, James Chapman, Tarmo Uustalu

Abstract: We introduce a generalization of monads, called relative monads, allowing for underlying functors between different categories. Examples include finite-dimensional vector spaces, untyped and typed lambda-calculus syntax and indexed containers. We show that the Kleisli and Eilenberg-Moore constructions carry over to relative monads and are related to relative adjunctions. Under reasonable assumptions, relative monads are monoids in the functor category concerned and extend to monads, giving rise to a coreflection between relative monads and monads. Arrows are also an instance of relative monads.

Submitted 4 March, 2015; v1 submitted 22 December, 2014; originally announced December 2014.

Journal ref: Logical Methods in Computer Science, Volume 11, Issue 1 (March 6, 2015) lmcs:928

arXiv:1408.5809 [pdf, ps, other]

doi 10.2168/LMCS-10(3:14)2014

When is a container a comonad?

Authors: Danel Ahman, James Chapman, Tarmo Uustalu

Abstract: Abbott, Altenkirch, Ghani and others have taught us that many parameterized datatypes (set functors) can be usefully analyzed via container representations in terms of a set of shapes and a set of positions in each shape. This paper builds on the observation that datatypes often carry additional structure that containers alone do not account for. We introduce directed containers to capture the common situation where every position in a data-structure determines another data-structure, informally, the sub-data-structure rooted by that position. Some natural examples are non-empty lists and node-labelled trees, and data-structures with a designated position (zippers). While containers denote set functors via a fully-faithful functor, directed containers interpret fully-faithfully into comonads. But more is true: every comonad whose underlying functor is a container is represented by a directed container. In fact, directed containers are the same as containers that are comonads. We also describe some constructions of directed containers. We have formalized our development in the dependently typed programming language Agda.

Submitted 2 September, 2014; v1 submitted 25 August, 2014; originally announced August 2014.

Journal ref: Logical Methods in Computer Science, Volume 10, Issue 3 (September 3, 2014) lmcs:894

arXiv:1406.2059 [pdf, ps, other]

doi 10.4204/EPTCS.153.4

Normalization by Evaluation in the Delay Monad: A Case Study for Coinduction via Copatterns and Sized Types

Authors: Andreas Abel, James Chapman

Abstract: In this paper, we present an Agda formalization of a normalizer for simply-typed lambda terms. The normalizer consists of two coinductively defined functions in the delay monad: One is a standard evaluator of lambda terms to closures, the other a type-directed reifier from values to eta-long beta-normal forms. Their composition, normalization-by-evaluation, is shown to be a total function a posteriori, using a standard logical-relations argument. The successful formalization serves as a proof-of-concept for coinductive programming and reasoning using sized types and copatterns, a new and presently experimental feature of Agda.

Submitted 8 June, 2014; originally announced June 2014.

Comments: In Proceedings MSFP 2014, arXiv:1406.1534

ACM Class: D.3.3; F.3.2; F.3.3; F.4.1

Journal ref: EPTCS 153, 2014, pp. 51-67

arXiv:1202.2407

doi 10.4204/EPTCS.76

Proceedings Fourth Workshop on Mathematically Structured Functional Programming

Authors: James Chapman, Paul Blain Levy

Abstract: This volume contains the proceedings of the Fourth Workshop on Mathematically Structured Functional Programming (MSFP 2012), taking place on 25 March, 2012 in Tallinn, Estonia, as a satellite event of the European Joint Conferences on Theory and Practice of Software, ETAPS 2012. MSFP is devoted to the derivation of functionality from structure. It highlights concepts from algebra, semantics and type theory as they are increasingly reflected in programming practice, especially functional programming. The workshop consists of two invited presentations and eight contributed papers on a range of topics at that interface.

Submitted 10 February, 2012; originally announced February 2012.

ACM Class: D.3.3; F.3.3

Journal ref: EPTCS 76, 2012

Showing 1–22 of 22 results for author: Chapman, J