Search | arXiv e-print repository

Out of style: Misadventures with LLMs and code style transfer

Authors: Karl Munson, Chih-Kai Ting, Serenity Wade, Anish Savla, Julian Dolby, Kiran Kate, Kavitha Srinivas

Abstract: Like text, programs have styles, and certain programming styles are more desirable than others for program readability, maintainability, and performance. Code style transfer, however, is difficult to automate except for trivial style guidelines such as limits on line length. Inspired by the success of using language models for text style transfer, we investigate if code language models can perform… ▽ More Like text, programs have styles, and certain programming styles are more desirable than others for program readability, maintainability, and performance. Code style transfer, however, is difficult to automate except for trivial style guidelines such as limits on line length. Inspired by the success of using language models for text style transfer, we investigate if code language models can perform code style transfer. Code style transfer, unlike text transfer, has rigorous requirements: the system needs to identify lines of code to change, change them correctly, and leave the rest of the program untouched. We designed CSB (Code Style Benchmark), a benchmark suite of code style transfer tasks across five categories including converting for-loops to list comprehensions, eliminating duplication in code, adding decorators to methods, etc. We then used these tests to see if large pre-trained code language models or fine-tuned models perform style transfer correctly, based on rigorous metrics to test that the transfer did occur, and the code still passes functional tests. Surprisingly, language models failed to perform all of the tasks, suggesting that they perform poorly on tasks that require code understanding. We will make available the large-scale corpora to help the community build better code models. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2309.04607 [pdf]

Linking Symptom Inventories using Semantic Textual Similarity

Authors: Eamonn Kennedy, Shashank Vadlamani, Hannah M Lindsey, Kelly S Peterson, Kristen Dams OConnor, Kenton Murray, Ronak Agarwal, Houshang H Amiri, Raeda K Andersen, Talin Babikian, David A Baron, Erin D Bigler, Karen Caeyenberghs, Lisa Delano-Wood, Seth G Disner, Ekaterina Dobryakova, Blessen C Eapen, Rachel M Edelstein, Carrie Esopenko, Helen M Genova, Elbert Geuze, Naomi J Goodrich-Hunsaker, Jordan Grafman, Asta K Haberg, Cooper B Hodges , et al. (57 additional authors not shown)

Abstract: An extensive library of symptom inventories has been developed over time to measure clinical symptoms, but this variety has led to several long standing issues. Most notably, results drawn from different settings and studies are not comparable, which limits reproducibility. Here, we present an artificial intelligence (AI) approach using semantic textual similarity (STS) to link symptoms and scores… ▽ More An extensive library of symptom inventories has been developed over time to measure clinical symptoms, but this variety has led to several long standing issues. Most notably, results drawn from different settings and studies are not comparable, which limits reproducibility. Here, we present an artificial intelligence (AI) approach using semantic textual similarity (STS) to link symptoms and scores across previously incongruous symptom inventories. We tested the ability of four pre-trained STS models to screen thousands of symptom description pairs for related content - a challenging task typically requiring expert panels. Models were tasked to predict symptom severity across four different inventories for 6,607 participants drawn from 16 international data sources. The STS approach achieved 74.8% accuracy across five tasks, outperforming other models tested. This work suggests that incorporating contextual, semantic information can assist expert decision-making processes, yielding gains for both general and disease-specific clinical assessment. △ Less

Submitted 8 September, 2023; originally announced September 2023.

arXiv:2209.15321 [pdf, other]

Leveraging variational autoencoders for multiple data imputation

Authors: Breeshey Roskams-Hieter, Jude Wells, Sara Wade

Abstract: Missing data persists as a major barrier to data analysis across numerous applications. Recently, deep generative models have been used for imputation of missing data, motivated by their ability to capture highly non-linear and complex relationships in the data. In this work, we investigate the ability of deep models, namely variational autoencoders (VAEs), to account for uncertainty in missing da… ▽ More Missing data persists as a major barrier to data analysis across numerous applications. Recently, deep generative models have been used for imputation of missing data, motivated by their ability to capture highly non-linear and complex relationships in the data. In this work, we investigate the ability of deep models, namely variational autoencoders (VAEs), to account for uncertainty in missing data through multiple imputation strategies. We find that VAEs provide poor empirical coverage of missing data, with underestimation and overconfident imputations, particularly for more extreme missing data values. To overcome this, we employ $β$-VAEs, which viewed from a generalized Bayes framework, provide robustness to model misspecification. Assigning a good value of $β$ is critical for uncertainty calibration and we demonstrate how this can be achieved using cross-validation. In downstream tasks, we show how multiple imputation with $β$-VAEs can avoid false discoveries that arise as artefacts of imputation. △ Less

Submitted 30 September, 2022; originally announced September 2022.

Comments: 17 pages, 3 main figures, 6 supplementary figures

arXiv:2209.06273 [pdf, other]

Exploring Code Style Transfer with Neural Networks

Authors: Karl Munson, Anish Savla, Chih-Kai Ting, Serenity Wade, Kiran Kate, Kavitha Srinivas

Abstract: Style is a significant component of natural language text, reflecting a change in the tone of text while kee** the underlying information the same. Even though programming languages have strict syntax rules, they also have style. Code can be written with the same functionality but using different language features. However, programming style is difficult to quantify, and thus as part of this wor… ▽ More Style is a significant component of natural language text, reflecting a change in the tone of text while kee** the underlying information the same. Even though programming languages have strict syntax rules, they also have style. Code can be written with the same functionality but using different language features. However, programming style is difficult to quantify, and thus as part of this work, we define style attributes, specifically for Python. To build a definition of style, we utilized hierarchical clustering to capture a style definition without needing to specify transformations. In addition to defining style, we explore the capability of a pre-trained code language model to capture information about code style. To do this, we fine-tuned pre-trained code-language models and evaluated their performance in code style transfer tasks. △ Less

Submitted 13 September, 2022; originally announced September 2022.

arXiv:2208.12830 [pdf, other]

Mixtures of Gaussian Process Experts with SMC$^2$

Authors: Teemu Härkönen, Sara Wade, Kody Law, Lassi Roininen

Abstract: Gaussian processes are a key component of many flexible statistical and machine learning models. However, they exhibit cubic computational complexity and high memory constraints due to the need of inverting and storing a full covariance matrix. To circumvent this, mixtures of Gaussian process experts have been considered where data points are assigned to independent experts, reducing the complexit… ▽ More Gaussian processes are a key component of many flexible statistical and machine learning models. However, they exhibit cubic computational complexity and high memory constraints due to the need of inverting and storing a full covariance matrix. To circumvent this, mixtures of Gaussian process experts have been considered where data points are assigned to independent experts, reducing the complexity by allowing inference based on smaller, local covariance matrices. Moreover, mixtures of Gaussian process experts substantially enrich the model's flexibility, allowing for behaviors such as non-stationarity, heteroscedasticity, and discontinuities. In this work, we construct a novel inference approach based on nested sequential Monte Carlo samplers to simultaneously infer both the gating network and Gaussian process expert parameters. This greatly improves inference compared to importance sampling, particularly in settings when a stationary Gaussian process is inappropriate, while still being thoroughly parallelizable. △ Less

Submitted 26 August, 2022; originally announced August 2022.

arXiv:2109.14171 [pdf, other]

Non-stationary Gaussian process discriminant analysis with variable selection for high-dimensional functional data

Authors: W Yu, S Wade, H D Bondell, L Azizi

Abstract: High-dimensional classification and feature selection tasks are ubiquitous with the recent advancement in data acquisition technology. In several application areas such as biology, genomics and proteomics, the data are often functional in their nature and exhibit a degree of roughness and non-stationarity. These structures pose additional challenges to commonly used methods that rely mainly on a t… ▽ More High-dimensional classification and feature selection tasks are ubiquitous with the recent advancement in data acquisition technology. In several application areas such as biology, genomics and proteomics, the data are often functional in their nature and exhibit a degree of roughness and non-stationarity. These structures pose additional challenges to commonly used methods that rely mainly on a two-stage approach performing variable selection and classification separately. We propose in this work a novel Gaussian process discriminant analysis (GPDA) that combines these steps in a unified framework. Our model is a two-layer non-stationary Gaussian process coupled with an Ising prior to identify differentially-distributed locations. Scalable inference is achieved via develo** a variational scheme that exploits advances in the use of sparse inverse covariance matrices. We demonstrate the performance of our methodology on simulated datasets and two proteomics datasets: breast cancer and SARS-CoV-2. Our approach distinguishes itself by offering explainability as well as uncertainty quantification in addition to low computational cost, which are crucial to increase trust and social acceptance of data-driven tools. △ Less

Submitted 28 September, 2021; originally announced September 2021.

arXiv:2106.10716 [pdf]

Machine learning in the social and health sciences

Authors: Anja K. Leist, Matthias Klee, Jung Hyun Kim, David H. Rehkopf, Stéphane P. A. Bordas, Graciela Muniz-Terrera, Sara Wade

Abstract: The uptake of machine learning (ML) approaches in the social and health sciences has been rather slow, and research using ML for social and health research questions remains fragmented. This may be due to the separate development of research in the computational/data versus social and health sciences as well as a lack of accessible overviews and adequate training in ML techniques for non data scie… ▽ More The uptake of machine learning (ML) approaches in the social and health sciences has been rather slow, and research using ML for social and health research questions remains fragmented. This may be due to the separate development of research in the computational/data versus social and health sciences as well as a lack of accessible overviews and adequate training in ML techniques for non data science researchers. This paper provides a meta-map** of research questions in the social and health sciences to appropriate ML approaches, by incorporating the necessary requirements to statistical analysis in these disciplines. We map the established classification into description, prediction, and causal inference to common research goals, such as estimating prevalence of adverse health or social outcomes, predicting the risk of an event, and identifying risk factors or causes of adverse outcomes. This meta-map** aims at overcoming disciplinary barriers and starting a fluid dialogue between researchers from the social and health sciences and methodologically trained researchers. Such map** may also help to fully exploit the benefits of ML while considering domain-specific aspects relevant to the social and health sciences, and hopefully contribute to the acceleration of the uptake of ML applications to advance both basic and applied social and health sciences research. △ Less

Submitted 20 June, 2021; originally announced June 2021.

arXiv:2006.13309 [pdf, other]

doi 10.1007/s10994-023-06491-x

Fast Deep Mixtures of Gaussian Process Experts

Authors: Clement Etienam, Kody Law, Sara Wade, Vitaly Zankin

Abstract: Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context, allowing not only the mean function but the entire density of the output to change with the inputs. Sparse Gaussian processes (GP) have shown promise as a leading candidate for the experts in such models, and in this article, we propose to design the gating network for selecting the exper… ▽ More Mixtures of experts have become an indispensable tool for flexible modelling in a supervised learning context, allowing not only the mean function but the entire density of the output to change with the inputs. Sparse Gaussian processes (GP) have shown promise as a leading candidate for the experts in such models, and in this article, we propose to design the gating network for selecting the experts from such mixtures of sparse GPs using a deep neural network (DNN). Furthermore, a fast one pass algorithm called Cluster-Classify-Regress (CCR) is leveraged to approximate the maximum a posteriori (MAP) estimator extremely quickly. This powerful combination of model and algorithm together delivers a novel method which is flexible, robust, and extremely efficient. In particular, the method is able to outperform competing methods in terms of accuracy and uncertainty quantification. The cost is competitive on low-dimensional and small data sets, but is significantly lower for higher-dimensional and big data sets. Iteratively maximizing the distribution of experts given allocations and allocations given experts does not provide significant improvement, which indicates that the algorithm achieves a good approximation to the local MAP estimator very fast. This insight can be useful also in the context of other mixture of experts models. △ Less

Submitted 30 November, 2023; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: 22 pages, 28 figures, to be published in Machine Learning journal

Journal ref: Machine Learning (2024)

arXiv:2001.00406 [pdf, other]

doi 10.1017/S1471068420000010

Rethinking Defeasible Reasoning: A Scalable Approach

Authors: Michael J. Maher, Ilias Tachmazidis, Grigoris Antoniou, Stephen Wade, Long Cheng

Abstract: Recent technological advances have led to unprecedented amounts of generated data that originate from the Web, sensor networks and social media. Analytics in terms of defeasible reasoning - for example for decision making - could provide richer knowledge of the underlying domain. Traditionally, defeasible reasoning has focused on complex knowledge structures over small to medium amounts of data, b… ▽ More Recent technological advances have led to unprecedented amounts of generated data that originate from the Web, sensor networks and social media. Analytics in terms of defeasible reasoning - for example for decision making - could provide richer knowledge of the underlying domain. Traditionally, defeasible reasoning has focused on complex knowledge structures over small to medium amounts of data, but recent research efforts have attempted to parallelize the reasoning process over theories with large numbers of facts. Such work has shown that traditional defeasible logics come with overheads that limit scalability. In this work, we design a new logic for defeasible reasoning, thus ensuring scalability by design. We establish several properties of the logic, including its relation to existing defeasible logics. Our experimental results indicate that our approach is indeed scalable and defeasible reasoning can be applied to billions of facts. △ Less

Submitted 2 January, 2020; originally announced January 2020.

Comments: Under consideration in Theory and Practice of Logic Programming (TPLP)

Journal ref: Theory and Practice of Logic Programming 20 (2020) 552-586

arXiv:1905.12969 [pdf, other]

Enriched Mixtures of Gaussian Process Experts

Authors: Charles W. L. Gadd, Sara Wade, Alexis Boukouvalas

Abstract: Mixtures of experts probabilistically divide the input space into regions, where the assumptions of each expert, or conditional model, need only hold locally. Combined with Gaussian process (GP) experts, this results in a powerful and highly flexible model. We focus on alternative mixtures of GP experts, which model the joint distribution of the inputs and targets explicitly. We highlight issues o… ▽ More Mixtures of experts probabilistically divide the input space into regions, where the assumptions of each expert, or conditional model, need only hold locally. Combined with Gaussian process (GP) experts, this results in a powerful and highly flexible model. We focus on alternative mixtures of GP experts, which model the joint distribution of the inputs and targets explicitly. We highlight issues of this approach in multi-dimensional input spaces, namely, poor scalability and the need for an unnecessarily large number of experts, degrading the predictive performance and increasing uncertainty. We construct a novel model to address these issues through a nested partitioning scheme that automatically infers the number of components at both levels. Multiple response types are accommodated through a generalised GP framework, while multiple input types are included through a factorised exponential family structure. We show the effectiveness of our approach in estimating a parsimonious probabilistic description of both synthetic data of increasing dimension and an Alzheimer's challenge dataset. △ Less

Submitted 30 May, 2019; originally announced May 2019.

arXiv:1803.10746 [pdf, other]

Pseudo-marginal Bayesian inference for supervised Gaussian process latent variable models

Authors: Charles Gadd, Sara Wade, Akeel Shah, Dimitris Grammatopoulos

Abstract: We introduce a Bayesian framework for inference with a supervised version of the Gaussian process latent variable model. The framework overcomes the high correlations between latent variables and hyperparameters by using an unbiased pseudo estimate for the marginal likelihood that approximately integrates over the latent variables. This is used to construct a Markov Chain to explore the posterior… ▽ More We introduce a Bayesian framework for inference with a supervised version of the Gaussian process latent variable model. The framework overcomes the high correlations between latent variables and hyperparameters by using an unbiased pseudo estimate for the marginal likelihood that approximately integrates over the latent variables. This is used to construct a Markov Chain to explore the posterior of the hyperparameters. We demonstrate the procedure on simulated and real examples, showing its ability to capture uncertainty and multimodality of the hyperparameters and improved uncertainty quantification in predictions when compared with variational inference. △ Less

Submitted 28 March, 2018; originally announced March 2018.

Comments: 9 pages, 2 figures, working paper

arXiv:1706.02480 [pdf, other]

Forward Thinking: Building and Training Neural Networks One Layer at a Time

Authors: Chris Hettinger, Tanner Christensen, Ben Ehlert, Jeffrey Humpherys, Tyler Jarvis, Sean Wade

Abstract: We present a general framework for training deep neural networks without backpropagation. This substantially decreases training time and also allows for construction of deep networks with many sorts of learners, including networks whose layers are defined by functions that are not easily differentiated, like decision trees. The main idea is that layers can be trained one at a time, and once they a… ▽ More We present a general framework for training deep neural networks without backpropagation. This substantially decreases training time and also allows for construction of deep networks with many sorts of learners, including networks whose layers are defined by functions that are not easily differentiated, like decision trees. The main idea is that layers can be trained one at a time, and once they are trained, the input data are mapped forward through the layer to create a new learning problem. The process is repeated, transforming the data through multiple layers, one at a time, rendering a new data set, which is expected to be better behaved, and on which a final output layer can achieve good performance. We call this forward thinking and demonstrate a proof of concept by achieving state-of-the-art accuracy on the MNIST dataset for convolutional neural networks. We also provide a general mathematical formulation of forward thinking that allows for other types of deep learning problems to be considered. △ Less

Submitted 8 June, 2017; originally announced June 2017.

Showing 1–12 of 12 results for author: Wade, S