Scaling and evaluating sparse autoencoders

Authors: Leo Gao, Tom Dupré la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, Alec Radford, Ilya Sutskever, Jan Leike, Jeffrey Wu

Abstract: Sparse autoencoders provide a promising unsupervised approach for extracting interpretable features from a language model by reconstructing activations from a sparse bottleneck layer. Since language models learn many concepts, autoencoders need to be very large to recover all relevant features. However, studying the properties of autoencoder scaling is difficult due to the need to balance reconstruction and sparsity objectives and the presence of dead latents. We propose using k-sparse autoencoders [Makhzani and Frey, 2013] to directly control sparsity, simplifying tuning and improving the reconstruction-sparsity frontier. Additionally, we find modifications that result in few dead latents, even at the largest scales we tried. Using these techniques, we find clean scaling laws with respect to autoencoder size and sparsity. We also introduce several new metrics for evaluating feature quality based on the recovery of hypothesized features, the explainability of activation patterns, and the sparsity of downstream effects. These metrics all generally improve with autoencoder size. To demonstrate the scalability of our approach, we train a 16 million latent autoencoder on GPT-4 activations for 40 billion tokens. We release training code and autoencoders for open-source models, as well as a visualizer.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2303.08774 [pdf, other]

GPT-4 Technical Report

Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.

Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 100 pages; updated authors list; fixed author names and added citation

arXiv:2207.11622 [pdf]

doi 10.1002/adma.202303283

Unconventional Charge-density-wave Order in a Dilute d-band Semiconductor

Authors: Huandong Chen, Batyr Ilyas, Boyang Zhao, Emre Ergecen, Josh Mutch, Gwan Yeong Jung, Qian Song, Connor A. Occhialini, Guodong Ren, Sara Shabani, Eric Seewald, Shanyuan Niu, Jiangbin Wu, Nan Wang, Mythili Surendran, Shantanu Singh, Jiang Luo, Sanae Ohtomo, Gemma Goh, Bryan C. Chakoumakos, Simon J. Teat, Brent Melot, Han Wang, Di Xiao, Abhay N. Pasupathy, et al. (5 additional authors not shown)

Abstract: Electron-lattice coupling effects in low dimensional materials give rise to charge density wave (CDW) order and phase transitions. These phenomena are critical ingredients for superconductivity and predominantly occur in metallic model systems such as doped cuprates, transition metal dichalcogenides, and more recently, in Kagome lattice materials. However, CDW in semiconducting systems, specifically at the limit of low carrier concentration region, is uncommon. Here, we combine electrical transport, synchrotron X-ray diffraction and optical spectroscopy to discover CDW order in a quasi-one-dimensional (1D), dilute d-band semiconductor, BaTiS3, which suggests the existence of strong electron-phonon coupling. The CDW state further undergoes an unusual transition featuring a sharp increase in carrier mobility. Our work establishes BaTiS3 as a unique platform to study the CDW physics in the dilute filling limit to explore novel electronic phases.

Submitted 23 July, 2022; originally announced July 2022.

Journal ref: Adv. Mater. 2023, 2303283

arXiv:2201.10958 [pdf, other]

Two Results about the Sackin and Colless Indices for Phylogenetic Trees and Their Shapes

Authors: Gary Goh, Michael Fuchs, Louxin Zhang

Abstract: The Sackin and Colless indices are two widely-used metrics for measuring the balance of trees and for testing evolutionary models in phylogenetics. This short paper contributes two results about the Sackin and Colless indices of trees. One result is the asymptotic analysis of the expected Sackin and Colless indices of a tree shape (which are full binary rooted unlabelled trees) under the uniform model where tree shapes are sampled with equal probability. Another is a short elementary proof of the closed formula for the expected Sackin index of phylogenetic trees (which are full binary rooted trees with leaves being labelled with taxa) under the uniform model.

Submitted 18 July, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

Comments: 10 pages, 1 figure

MSC Class: 05A16; 05C30; 92D15

arXiv:2103.00020 [pdf, other]

Learning Transferable Visual Models From Natural Language Supervision

Authors: Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever

Abstract: State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at https://github.com/OpenAI/CLIP.

Submitted 26 February, 2021; originally announced March 2021.

arXiv:2102.12092 [pdf, other]

Zero-Shot Text-to-Image Generation

Authors: Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever

Abstract: Text-to-image generation has traditionally focused on finding better modeling assumptions for training on a fixed dataset. These assumptions might involve complex architectures, auxiliary losses, or side information such as object part labels or segmentation masks supplied during training. We describe a simple approach for this task based on a transformer that autoregressively models the text and image tokens as a single stream of data. With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion.

Submitted 26 February, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

arXiv:2006.07663 [pdf, other]

Bayesian causal inference with some invalid instrumental variables

Authors: Gyuhyeong Goh, Jisang Yu

Abstract: In observational studies, instrumental variables estimation is greatly utilized to identify causal effects. One of the key conditions for the instrumental variables estimator to be consistent is the exclusion restriction, which indicates that instruments affect the outcome of interest only via the exposure variable of interest. We propose a likelihood-free Bayesian approach to make consistent inferences about the causal effect when there are some invalid instruments in a way that they violate the exclusion restriction condition. Asymptotic properties of the proposed Bayes estimator, including consistency and normality, are established. A simulation study demonstrates that the proposed Bayesian method produces consistent point estimators and valid credible intervals with correct coverage rates for Gaussian and non-Gaussian data with some invalid instruments. We also demonstrate the proposed method through the real data application.

Submitted 13 June, 2020; originally announced June 2020.

arXiv:2005.13719 [pdf, other]

Synthetic control method with convex hull restrictions: A Bayesian maximum a posteriori approach

Authors: Gyuhyeong Goh, Jisang Yu

Abstract: Synthetic control methods have gained popularity among causal studies with observational data, particularly when estimating the impacts of the interventions that are implemented to a small number of large units. Implementing the synthetic control methods faces two major challenges: a) estimating weights for each control unit to create a synthetic control and b) providing statistical inferences. To overcome these challenges, we propose a Bayesian framework that implements the synthetic control method with the parallelly shiftable convex hull and provides a useful Bayesian inference, which is drawn from the duality between a penalized least squares method and a Bayesian Maximum A Posteriori (MAP) approach. Simulation results indicate that the proposed method leads to smaller biases compared to alternatives. We apply our Bayesian method to the real data example of Abadie and Gardeazabal (2003) and find that the treatment effects are statistically significant during the subset of the post-treatment period.

Submitted 27 May, 2020; originally announced May 2020.

arXiv:2004.10484 [pdf, other]

doi 10.1109/ICPR48806.2021.9413242

Understanding Integrated Gradients with SmoothTaylor for Deep Neural Network Attribution

Authors: Gary S. W. Goh, Sebastian Lapuschkin, Leander Weber, Wojciech Samek, Alexander Binder

Abstract: Integrated Gradients as an attribution method for deep neural network models offers simple implementability. However, it suffers from noisiness of explanations which affects the ease of interpretability. The SmoothGrad technique is proposed to solve the noisiness issue and smoothen the attribution maps of any gradient-based attribution method. In this paper, we present SmoothTaylor as a novel theoretical concept bridging Integrated Gradients and SmoothGrad, from the Taylor's theorem perspective. We apply the methods to the image classification problem, using the ILSVRC2012 ImageNet object recognition dataset, and a couple of pretrained image models to generate attribution maps. These attribution maps are empirically evaluated using quantitative measures for sensitivity and noise level. We further propose adaptive noising to optimize for the noise scale hyperparameter value. From our experiments, we find that the SmoothTaylor approach together with adaptive noising is able to generate better quality saliency maps with lesser noise and higher sensitivity to the relevant points in the input space as compared to Integrated Gradients.

Submitted 2 September, 2021; v1 submitted 22 April, 2020; originally announced April 2020.

Comments: 8 pages, 3 figures. Accepted in 25th International Conference on Pattern Recognition, (ICPR) 2020. In Proceedings: pp. 4949-4956

arXiv:2002.01535 [pdf, ps, other]

Lightweight Convolutional Representations for On-Device Natural Language Processing

Authors: Shrey Desai, Geoffrey Goh, Arun Babu, Ahmed Aly

Abstract: The increasing computational and memory complexities of deep neural networks have made it difficult to deploy them on low-resource electronic devices (e.g., mobile phones, tablets, wearables). Practitioners have developed numerous model compression methods to address these concerns, but few have condensed input representations themselves. In this work, we propose a fast, accurate, and lightweight convolutional representation that can be swapped into any neural model and compressed significantly (up to 32x) with a negligible reduction in performance. In addition, we show gains over recurrent representations when considering resource-centric metrics (e.g., model file size, latency, memory usage) on a Samsung Galaxy S9.

Submitted 4 February, 2020; originally announced February 2020.

Comments: Accepted to MLSys 2020

arXiv:1911.06876 [pdf, other]

Explanatory Masks for Neural Network Interpretability

Authors: Lawrence Phillips, Garrett Goh, Nathan Hodas

Abstract: Neural network interpretability is a vital component for applications across a wide variety of domains. In such cases it is often useful to analyze a network which has already been trained for its specific purpose. In this work, we develop a method to produce explanation masks for pre-trained networks. The mask localizes the most important aspects of each input for prediction of the original network. Masks are created by a secondary network whose goal is to create as small an explanation as possible while still preserving the predictive accuracy of the original network. We demonstrate the applicability of our method for image classification with CNNs, sentiment analysis with RNNs, and chemical property prediction with mixed CNN/RNN architectures.

Submitted 15 November, 2019; originally announced November 2019.

Comments: Presented at IJCAI-18 Workshop on Explainable Artificial Intelligence (XAI)

arXiv:1910.03741 [pdf, other]

Multiple-objective Reinforcement Learning for Inverse Design and Identification

Authors: Haoran Wei, Mariefel Olarte, Garrett B. Goh

Abstract: The aim of the inverse chemical design is to develop new molecules with given optimized molecular properties or objectives. Recently, generative deep learning (DL) networks are considered as the state-of-the-art in inverse chemical design and have achieved early success in generating molecular structures with desired properties in the pharmaceutical and material chemistry fields. However, satisfying a large number (larger than 10 objectives) of molecular objectives is a limitation of current generative models. To improve the model's ability to handle a large number of molecule design objectives, we developed a Reinforcement Learning (RL) based generative framework to optimize chemical molecule generation. Our use of Curriculum Learning (CL) to fine-tune the pre-trained generative network allowed the model to satisfy up to 21 objectives and increase the generative network's robustness. The experiments show that the proposed multiple-objective RL-based generative model can correctly identify unknown molecules with an 83 to 100 percent success rate, compared to the baseline approach of 0 percent. Additionally, this proposed generative model is not limited to just chemistry research challenges; we anticipate that problems that utilize RL with multiple-objectives will benefit from this framework.

Submitted 8 October, 2019; originally announced October 2019.

arXiv:1811.11950 [pdf, other]

Accounting for model uncertainty in multiple imputation under complex sampling

Authors: Gyuhyeong Goh, Jae Kwang Kim

Abstract: Multiple imputation provides an effective way to handle missing data. When several possible models are under consideration for the data, the multiple imputation is typically performed under a single-best model selected from the candidate models. This single model selection approach ignores the uncertainty associated with the model selection and so leads to underestimation of the variance of multiple imputation estimator. In this paper, we propose a new multiple imputation procedure incorporating model uncertainty in the final inference. The proposed method incorporates possible candidate models for the data into the imputation procedure using the idea of Bayesian Model Averaging (BMA). The proposed method is directly applicable to handling item nonresponse in survey sampling. Asymptotic properties of the proposed method are investigated. A limited simulation study confirms that our model averaging approach provides better estimation performance than the single model selection approach.

Submitted 28 November, 2018; originally announced November 2018.

Comments: 23 pages, 1 Table

arXiv:1809.05127 [pdf, other]

IL-Net: Using Expert Knowledge to Guide the Design of Furcated Neural Networks

Authors: Khushmeen Sakloth, Wesley Beckner, Jim Pfaendtner, Garrett B. Goh

Abstract: Deep neural networks (DNN) excel at extracting patterns. Through representation learning and automated feature engineering on large datasets, such models have been highly successful in computer vision and natural language applications. Designing optimal network architectures from a principled or rational approach however has been less than successful, with the best successful approaches utilizing an additional machine learning algorithm to tune the network hyperparameters. However, in many technical fields, there exist established domain knowledge and understanding about the subject matter. In this work, we develop a novel furcated neural network architecture that utilizes domain knowledge as high-level design principles of the network. We demonstrate proof-of-concept by developing IL-Net, a furcated network for predicting the properties of ionic liquids, which is a class of complex multi-chemicals entities. Compared to existing state-of-the-art approaches, we show that furcated networks can improve model accuracy by approximately 20-35%, without using additional labeled data. Lastly, we distill two key design principles for furcated networks that can be adapted to other domains.

Submitted 13 September, 2018; originally announced September 2018.

Comments: Submitted to peer-reviewed ML conference

arXiv:1808.04456 [pdf, other]

Multimodal Deep Neural Networks using Both Engineered and Learned Representations for Biodegradability Prediction

Authors: Garrett B. Goh, Khushmeen Sakloth, Charles Siegel, Abhinav Vishnu, Jim Pfaendtner

Abstract: Deep learning algorithms excel at extracting patterns from raw data, and with large datasets, they have been very successful in computer vision and natural language applications. However, in other domains, large datasets on which to learn representations from may not exist. In this work, we develop a novel multimodal CNN-MLP neural network architecture that utilizes both domain-specific feature engineering as well as learned representations from raw data. We illustrate the effectiveness of such network designs in the chemical sciences, for predicting biodegradability. DeepBioD, a multimodal CNN-MLP network is more accurate than either standalone network designs, and achieves an error classification rate of 0.125 that is 27% lower than the current state-of-the-art. Thus, our work indicates that combining traditional feature engineering with representation learning can be effective, particularly in situations where labeled data is limited.

Submitted 13 September, 2018; v1 submitted 13 August, 2018; originally announced August 2018.

Comments: Submitted to a peer-reviewed ML conference

arXiv:1807.10873 [pdf, other]

Bayesian Sparse Propensity Score Estimation for Unit Nonresponse

Authors: Hejian Sang, Gyuhyeong Goh, Jae Kwang Kim

Abstract: Nonresponse weighting adjustment using propensity score is a popular method for handling unit nonresponse. However, including all available auxiliary variables into the propensity model can lead to inefficient and inconsistent estimation, especially with high-dimensional covariates. In this paper, a new Bayesian method using the Spike-and-Slab prior is proposed for sparse propensity score estimation. The proposed method is not based on any model assumption on the outcome variable and is computationally efficient. Instead of doing model selection and parameter estimation separately as in many frequentist methods, the proposed method simultaneously selects the sparse response probability model and provides consistent parameter estimation. Some asymptotic properties of the proposed method are presented. The efficiency of this sparse propensity score estimator is further improved by incorporating related auxiliary variables from the full sample. The finite-sample performance of the proposed method is investigated in two limited simulation studies, including a partially simulated real data example from the Korean Labor and Income Panel Survey.

Submitted 27 July, 2018; originally announced July 2018.

Comments: 38 pages, 3 tables

arXiv:1712.02734 [pdf, other]

Using Rule-Based Labels for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction

Authors: Garrett B. Goh, Charles Siegel, Abhinav Vishnu, Nathan O. Hodas

Abstract: With access to large datasets, deep neural networks (DNN) have achieved human-level accuracy in image and speech recognition tasks. However, in chemistry, data is inherently small and fragmented. In this work, we develop an approach of using rule-based knowledge for training ChemNet, a transferable and generalizable deep neural network for chemical property prediction that learns in a weak-supervised manner from large unlabeled chemical databases. When coupled with transfer learning approaches to predict other smaller datasets for chemical properties that it was not originally trained on, we show that ChemNet's accuracy outperforms contemporary DNN models that were trained using conventional supervised learning. Furthermore, we demonstrate that the ChemNet pre-training approach is equally effective on both CNN (Chemception) and RNN (SMILES2vec) models, indicating that this approach is network architecture agnostic and is effective across multiple data modalities. Our results indicate a pre-trained ChemNet that incorporates chemistry domain knowledge, enables the development of generalizable neural networks for more accurate prediction of novel chemical properties.

Submitted 18 March, 2018; v1 submitted 7 December, 2017; originally announced December 2017.

Comments: Submitted to SIGKDD 2018

arXiv:1712.02034 [pdf, other]

SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for Predicting Chemical Properties

Authors: Garrett B. Goh, Nathan O. Hodas, Charles Siegel, Abhinav Vishnu

Abstract: Chemical databases store information in text representations, and the SMILES format is a universal standard used in many cheminformatics software. Encoded in each SMILES string is structural information that can be used to predict complex chemical properties. In this work, we develop SMILES2vec, a deep RNN that automatically learns features from SMILES to predict chemical properties, without the need for additional explicit feature engineering. Using Bayesian optimization methods to tune the network architecture, we show that an optimized SMILES2vec model can serve as a general-purpose neural network for predicting distinct chemical properties including toxicity, activity, solubility and solvation energy, while also outperforming contemporary MLP neural networks that uses engineered features. Furthermore, we demonstrate proof-of-concept of interpretability by developing an explanation mask that localizes on the most important characters used in making a prediction. When tested on the solubility dataset, it identified specific parts of a chemical that is consistent with established first-principles knowledge with an accuracy of 88%. Our work demonstrates that neural networks can learn technically accurate chemical concept and provide state-of-the-art accuracy, making interpretable deep neural networks a useful tool of relevance to the chemical industry.

Submitted 18 March, 2018; v1 submitted 5 December, 2017; originally announced December 2017.

Comments: Submitted to SIGKDD 2018

arXiv:1710.02238 [pdf, other]

How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?

Authors: Garrett B. Goh, Charles Siegel, Abhinav Vishnu, Nathan O. Hodas, Nathan Baker

Abstract: The meteoric rise of deep learning models in computer vision research, having achieved human-level accuracy in image recognition tasks is firm evidence of the impact of representation learning of deep neural networks. In the chemistry domain, recent advances have also led to the development of similar CNN models, such as Chemception, that is trained to predict chemical properties using images of molecular drawings. In this work, we investigate the effects of systematically removing and adding localized domain-specific information to the image channels of the training data. By augmenting images with only 3 additional basic information, and without introducing any architectural changes, we demonstrate that an augmented Chemception (AugChemception) outperforms the original model in the prediction of toxicity, activity, and solvation free energy. Then, by altering the information content in the images, and examining the resulting model's performance, we also identify two distinct learning patterns in predicting toxicity/activity as compared to solvation free energy. These patterns suggest that Chemception is learning about its tasks in the manner that is consistent with established knowledge. Thus, our work demonstrates that advanced chemical knowledge is not a pre-requisite for deep learning models to accurately predict complex chemical properties.

Submitted 18 March, 2018; v1 submitted 5 October, 2017; originally announced October 2017.

Comments: In Proceedings of 2018 IEEE Winter Conference on Applications of Computer Vision (WACV)

arXiv:1706.06689 [pdf]

Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models

Authors: Garrett B. Goh, Charles Siegel, Abhinav Vishnu, Nathan O. Hodas, Nathan Baker

Abstract: In the last few years, we have seen the transformative impact of deep learning in many applications, particularly in speech recognition and computer vision. Inspired by Google's Inception-ResNet deep convolutional neural network (CNN) for image classification, we have developed "Chemception", a deep CNN for the prediction of chemical properties, using just the images of 2D drawings of molecules. We develop Chemception without providing any additional explicit chemistry knowledge, such as basic concepts like periodicity, or advanced features like molecular descriptors and fingerprints. We then show how Chemception can serve as a general-purpose neural network architecture for predicting toxicity, activity, and solvation properties when trained on a modest database of 600 to 40,000 compounds. When compared to multi-layer perceptron (MLP) deep neural networks trained with ECFP fingerprints, Chemception slightly outperforms in activity and solvation prediction and slightly underperforms in toxicity prediction. Having matched the performance of expert-developed QSAR/QSPR deep learning models, our work demonstrates the plausibility of using deep neural networks to assist in computational chemistry research, where the feature engineering process is performed primarily by a deep learning algorithm.

Submitted 20 June, 2017; originally announced June 2017.

Comments: Submitted to a chemistry peer-reviewed journal

arXiv:1701.04503 [pdf]

Deep Learning for Computational Chemistry

Authors: Garrett B. Goh, Nathan O. Hodas, Abhinav Vishnu

Abstract: The rise and fall of artificial neural networks is well documented in the scientific literature of both computer science and computational chemistry. Yet almost two decades later, we are now seeing a resurgence of interest in deep learning, a machine learning algorithm based on multilayer neural networks. Within the last few years, we have seen the transformative impact of deep learning in many domains, particularly in speech recognition and computer vision, to the extent that the majority of expert practitioners in those field are now regularly eschewing prior established models in favor of deep learning models. In this review, we provide an introductory overview into the theory of deep neural networks and their unique properties that distinguish them from traditional machine learning algorithms used in cheminformatics. By providing an overview of the variety of emerging applications of deep neural networks, we highlight its ubiquity and broad applicability to a wide range of challenges in the field, including QSAR, virtual screening, protein structure prediction, quantum chemistry, materials design and property prediction. In reviewing the performance of deep neural networks, we observed a consistent outperformance against non-neural networks state-of-the-art models across disparate research topics, and deep neural network based models often exceeded the "glass ceiling" expectations of their respective tasks. Coupled with the maturity of GPU-accelerated computing for training deep neural networks and the exponential growth of chemical data on which to train these networks on, we anticipate that deep learning algorithms will be a valuable tool for computational chemistry.

Submitted 16 January, 2017; originally announced January 2017.

arXiv:1606.07558 [pdf, ps, other]

Satisfying Real-world Goals with Dataset Constraints

Authors: Gabriel Goh, Andrew Cotter, Maya Gupta, Michael Friedlander

Abstract: The goal of minimizing misclassification error on a training set is often just one of several real-world goals that might be defined on different datasets. For example, one may require a classifier to also make positive predictions at some specified rate for some subpopulation (fairness), or to achieve a specified empirical recall. Other real-world goals include reducing churn with respect to a previously deployed model, or stabilizing online training. In this paper we propose handling multiple goals on multiple datasets by training with dataset constraints, using the ramp penalty to accurately quantify costs, and present an efficient algorithm to approximately optimize the resulting non-convex constrained optimization problem. Experiments on both benchmark and real-world industry datasets demonstrate the effectiveness of our approach.

Submitted 3 May, 2017; v1 submitted 23 June, 2016; originally announced June 2016.

arXiv:1603.05719 [pdf, other]

Efficient evaluation of scaled proximal operators

Authors: Michael P. Friedlander, Gabriel Goh

Abstract: Quadratic-support functions [Aravkin, Burke, and Pillonetto; J. Mach. Learn. Res. 14(1), 2013] constitute a parametric family of convex functions that includes a range of useful regularization terms found in applications of convex optimization. We show how an interior method can be used to efficiently compute the proximal operator of a quadratic-support function under different metrics. When the metric and the function have the right structure, the proximal map can be computed with cost nearly linear in the input size. We describe how to use this approach to implement quasi-Newton methods for a rich class of nonsmooth problems that arise, for example, in sparse optimization, image denoising, and sparse logistic regression.

Submitted 19 December, 2016; v1 submitted 17 March, 2016; originally announced March 2016.

Comments: 23 pages

Journal ref: Electronic Transactions on Numerical Analysis, 46:1-22, 2017

arXiv:1401.3061 [pdf]

Easy Java Simulation, an innovative tool for teachers as designers of gravity-physics computer models

Authors: Loo Kang Wee, Giam Hwee Goh, Ee-Peow Lim

Abstract: This paper is on customization of computer models using the Easy Java Simulation authoring toolkit for the Singapore syllabus, based on real astronomical data, supported with literature reviewed researched pedagogical features. These 4 new computer models serves to support the enactment of scientific work that are inquiry centric and evidence based that are more likely to promote enjoyment and inspire imagination having experienced gravity-physics than traditional pen and paper problem solving. Pilot research suggests students enactment of investigative learning like scientist is now possible, where gravity-physics comes alive.

Submitted 29 January, 2014; v1 submitted 13 January, 2014; originally announced January 2014.

Comments: 8 pages, 8 figures, MPTL18, 18th Multimedia in Physics Teaching and Learning Conference, MPTL18, Madrid, Spain Day 1: Parallel Session 1: Room PS1. Download simulations https://dl.dropboxusercontent.com/u/44365627/lookangEJSworkspace/export/ejs_model_GField_and_Potential_1D_v8wee.jar https://dl.dropboxusercontent.com/u/44365627/lookangEJSworkspace/export/ejs_model_GFieldandPotential1Dv7EarthMoon.jar https://dl.dropboxusercontent.com/u/44365627/lookangEJSworkspace/export/ejs_model_KeplerSystem3rdLaw09.jar https://dl.dropboxusercontent.com/u/44365627/lookangEJSS/export/ejs_model_EarthAndSatelite.jar

arXiv:1304.5586 [pdf, other]

Tail bounds for stochastic approximation

Authors: Michael P. Friedlander, Gabriel Goh

Abstract: Stochastic-approximation gradient methods are attractive for large-scale convex optimization because they offer inexpensive iterations. They are especially popular in data-fitting and machine-learning applications where the data arrives in a continuous stream, or it is necessary to minimize large sums of functions. It is known that by appropriately decreasing the variance of the error at each iteration, the expected rate of convergence matches that of the underlying deterministic gradient method. Conditions are given under which this happens with overwhelming probability.

Submitted 8 January, 2014; v1 submitted 20 April, 2013; originally announced April 2013.

arXiv:1303.0079 [pdf]

Enabling Gravity Physics by Inquiry using Easy Java Simulation

Authors: Loo Kang Wee, Giam Hwee Goh, Charles Chew

Abstract: Studying physics of very large scale like the solar system is difficult in real life, using telescope on clear skies over years. We are probably a world first to create four well designed gravity computer models to serve as powerful pedagogical tools for students active inquiry, based on real data. These models are syllabus customized, free and rapidly prototyped with Open Source Physics researchers educators. Pilot study suggests students enactment of investigative learning like scientist is now possible, where gravity-physics comes alive. We are still continually improving the features of these computer models through feedback from students and teachers and the models can be downloaded from the internet. We hope more teachers will find the simulations useful in their own classes and further customized them so that others will find them more intelligible and contribute back to the wider educational fraternity to benefit all humankind.

Submitted 28 February, 2013; originally announced March 2013.

Comments: 6 pages, 12 figures, 5th redesign pedagogy conference

arXiv:1212.3863 [pdf]

doi 10.1088/0031-9120/48/1/72

Geostationary Earth Orbit Satellite Model using Easy Java Simulation

Authors: Loo Kang Wee, Giam Hwee Goh

Abstract: We develop an Easy Java Simulation (EJS) model for students to visualize geostationary orbits near Earth, modeled using Java 3D implementation of the EJS 3D library. The simplified physics model is described and simulated using simple constant angular velocity equation. Four computer model design ideas such as 1) simple and realistic 3D view and associated learning to real world, 2) comparative visualization of permanent geostationary satellite 3) examples of non-geostationary orbits of different 3-1) rotation sense, 3-2) periods, 3-3) planes and 4) incorrect physics model for conceptual discourse are discussed. General feedback from the students has been relatively positive, and we hope teachers will find the computer model useful in their own classes. 2015 Resources http://iwant2study.org/ospsg/index.php/interactive-resources/physics/02-newtonian-mechanics/08-gravity/62-gravity10

Submitted 28 December, 2015; v1 submitted 16 December, 2012; originally announced December 2012.

Comments: 6 pages, 11 figures, 2013 Physics Education Volume 48 Number 1

Journal ref: Phys. Educ. 48 72 (2013)

arXiv:1210.3410 [pdf]

Computer Models Design for Teaching and Learning using Easy Java Simulation

Authors: Loo Kang Lawrence Wee, Ai Phing Lim, Khoon Song Aloysius Goh, Sze Yee LyeYE, Tat Leong Lee, Weiming Xu, Giam Hwee Jimmy Goh, Chee Wah Ong, Soo Kok Ng, Ee-Peow Lim, Chew Ling Lim, Wee Leng Joshua Yeo, Matthew Ong, Kenneth Y. T. LimI

Abstract: We are teachers who have benefited from the Open Source Physics (Brown, 2012; Christian, 2010; Esquembre, 2012) community's work and we would like to share some of the computer models and lesson packages that we have designed and implemented in five schools grade 11 to 12 classes. In a ground-up teacher-leadership (MOE, 2010) approach, we came together to learn, advancing the professionalism (MOE, 2009) of physics educators and improve students' learning experiences through suitable blend (Jaakkola, 2012) of real equipment and computer models where appropriate. We will share computer models that we have remixed from existing library of computer models into suitable learning environments for inquiry of physics customized (Wee & Mak, 2009) for the Advanced Level Physics syllabus (SEAB, 2010, 2012). We hope other teachers would find these computer models useful and remix them to suit their own context, design better learning activities and share them to benefit all humankind, becoming citizens for the world. This is an eduLab (MOE, 2012b; Wee, 2010) project funded by the National Research Fund (NRF) Singapore and Ministry of Education (MOE) Singapore.

Submitted 24 October, 2013; v1 submitted 11 October, 2012; originally announced October 2012.

Comments: 10 pages with 12 pages appendix worksheet, 12 figures, The World Conference on Physics Education 1-6 July 2012 Oral Presentation [PS.02.09.a] Parallel Session 02.09|Date & Time: 02.07.2012 / 13:00 - 14:30|Hall: D403 (3rd Floor)

arXiv:1206.6489 [pdf]

doi 10.1088/0031-9120/47/4/448

Using Tracker as a Pedagogical Tool for Understanding Projectile Motion

Authors: Loo Kang Wee, Charles Chew, Giam Hwee Goh, Samuel Tan, Tat Leong Lee

Abstract: This paper reports the use of Tracker as a pedagogical tool in the effective learning and teaching of projectile motion in physics. When computer model building learning processes is supported and driven by video analysis data, this free Open Source Physics (OSP) tool can provide opportunities for students to engage in active inquiry-based learning. We discuss the pedagogical use of Tracker to address some common misconceptions of projectile motion by allowing students to test their hypothesis by juxtaposing their mental models against the analysis of real life videos. Initial research findings suggest that allowing learners to relate abstract physics concepts to real life through coupling computer modeling with traditional video analysis could be an innovative and effective way to learn projectile motion. 2015 Resources: http://iwant2study.org/ospsg/index.php/interactive-resources/physics/02-newtonian-mechanics/01-kinematics/174-projectile-motion

Submitted 23 December, 2015; v1 submitted 26 June, 2012; originally announced June 2012.

Comments: 9 pages, 9 figures; http://iopscience.iop.org/0031-9120/47/4/448
