-
Simple and Effective Masked Diffusion Language Models
Authors:
Subham Sekhar Sahoo,
Marianne Arriola,
Yair Schiff,
Aaron Gokaslan,
Edgar Marroquin,
Justin T Chiu,
Alexander Rush,
Volodymyr Kuleshov
Abstract:
While diffusion models excel at generating high-quality images, prior work reports a significant performance gap between diffusion and autoregressive (AR) methods in language modeling. In this work, we show that simple masked discrete diffusion is more performant than previously thought. We apply an effective training recipe that improves the performance of masked diffusion models and derive a simplified, Rao-Blackwellized objective that results in additional improvements. Our objective has a simple form -- it is a mixture of classical masked language modeling losses -- and can be used to train encoder-only language models that admit efficient samplers, including ones that can generate arbitrary lengths of text semi-autoregressively like a traditional language model. On language modeling benchmarks, a range of masked diffusion models trained with modern engineering practices achieves a new state-of-the-art among diffusion models, and approaches AR perplexity. We release our code at: https://github.com/kuleshov-group/mdlm
Submitted 11 June, 2024;
originally announced June 2024.
-
Diffusion Models With Learned Adaptive Noise
Authors:
Subham Sekhar Sahoo,
Aaron Gokaslan,
Chris De Sa,
Volodymyr Kuleshov
Abstract:
Diffusion models have gained traction as powerful algorithms for synthesizing high-quality images. Central to these algorithms is the diffusion process, a set of equations which maps data to noise in a way that can significantly affect performance. In this paper, we explore whether the diffusion process can be learned from data. Our work is grounded in Bayesian inference and seeks to improve log-likelihood estimation by casting the learned diffusion process as an approximate variational posterior that yields a tighter lower bound (ELBO) on the likelihood. A widely held assumption is that the ELBO is invariant to the noise process: our work dispels this assumption and proposes multivariate learned adaptive noise (MULAN), a learned diffusion process that applies noise at different rates across an image. Specifically, our method relies on a multivariate noise schedule that is a function of the data to ensure that the ELBO is no longer invariant to the choice of the noise schedule as in previous works. Empirically, MULAN sets a new state-of-the-art in density estimation on CIFAR-10 and ImageNet and reduces the number of training steps by 50%. Code is available at https://github.com/s-sahoo/MuLAN
Submitted 4 June, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
AxOMaP: Designing FPGA-based Approximate Arithmetic Operators using Mathematical Programming
Authors:
Siva Satyendra Sahoo,
Salim Ullah,
Akash Kumar
Abstract:
With the increasing application of machine learning (ML) algorithms in embedded systems, there is a rising necessity to design low-cost computer arithmetic for these resource-constrained systems. As a result, emerging models of computation, such as approximate and stochastic computing, that leverage the inherent error-resilience of such algorithms are being actively explored for implementing ML inference on resource-constrained systems. Approximate computing (AxC) aims to provide disproportionate gains in the power, performance, and area (PPA) of an application by allowing some level of reduction in its behavioral accuracy (BEHAV). Using approximate operators (AxOs) for computer arithmetic forms one of the more prevalent methods of implementing AxC. AxOs provide the additional scope for finer granularity of optimization, compared to only precision scaling of computer arithmetic. To this end, designing platform-specific and cost-efficient approximate operators forms an important research goal. Recently, multiple works have reported using AI/ML-based approaches for synthesizing novel FPGA-based AxOs. However, most of such works limit usage of AI/ML to designing ML-based surrogate functions used during iterative optimization processes. To this end, we propose a novel data analysis-driven mathematical programming-based approach to synthesizing approximate operators for FPGAs. Specifically, we formulate mixed integer quadratically constrained programs based on the results of correlation analysis of the characterization data and use the solutions to enable a more directed search approach for evolutionary optimization algorithms. Compared to traditional evolutionary algorithms-based optimization, we report up to 21% improvement in the hypervolume, for joint optimization of PPA and BEHAV, in the design of signed 8-bit multipliers.
Submitted 23 September, 2023;
originally announced September 2023.
-
AxOCS: Scaling FPGA-based Approximate Operators using Configuration Supersampling
Authors:
Siva Satyendra Sahoo,
Salim Ullah,
Soumyo Bhattacharjee,
Akash Kumar
Abstract:
The rising usage of AI and ML-based processing across application domains has exacerbated the need for low-cost ML implementation, specifically for resource-constrained embedded systems. To this end, approximate computing, an approach that explores the power, performance, area (PPA), and behavioral accuracy (BEHAV) trade-offs, has emerged as a possible solution for implementing embedded machine learning. Due to the predominance of MAC operations in ML, designing platform-specific approximate arithmetic operators forms one of the major research problems in approximate computing. Recently there has been a rising usage of AI/ML-based design space exploration techniques for implementing approximate operators. However, most of these approaches are limited to using ML-based surrogate functions for predicting the PPA and BEHAV impact of a set of related design decisions. While this approach leverages the regression capabilities of ML methods, it does not exploit the more advanced approaches in ML. To this end, we propose AxOCS, a methodology for designing approximate arithmetic operators through ML-based supersampling. Specifically, we present a method to leverage the correlation of PPA and BEHAV metrics across operators of varying bit-widths for generating larger bit-width operators. The proposed approach involves traversing the relatively smaller design space of smaller bit-width operators and employing its associated Design-PPA-BEHAV relationship to generate initial solutions for metaheuristics-based optimization for larger operators. The experimental evaluation of AxOCS for FPGA-optimized approximate operators shows that the proposed approach significantly improves the quality (the resulting hypervolume for multi-objective optimization) of 8x8 signed approximate multipliers.
Submitted 22 September, 2023;
originally announced September 2023.
-
Single-beam room-temperature atomic magnetometer with large bandwidth and dynamic range
Authors:
K. K. George Kurian,
Sushree S. Sahoo,
P. K. Madhu,
G. Rajalakshmi
Abstract:
We present a single-beam atomic magnetometer operating at room temperature for the measurement of ac magnetic fields. The magnetometer functions in the non-linear regime of magneto-optical rotation of $^{85}$Rb atomic vapour. We demonstrate a sensitivity of $\sim 0.9$ pT$/ \sqrt{Hz}$ at 2 kHz and a large bandwidth of 24 kHz. The dynamic range of measurement is $10^6$, making the sensor effective even in Earth's field. We present the signal-to-noise and bandwidth characteristics of the system for both shielded and unshielded modes of operation. Moreover, we perform theoretical analysis for the atom-light system for the single laser beam configuration. The effect of light intensity and detuning on the magnetometer are studied theoretically as well as experimentally to understand the strengths and limitations of the technique.
Submitted 9 September, 2022;
originally announced September 2022.
-
Semi-Autoregressive Energy Flows: Exploring Likelihood-Free Training of Normalizing Flows
Authors:
Phillip Si,
Zeyi Chen,
Subham Sekhar Sahoo,
Yair Schiff,
Volodymyr Kuleshov
Abstract:
Training normalizing flow generative models can be challenging due to the need to calculate computationally expensive determinants of Jacobians. This paper studies the likelihood-free training of flows and proposes the energy objective, an alternative sample-based loss based on proper scoring rules. The energy objective is determinant-free and supports flexible model architectures that are not easily compatible with maximum likelihood training, including semi-autoregressive energy flows, a novel model family that interpolates between fully autoregressive and non-autoregressive models. Energy flows feature competitive sample quality, posterior inference, and generation speed relative to likelihood-based flows; this performance is decorrelated from the quality of log-likelihood estimates, which are generally very poor. Our findings question the use of maximum likelihood as an objective or a metric, and contribute to a scientific study of its role in generative modeling.
Submitted 22 June, 2023; v1 submitted 14 June, 2022;
originally announced June 2022.
-
Training Neural Networks using SAT solvers
Authors:
Subham S. Sahoo
Abstract:
We propose an algorithm to explore the global optimization method, using SAT solvers, for training a neural net. Deep Neural Networks have achieved great feats in tasks like image recognition, speech recognition, etc. Much of their success can be attributed to the gradient-based optimisation methods, which scale well to huge datasets while still giving solutions better than any other existing methods. However, there exist learning problems like the parity function and the Fast Fourier Transform, where a neural network using a gradient-based optimisation algorithm cannot capture the underlying structure of the learning task properly. Thus, exploring global optimisation methods is of utmost interest as the gradient-based methods get stuck in local optima. In the experiments, we demonstrate the effectiveness of our algorithm against the ADAM optimiser in certain tasks like parity learning. However, in the case of image classification on the MNIST Dataset, the performance of our algorithm was less than satisfactory. We further discuss the role of the size of the training dataset and the hyper-parameter settings in keeping things scalable for a SAT solver.
Submitted 9 June, 2022;
originally announced June 2022.
-
Backpropagation through Combinatorial Algorithms: Identity with Projection Works
Authors:
Subham Sekhar Sahoo,
Anselm Paulus,
Marin Vlastelica,
Vít Musil,
Volodymyr Kuleshov,
Georg Martius
Abstract:
Embedding discrete solvers as differentiable layers has given modern deep learning architectures combinatorial expressivity and discrete reasoning capabilities. The derivative of these solvers is zero or undefined, therefore a meaningful replacement is crucial for effective gradient-based learning. Prior works rely on smoothing the solver with input perturbations, relaxing the solver to continuous problems, or interpolating the loss landscape with techniques that typically require additional solver calls, introduce extra hyper-parameters, or compromise performance. We propose a principled approach to exploit the geometry of the discrete solution space to treat the solver as a negative identity on the backward pass and further provide a theoretical justification. Our experiments demonstrate that such a straightforward hyper-parameter-free approach is able to compete with previous more complex methods on numerous experiments such as backpropagation through discrete samplers, deep graph matching, and image retrieval. Furthermore, we substitute the previously proposed problem-specific and label-dependent margin with a generic regularization procedure that prevents cost collapse and increases robustness.
Submitted 17 March, 2023; v1 submitted 30 May, 2022;
originally announced May 2022.
-
Nonlinear magnetoelectric effect in atomic vapor
Authors:
Sushree S. Sahoo,
Soumya R. Mishra,
G. Rajalakshmi,
Ashok K. Mohapatra
Abstract:
Magnetoelectric (ME) effect refers to the coupling between electric and magnetic fields in a medium resulting in electric polarization induced by magnetic fields and magnetization induced by electric fields. The linear ME effect in certain magnetoelectric materials such as multiferroics has been of great interest due to its application in the fabrication of spintronics devices, memories, and magnetic sensors. However, the exclusive studies on the nonlinear ME effect are mostly centered on the investigation of second-harmonic generation in chiral materials. Here, we report the demonstration of nonlinear wave mixing of optical electric fields and radio-frequency (rf) magnetic fields in thermal atomic vapor, which is the consequence of the higher-order nonlinear ME effect in the medium. The experimental results are explained by comparing with density matrix calculations of the system. We also experimentally verify the expected dependence of the generated field amplitudes on the rf field magnitude as evidence of the magnetoelectric effect. This study can open up the possibility for precision rf-magnetometry due to its advantage in terms of larger dynamic range and arbitrary frequency resolution.
Submitted 15 March, 2021;
originally announced March 2021.
-
ExPAN(N)D: Exploring Posits for Efficient Artificial Neural Network Design in FPGA-based Systems
Authors:
Suresh Nambi,
Salim Ullah,
Aditya Lohana,
Siva Satyendra Sahoo,
Farhad Merchant,
Akash Kumar
Abstract:
The recent advances in machine learning, in general, and Artificial Neural Networks (ANN), in particular, have made smart embedded systems an attractive option for a larger number of application areas. However, the high computational complexity, memory footprints, and energy requirements of machine learning models hinder their deployment on resource-constrained embedded systems. Most state-of-the-art works have considered this problem by proposing various low bit-width data representation schemes, optimized arithmetic operators' implementations, and different complexity reduction techniques such as network pruning. To further elevate the implementation gains offered by these individual techniques, there is a need to cross-examine and combine these techniques' unique features. This paper presents ExPAN(N)D, a framework to analyze and ingather the efficacy of the Posit number representation scheme and the efficiency of fixed-point arithmetic implementations for ANNs. The Posit scheme offers a better dynamic range and higher precision for various applications than IEEE $754$ single-precision floating-point format. However, due to the dynamic nature of the various fields of the Posit scheme, the corresponding arithmetic circuits have higher critical path delay and resource requirements than the single-precision-based arithmetic units. Towards this end, we propose a novel Posit to fixed-point converter for enabling high-performance and energy-efficient hardware implementations for ANNs with minimal drop in the output accuracy. We also propose a modified Posit-based representation to store the trained parameters of a network. Compared to an $8$-bit fixed-point-based inference accelerator, our proposed implementation offers $\approx46\%$ and $\approx18\%$ reductions in the storage requirements of the parameters and energy consumption of the MAC units, respectively.
Submitted 27 October, 2020; v1 submitted 24 October, 2020;
originally announced October 2020.
-
Scaling Symbolic Methods using Gradients for Neural Model Explanation
Authors:
Subham Sekhar Sahoo,
Subhashini Venugopalan,
Li Li,
Rishabh Singh,
Patrick Riley
Abstract:
Symbolic techniques based on Satisfiability Modulo Theory (SMT) solvers have been proposed for analyzing and verifying neural network properties, but their usage has been fairly limited owing to their poor scalability with larger networks. In this work, we propose a technique for combining gradient-based methods with symbolic techniques to scale such analyses and demonstrate its application for model explanation. In particular, we apply this technique to identify minimal regions in an input that are most relevant for a neural network's prediction. Our approach uses gradient information (based on Integrated Gradients) to focus on a subset of neurons in the first layer, which allows our technique to scale to large networks. The corresponding SMT constraints encode the minimal input mask discovery problem such that after masking the input, the activations of the selected neurons are still above a threshold. After solving for the minimal masks, our approach scores the mask regions to generate a relative ordering of the features within the mask. This produces a saliency map which explains "where a model is looking" when making a prediction. We evaluate our technique on three datasets - MNIST, ImageNet, and Beer Reviews, and demonstrate both quantitatively and qualitatively that the regions generated by our approach are sparser and achieve higher saliency scores compared to the gradient-based methods alone. Code and examples are at - https://github.com/google-research/google-research/tree/master/smug_saliency
Submitted 5 May, 2021; v1 submitted 29 June, 2020;
originally announced June 2020.
-
New Competitive Analysis Results of Online List Scheduling Algorithm
Authors:
Rakesh Mohanty,
Debasis Dwibedy,
Shreeya Swagatika Sahoo
Abstract:
Online algorithms have been an emerging area of interest for researchers in various domains of computer science. The online $m$-machine list scheduling problem introduced by Graham has gained theoretical as well as practical significance in the development of competitive analysis as a performance measure for online algorithms. In this paper, we study and explore the performance of Graham's online \textit{list scheduling algorithm (LSA)} for independent jobs. In the literature, \textit{LSA} has already been proved to be $2-\frac{1}{m}$ competitive, where $m$ is the number of machines. We present two new upper bound results on competitive analysis of \textit{LSA}. We obtain upper bounds on the competitive ratio of $2-\frac{2}{m}$ and $2-\frac{m^2-m+1}{m^2}$ respectively for two practically significant special classes of input job sequences. Our analytical results can motivate practitioners to design improved competitive online algorithms for the $m$-machine list scheduling problem by characterization of real-life input sequences.
Submitted 28 December, 2019;
originally announced January 2020.
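As an illustration of the list scheduling rule analyzed in the entry above, the following minimal Python sketch assigns each arriving job to the currently least-loaded machine. The function name list_schedule and the example job sequence are assumptions made for illustration and are not taken from the paper. The sequence is the standard tight instance for the general $2-\frac{1}{m}$ bound: $m(m-1)$ unit jobs followed by one job of length $m$.

```python
import heapq


def list_schedule(jobs, m):
    """Greedy list scheduling: place each job, in arrival order,
    on the machine with the smallest current load.

    Returns (makespan, assignment), where assignment[k] is the
    machine index chosen for the k-th job.
    """
    # Min-heap of (current_load, machine_index); ties go to the lower index.
    loads = [(0, i) for i in range(m)]
    heapq.heapify(loads)
    assignment = []
    for p in jobs:
        load, i = heapq.heappop(loads)
        assignment.append(i)
        heapq.heappush(loads, (load + p, i))
    makespan = max(load for load, _ in loads)
    return makespan, assignment


# Hypothetical example: m = 3 machines, six unit jobs, then one job of length 3.
m = 3
jobs = [1] * (m * (m - 1)) + [m]
makespan, assignment = list_schedule(jobs, m)

# An optimal offline schedule puts the long job alone on one machine and
# splits the unit jobs over the other two, giving makespan m.
optimal = m
ratio = makespan / optimal
```

On this instance the greedy makespan is $2m-1$ against an optimum of $m$, so the ratio equals $2-\frac{1}{m}$. The improved bounds stated in the abstract apply only to the special input classes studied in the paper, which this sketch does not attempt to characterize.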
-
Learning Equations for Extrapolation and Control
Authors:
Subham S. Sahoo,
Christoph H. Lampert,
Georg Martius
Abstract:
We present an approach to identify concise equations from data using a shallow neural network approach. In contrast to ordinary black-box regression, this approach allows understanding functional relations and generalizing them from observed data to unseen parts of the parameter space. We show how to extend the class of learnable equations for a recently proposed equation learning network to include divisions, and we improve the learning and model selection strategy to be useful for challenging real-world data. For systems governed by analytical expressions, our method can in many cases identify the true underlying equation and extrapolate to unseen domains. We demonstrate its effectiveness by experiments on a cart-pendulum system, where only 2 random rollouts are required to learn the forward dynamics and successfully achieve the swing-up task.
Submitted 19 June, 2018;
originally announced June 2018.
-
Mirrorless optical parametric oscillator inside an all-optical waveguide
Authors:
Sushree S Sahoo,
Snigdha S Pati,
Ashok K Mohapatra
Abstract:
Mirrorless optical parametric oscillator (MOPO) is a consequence of intrinsic feedback provided by the nonlinearity in a medium due to the interaction of a pair of strong counter-propagating fields. As the name suggests, the device doesn't require a cavity for lasing other than the nonlinear medium. Here, we report the first demonstration of MOPO under the effect of an all-optical waveguide. The efficient four-wave mixing process due to counter-propagating pump and control fields interacting with a multilevel atomic system facilitates the generation of mirrorless Stokes and anti-Stokes fields counter-propagating to each other. The maximum generated laser power could rise up to mW with pump conversion efficiency more than 30%. Furthermore, the cross-phase modulation due to the strong Gaussian beams creates all-optical waveguides for the generated fields and hence induces different spatial modes in the Stokes as well as the anti-Stokes fields. With suitable experimental parameters, we could generate correlated Gaussian mode or Laguerre-Gaussian mode for both the generated fields.
Submitted 13 April, 2018;
originally announced April 2018.
-
Study of optical nonlinearity of a highly dispersive medium using optical heterodyne detection technique
Authors:
Arup Bhowmick,
Sushree S. Sahoo,
Ashok K Mohapatra
Abstract:
We discuss the optical heterodyne detection technique to study the absorption and dispersion of a probe beam propagating through a medium with a narrow resonance. The technique has been demonstrated for Rydberg Electro-magnetically induced transparency (EIT) in rubidium thermal vapor and the optical non-linearity of a probe beam with variable intensity has been studied. A quantitative comparison of the experimental result with a suitable theoretical model is presented. The limitations and the working regime of the technique are discussed.
Submitted 27 June, 2016;
originally announced June 2016.
-
Characterizations of GEM detector prototype
Authors:
Rajendra Nath Patra,
Amit Nanda,
Sharmili Rudra,
P. Bhattacharya,
Sumanya Sekhar Sahoo,
S. Biswas,
B. Mohanty,
T. K. Nayak,
P. K. Sahu,
S. Sahu
Abstract:
At NISER-IoP detector laboratory an initiative is taken to build and test Gas Electron Multiplier (GEM) detectors for ALICE experiment. The optimisation of the gas flow rate and the long-term stability test of the GEM detector are performed. The method and test results are presented.
Submitted 26 May, 2015;
originally announced May 2015.