-
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Authors:
Yeonhong Park,
Jake Hyun,
SangLyul Cho,
Bonggeun Sim,
Jae W. Lee
Abstract:
Recently, considerable efforts have been directed towards compressing Large Language Models (LLMs), which showcase groundbreaking capabilities across diverse applications but entail significant deployment costs due to their large sizes. Meanwhile, much less attention has been given to mitigating the costs associated with deploying multiple LLMs of varying sizes despite its practical significance. Thus, this paper introduces \emph{any-precision LLM}, extending the concept of any-precision DNN to LLMs. Addressing challenges in any-precision LLM, we propose a lightweight method for any-precision quantization of LLMs, leveraging a post-training quantization framework, and develop a specialized software engine for its efficient serving. As a result, our solution significantly reduces the high costs of deploying multiple, different-sized LLMs by overlaying LLMs quantized to varying bit-widths, such as 3, 4, ..., $n$ bits, into a memory footprint comparable to a single $n$-bit LLM. All the supported LLMs with varying bit-widths demonstrate state-of-the-art model quality and inference throughput, proving itself to be a compelling option for deployment of multiple, different-sized LLMs. Our code is open-sourced and available online.
Submitted 21 June, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
Magnitude and Angle Dynamics in Training Single ReLU Neurons
Authors:
Sangmin Lee,
Byeongsu Sim,
Jong Chul Ye
Abstract:
To understand the learning dynamics of deep ReLU networks, we investigate the dynamic system of gradient flow $w(t)$ by decomposing it into magnitude $\|w(t)\|$ and angle $φ(t) := π - θ(t)$ components. In particular, for multi-layer single ReLU neurons with spherically symmetric data distribution and the square loss function, we provide upper and lower bounds for the magnitude and angle components to describe the dynamics of gradient flow. Using the obtained bounds, we conclude that small-scale initialization induces slow convergence for deep single ReLU neurons. Finally, by exploiting the relation between gradient flow and gradient descent, we extend our results to the gradient descent approach. All theoretical results are verified by experiments.
Submitted 11 October, 2022; v1 submitted 27 September, 2022;
originally announced September 2022.
-
The high-resolution in vivo measurement of replication fork velocity and pausing by lag-time analysis
Authors:
Dean Huang,
Anna E. Johnson,
Brandon S. Sim,
Teresa Lo,
Houra Merrikh,
Paul A. Wiggins
Abstract:
An important step towards understanding the mechanistic basis of the central dogma is the quantitative characterization of the dynamics of nucleic-acid-bound molecular motors in the context of the living cell, where a crowded cytoplasm as well as competing and potentially antagonistic processes may significantly affect their rapidity and reliability. To capture these dynamics, we develop a novel method, lag-time analysis, for measuring in vivo dynamics. The approach uses exponential growth as the stopwatch to resolve dynamics in an asynchronous culture and therefore circumvents the difficulties and potential artifacts associated with synchronization or fluorescent labeling. Although lag-time analysis has the potential to be widely applicable to the quantitative analysis of in vivo dynamics, we focus on an important application: characterizing replication dynamics. To benchmark the approach, we analyze replication dynamics in three different species and a collection of mutants. We provide the first quantitative locus-specific measurements of fork velocity, in units of kb per second, as well as replisome-pause durations, some with the precision of seconds. The measured fork velocity is observed to be both locus and time dependent, even in wild-type cells. In addition to quantitatively characterizing known phenomena, we detect brief, locus-specific pauses at rDNA in wild-type cells for the first time. We also observe temporal fork velocity oscillations in three highly-divergent bacterial species. Lag-time analysis not only has great potential to offer new insights into replication, as demonstrated in the paper, but also has potential to provide quantitative insights into other important processes.
Submitted 6 September, 2022; v1 submitted 14 August, 2022;
originally announced August 2022.
-
Improving Diffusion Models for Inverse Problems using Manifold Constraints
Authors:
Hyungjin Chung,
Byeongsu Sim,
Dohoon Ryu,
Jong Chul Ye
Abstract:
Recently, diffusion models have been used to solve various inverse problems in an unsupervised manner with appropriate modifications to the sampling process. However, the current solvers, which recursively apply a reverse diffusion step followed by a projection-based measurement consistency step, often produce suboptimal results. By studying the generative sampling path, here we show that current solvers throw the sample path off the data manifold, and hence the error accumulates. To address this, we propose an additional correction term inspired by the manifold constraint, which can be used synergistically with the previous solvers to keep the iterations close to the manifold. The proposed manifold constraint is straightforward to implement within a few lines of code, yet boosts the performance by a surprisingly large margin. With extensive experiments, we show that our method is superior to the previous methods both theoretically and empirically, producing promising results in many applications such as image inpainting, colorization, and sparse-view computed tomography. Code is available at https://github.com/HJ-harry/MCG_diffusion
Submitted 20 May, 2024; v1 submitted 2 June, 2022;
originally announced June 2022.
-
Transformer Network-based Reinforcement Learning Method for Power Distribution Network (PDN) Optimization of High Bandwidth Memory (HBM)
Authors:
Hyunwook Park,
Minsu Kim,
Seongguk Kim,
Keunwoo Kim,
Haeyeon Kim,
Taein Shin,
Keeyoung Son,
Boogyo Sim,
Subin Kim,
Seungtaek Jeong,
Chulsoon Hwang,
Joungho Kim
Abstract:
In this article, for the first time, we propose a transformer network-based reinforcement learning (RL) method for power distribution network (PDN) optimization of high bandwidth memory (HBM). The proposed method can provide an optimal decoupling capacitor (decap) design to maximize the reduction of PDN self- and transfer impedance seen at multiple ports. An attention-based transformer network is implemented to directly parameterize decap optimization policy. The optimality performance is significantly improved since the attention mechanism has powerful expression to explore massive combinatorial space for decap assignments. Moreover, it can capture sequential relationships between the decap assignments. The computing time for optimization is dramatically reduced due to the reusable network on positions of probing ports and decap assignment candidates. This is because the transformer network has a context embedding process to capture meta-features including probing ports positions. In addition, the network is trained with randomly generated data sets. Therefore, without additional training, the trained network can solve new decap optimization problems. The computing time for training and data cost are critically decreased due to the scalability of the network. Thanks to its shared weight property, the network can adapt to a larger scale of problems without additional training. For verification, we compare the results with conventional genetic algorithm (GA), random search (RS), and all the previous RL-based methods. As a result, the proposed method outperforms in all the following aspects: optimality performance, computing time, and data efficiency.
Submitted 23 August, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Support Vectors and Gradient Dynamics of Single-Neuron ReLU Networks
Authors:
Sangmin Lee,
Byeongsu Sim,
Jong Chul Ye
Abstract:
Understanding the implicit bias of gradient descent for the generalization capability of ReLU networks has been an important research topic in machine learning. Unfortunately, even for a single ReLU neuron trained with the square loss, it was recently shown to be impossible to characterize the implicit regularization in terms of a norm of model parameters (Vardi & Shamir, 2021). In order to close the gap toward understanding the intriguing generalization behavior of ReLU networks, here we examine the gradient flow dynamics in the parameter space when training single-neuron ReLU networks. Specifically, we discover an implicit bias in terms of support vectors, which plays a key role in why and how ReLU networks generalize well. Moreover, we analyze gradient flows with respect to the magnitude of the norm of initialization, and show that the norm of the learned weight strictly increases through the gradient flow. Lastly, we prove the global convergence of a single ReLU neuron for the $d = 2$ case.
Submitted 13 June, 2022; v1 submitted 11 February, 2022;
originally announced February 2022.
-
Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction
Authors:
Hyungjin Chung,
Byeongsu Sim,
Jong Chul Ye
Abstract:
Diffusion models have recently attained significant interest within the community owing to their strong performance as generative models. Furthermore, their application to inverse problems has demonstrated state-of-the-art performance. Unfortunately, diffusion models have a critical downside: they are inherently slow to sample from, needing a few thousand iteration steps to generate images from pure Gaussian noise. In this work, we show that starting from Gaussian noise is unnecessary. Instead, starting from a single forward diffusion with better initialization significantly reduces the number of sampling steps in the reverse conditional diffusion. This phenomenon is formally explained by the contraction theory of stochastic difference equations such as our conditional diffusion strategy, which alternates a reverse diffusion step with a non-expansive data consistency step. The new sampling strategy, dubbed Come-Closer-Diffuse-Faster (CCDF), also reveals a new insight into how the existing feed-forward neural network approaches for inverse problems can be synergistically combined with diffusion models. Experimental results with super-resolution, image inpainting, and compressed sensing MRI demonstrate that our method can achieve state-of-the-art reconstruction performance with significantly fewer sampling steps.
Submitted 19 March, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
Unpaired Deep Learning for Accelerated MRI using Optimal Transport Driven CycleGAN
Authors:
Gyutaek Oh,
Byeongsu Sim,
Hyungjin Chung,
Leonard Sunwoo,
Jong Chul Ye
Abstract:
Recently, deep learning approaches for accelerated MRI have been extensively studied thanks to their high-performance reconstruction in spite of significantly reduced runtime complexity. These neural networks are usually trained in a supervised manner, so matched pairs of subsampled and fully sampled k-space data are required. Unfortunately, it is often difficult to acquire matched fully sampled k-space data, since the acquisition of fully sampled k-space data requires a long scan time and often leads to a change of the acquisition protocol. Therefore, unpaired deep learning without matched label data has become a very important research topic. In this paper, we propose an unpaired deep learning approach using an optimal transport driven cycle-consistent generative adversarial network (OT-cycleGAN) that employs a single pair of generator and discriminator. The proposed OT-cycleGAN architecture is rigorously derived from a dual formulation of the optimal transport problem using a specially designed penalized least squares cost. The experimental results show that our method can reconstruct high-resolution MR images from accelerated k-space data from both single and multiple coil acquisition, without requiring matched reference data.
Submitted 29 August, 2020;
originally announced August 2020.
-
A Comprehensive Model of the Degradation of Organic Light-Emitting Diodes and Application for Efficient Stable Blue Phosphorescent Devices with Reduced Influence of Polarons
Authors:
Bomi Sim,
Jong Soo Kim,
Hyejin Bae,
Sungho Nam,
Eunsuk Kwon,
Ji Whan Kim,
Hwa-Young Cho,
Sunghan Kim,
Jang-Joo Kim
Abstract:
We present a comprehensive model to quantitatively analyze and predict the degradation of organic light-emitting diodes (OLEDs), considering all possible degradation mechanisms, i.e., polaron, exciton, exciton-polaron interactions, exciton-exciton interactions, and a newly proposed impurity effect. The loss of efficiency during degradation is presented as a function of quencher density, the density and generation mechanisms of which were extracted using a voltage rise model. The comprehensive model was applied to stable blue phosphorescent OLEDs (PhOLEDs), and the results showed that the model described the voltage rise and external quantum efficiency (EQE) loss very well, and that the quenchers in the emitting layer (EML) were mainly generated by dopant polarons. Quencher formation was confirmed by mass spectrometry. The polaron density per dopant molecule in the EML was reduced by controlling the emitter doping ratio, resulting in the highest reported LT50 of 431 hours at an initial brightness of 500 cd/m² with CIEy<0.25 and high EQE >18%.
Submitted 10 December, 2019;
originally announced December 2019.
-
Optimal Transport driven CycleGAN for Unsupervised Learning in Inverse Problems
Authors:
Byeongsu Sim,
Gyutaek Oh,
Jeongsol Kim,
Chanyong Jung,
Jong Chul Ye
Abstract:
To improve the performance of the classical generative adversarial network (GAN), the Wasserstein generative adversarial network (W-GAN) was developed as a Kantorovich dual formulation of the optimal transport (OT) problem using the Wasserstein-1 distance. However, it was not clear how cycleGAN-type generative models can be derived from optimal transport theory. Here we show that a novel cycleGAN architecture can be derived as a Kantorovich dual OT formulation if a penalized least squares (PLS) cost with a deep learning-based inverse path penalty is used as the transportation cost. One of the most important advantages of this formulation is that, depending on the knowledge of the forward problem, distinct variations of the cycleGAN architecture can be derived: for example, one with two pairs of generators and discriminators, and another with only a single pair of generator and discriminator. Even for the two-generator case, we show that structural knowledge of the forward operator can lead to a simpler generator architecture, which significantly simplifies neural network training. The new cycleGAN formulation, which we call OT-cycleGAN, has been applied to various biomedical imaging problems, such as accelerated magnetic resonance imaging (MRI), super-resolution microscopy, and low-dose x-ray computed tomography (CT). Experimental results confirm the efficacy and flexibility of the theory.
Submitted 30 August, 2020; v1 submitted 25 September, 2019;
originally announced September 2019.
-
FATS: Feature Analysis for Time Series
Authors:
Isadora Nun,
Pavlos Protopapas,
Brandon Sim,
Ming Zhu,
Rahul Dave,
Nicolas Castro,
Karim Pichara
Abstract:
In this paper, we present the FATS (Feature Analysis for Time Series) library. FATS is a Python library which facilitates and standardizes feature extraction for time series data. In particular, we focus on one application: feature extraction for astronomical light curve data, although the library is generalizable for other uses. We detail the methods and features implemented for light curve analysis, and present examples for its usage.
Submitted 31 August, 2015; v1 submitted 29 May, 2015;
originally announced June 2015.