Search | arXiv e-print repository

arXiv:2406.19328 [pdf, other]

Subtractive Training for Music Stem Insertion using Latent Diffusion Models

Authors: Ivan Villa-Renteria, Mason L. Wang, Zachary Shah, Zhe Li, Soohyun Kim, Neelesh Ramachandran, Mert Pilanci

Abstract: We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusion model to generate the missing instrument stem, guided by both the existing stems and the text instruction. Our results demonstrate Subtractive Training's efficacy in creating authentic drum stems that seamlessly blend with the existing tracks. We also show that we can use the text instruction to control the generation of the inserted stem in terms of rhythm, dynamics, and genre, allowing us to modify the style of a single instrument in a full song while keeping the remaining instruments the same. Lastly, we extend this technique to MIDI formats, successfully generating compatible bass, drum, and guitar parts for incomplete arrangements.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.10254 [pdf, other]

Towards Signal Processing In Large Language Models

Authors: Prateek Verma, Mert Pilanci

Abstract: This paper introduces the idea of applying signal processing inside a Large Language Model (LLM). With the recent explosion of generative AI, our work can help bridge two fields together, namely the field of signal processing and large language models. We draw parallels between classical Fourier-Transforms and Fourier Transform-like learnable time-frequency representations for every intermediate activation signal of an LLM. Once we decompose every activation signal across tokens into a time-frequency representation, we learn how to filter and reconstruct them, with all components learned from scratch, to predict the next token given the previous context. We show that for GPT-like architectures, our work achieves faster convergence and significantly increases performance by adding a minuscule number of extra parameters when trained for the same epochs. We hope this work paves the way for algorithms exploring signal processing inside the signals found in neural architectures like LLMs and beyond.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 12 pages, 3 figures

arXiv:2406.08904 [pdf, other]

AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers

Authors: Emil Biju, Anirudh Sriram, Mert Pilanci

Abstract: While large transformer-based models have exhibited remarkable performance in speaker-independent speech recognition, their large size and computational requirements make them expensive or impractical to use in resource-constrained settings. In this work, we propose a low-rank adaptive compression technique called AdaPTwin that jointly compresses product-dependent pairs of weight matrices in the transformer attention layer. Our approach can prioritize the compressed model's performance on a specific speaker while maintaining generalizability to new speakers and acoustic conditions. Notably, our technique requires only 8 hours of speech data for fine-tuning, which can be accomplished in under 20 minutes, making it highly cost-effective compared to other compression methods. We demonstrate the efficacy of our approach by compressing the Whisper and Distil-Whisper models by up to 45% while incurring less than a 2% increase in word error rate.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 12 pages, 3 figures, submitted to NeurIPS 2024

arXiv:2305.06482 [pdf, ps, other]

Coil Sketching for computationally-efficient MR iterative reconstruction

Authors: Julio A. Oscanoa, Frank Ong, Siddharth S. Iyer, Zhitao Li, Christopher M. Sandino, Batu Ozturkler, Daniel B. Ennis, Mert Pilanci, Shreyas S. Vasanawala

Abstract: Purpose: Parallel imaging and compressed sensing reconstructions of large MRI datasets often have a prohibitive computational cost that bottlenecks clinical deployment, especially for 3D non-Cartesian acquisitions. One common approach is to reduce the number of coil channels actively used during reconstruction as in coil compression. While effective for Cartesian imaging, coil compression inherently loses signal energy, producing shading artifacts that compromise image quality for 3D non-Cartesian imaging. We propose coil sketching, a general and versatile method for computationally-efficient iterative MR image reconstruction. Theory and Methods: We based our method on randomized sketching algorithms, a type of large-scale optimization algorithms well established in the fields of machine learning and big data analysis. We adapt the sketching theory to the MRI reconstruction problem via a structured sketching matrix that, similar to coil compression, considers high-energy virtual coils obtained from principal component analysis. But, unlike coil compression, it also considers random linear combinations of the remaining low-energy coils, effectively leveraging information from all coils. Results: First, we performed ablation experiments to validate the sketching matrix design on both Cartesian and non-Cartesian datasets. The resulting design yielded both improved computational efficiency and preserved signal-to-noise ratio (SNR) as measured by the inverse g-factor. Then, we verified the efficacy of our approach on high-dimensional non-Cartesian 3D cones datasets, where coil sketching yielded up to three-fold faster reconstructions with equivalent image quality. Conclusion: Coil sketching is a general and versatile reconstruction framework for computationally fast and memory-efficient reconstruction.

Submitted 11 October, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: 19 pages, 7 figures, 3 tables

arXiv:2302.13527 [pdf]

Complex Clipping for Improved Generalization in Machine Learning

Authors: Les Atlas, Nicholas Rasmussen, Felix Schwock, Mert Pilanci

Abstract: For many machine learning applications, a common input representation is a spectrogram. The underlying representation for a spectrogram is a short time Fourier transform (STFT) which gives complex values. The spectrogram uses the magnitude of these complex values, a commonly used detector. Modern machine learning systems are commonly overparameterized, where possible ill-conditioning problems are ameliorated by regularization. The common use of rectified linear unit (ReLU) activation functions between layers of a deep net has been shown to help this regularization, improving system performance. We extend this idea of ReLU activation to detection for the complex STFT, providing a simple-to-compute modified and regularized spectrogram, which potentially results in better behaved training. We then confirmed the benefit of this approach on a noisy acoustic data set used for a real-world application. Generalization performance improved substantially. This approach might benefit other applications which use time-frequency mappings, for acoustic, audio, and other applications.

Submitted 27 February, 2023; originally announced February 2023.

Comments: Submitted to IEEE Signal Processing Letters

arXiv:2207.08393 [pdf, other]

GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction

Authors: Batu Ozturkler, Arda Sahiner, Tolga Ergen, Arjun D Desai, Christopher M Sandino, Shreyas Vasanawala, John M Pauly, Morteza Mardani, Mert Pilanci

Abstract: Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction. These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization. However, they require several iterations of a large neural network to handle high-dimensional imaging tasks such as 3D MRI. This limits traditional training algorithms based on backpropagation due to prohibitively large memory and compute requirements for calculating gradients and storing intermediate activations. To address this challenge, we propose Greedy LEarning for Accelerated MRI (GLEAM) reconstruction, an efficient training strategy for high-dimensional imaging settings. GLEAM splits the end-to-end network into decoupled network modules. Each module is optimized in a greedy manner with decoupled gradient updates, reducing the memory footprint during training. We show that the decoupled gradient updates can be performed in parallel on multiple graphical processing units (GPUs) to further reduce training time. We present experiments with 2D and 3D datasets including multi-coil knee, brain, and dynamic cardiac cine MRI. We observe that: i) GLEAM generalizes as well as state-of-the-art memory-efficient baselines such as gradient checkpointing and invertible networks with the same memory footprint, but with 1.3x faster training; ii) for the same memory footprint, GLEAM yields 1.1 dB PSNR gain in 2D and 1.8 dB in 3D over end-to-end baselines.

Submitted 18 July, 2022; originally announced July 2022.

arXiv:2204.10436 [pdf, other]

Scale-Equivariant Unrolled Neural Networks for Data-Efficient Accelerated MRI Reconstruction

Authors: Beliz Gunel, Arda Sahiner, Arjun D. Desai, Akshay S. Chaudhari, Shreyas Vasanawala, Mert Pilanci, John Pauly

Abstract: Unrolled neural networks have enabled state-of-the-art reconstruction performance and fast inference times for the accelerated magnetic resonance imaging (MRI) reconstruction task. However, these approaches depend on fully-sampled scans as ground truth data which is either costly or not possible to acquire in many clinical medical imaging applications; hence, reducing dependence on data is desirable. In this work, we propose modeling the proximal operators of unrolled neural networks with scale-equivariant convolutional neural networks in order to improve the data-efficiency and robustness to drifts in scale of the images that might stem from the variability of patient anatomies or change in field-of-view across different MRI scanners. Our approach demonstrates strong improvements over the state-of-the-art unrolled neural networks under the same memory constraints both with and without data augmentations on both in-distribution and out-of-distribution scaled images without significantly increasing the train or inference time.

Submitted 21 April, 2022; originally announced April 2022.

arXiv:2202.11277 [pdf, other]

Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms

Authors: Rajarshi Saha, Mert Pilanci, Andrea J. Goldsmith

Abstract: High-dimensional models often have a large memory footprint and must be quantized after training before being deployed on resource-constrained edge devices for inference tasks. In this work, we develop an information-theoretic framework for the problem of quantizing a linear regressor learned from training data $(\mathbf{X}, \mathbf{y})$, for some underlying statistical relationship $\mathbf{y} = \mathbf{X}\boldsymbol{\theta} + \mathbf{v}$. The learned model, which is an estimate of the latent parameter $\boldsymbol{\theta} \in \mathbb{R}^d$, is constrained to be representable using only $Bd$ bits, where $B \in (0, \infty)$ is a pre-specified budget and $d$ is the dimension. We derive an information-theoretic lower bound for the minimax risk under this setting and propose a matching upper bound using randomized embedding-based algorithms which is tight up to constant factors. The lower and upper bounds together characterize the minimum threshold bit-budget required to achieve a performance risk comparable to the unquantized setting. We also propose randomized Hadamard embeddings that are computationally efficient and are optimal up to a mild logarithmic factor of the lower bound. Our model quantization strategy can be generalized and we show its efficacy by extending the method and upper-bounds to two-layer ReLU neural networks for non-linear regression. Numerical simulations show the improved performance of our proposed scheme as well as its closeness to the lower bound.

Submitted 30 August, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

Comments: 50 pages, 31 figures, 9 tables

arXiv:2201.08522 [pdf, other]

Orthonormal Sketches for Secure Coded Regression

Authors: Neophytos Charalambides, Hessam Mahdavifar, Mert Pilanci, Alfred O. Hero III

Abstract: In this work, we propose a method for speeding up linear regression distributively, while ensuring security. We leverage randomized sketching techniques, and improve straggler resilience in asynchronous systems. Specifically, we apply a random orthonormal matrix and then subsample in blocks, to simultaneously secure the information and reduce the dimension of the regression problem. In our setup, the transformation corresponds to an encoded encryption in an approximate gradient coding scheme, and the subsampling corresponds to the responses of the non-straggling workers, in a centralized coded computing network. We focus on the special case of the Subsampled Randomized Hadamard Transform, which we generalize to block sampling, and discuss how it can be used to secure the data. We illustrate the performance through numerical experiments.

Submitted 22 February, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: 3 figures, 5 pages excluding appendices

MSC Class: 65F10; 65F45; 68W15; 68W20; 68W25; 68P27; 68P30; ACM Class: E.3; E.4; G.1.2; G.1.3

arXiv:2201.01669 [pdf, other]

Using Deep Learning with Large Aggregated Datasets for COVID-19 Classification from Cough

Authors: Esin Darici Haritaoglu, Nicholas Rasmussen, Daniel C. H. Tan, Jennifer Ranjani J., Jaclyn Xiao, Gunvant Chaudhari, Akanksha Rajput, Praveen Govindan, Christian Canham, Wei Chen, Minami Yamaura, Laura Gomezjurado, Aaron Broukhim, Amil Khanzada, Mert Pilanci

Abstract: The Covid-19 pandemic has been one of the most devastating events in recent history, claiming the lives of more than 5 million people worldwide. Even with the worldwide distribution of vaccines, there is an apparent need for affordable, reliable, and accessible screening techniques to serve parts of the world that do not have access to Western medicine. Artificial Intelligence can provide a solution utilizing cough sounds as a primary screening mode for COVID-19 diagnosis. This paper presents multiple models that have achieved relatively respectable performance on the largest evaluation dataset currently presented in academic literature. Through investigation of a self-supervised learning model (Area under the ROC curve, AUC = 0.807) and a convolutional neural network (CNN) model (AUC = 0.802), we observe the possibility of model bias with limited datasets. Moreover, we observe that performance increases with training data size, showing the need for the worldwide collection of data to help combat the Covid-19 pandemic with non-traditional means.

Submitted 29 March, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

arXiv:2107.05680 [pdf, other]

Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions

Authors: Arda Sahiner, Tolga Ergen, Batu Ozturkler, Burak Bartan, John Pauly, Morteza Mardani, Mert Pilanci

Abstract: Generative Adversarial Networks (GANs) are commonly used for modeling complex distributions of data. Both the generators and discriminators of GANs are often modeled by neural networks, posing a non-transparent optimization problem which is non-convex and non-concave over the generator and discriminator, respectively. Such networks are often heuristically optimized with gradient descent-ascent (GDA), but it is unclear whether the optimization problem contains any saddle points, or whether heuristic methods can find them in practice. In this work, we analyze the training of Wasserstein GANs with two-layer neural network discriminators through the lens of convex duality, and for a variety of generators expose the conditions under which Wasserstein GANs can be solved exactly with convex optimization approaches, or can be represented as convex-concave games. Using this convex duality interpretation, we further demonstrate the impact of different activation functions of the discriminator. Our observations are verified with numerical results demonstrating the power of the convex interpretation, with applications in progressive training of convex architectures corresponding to linear generators and quadratic-activation discriminators for CelebA image generation. The code for our experiments is available at https://github.com/ardasahiner/ProCoGAN.

Submitted 21 March, 2022; v1 submitted 12 July, 2021; originally announced July 2021.

Comments: Published as paper in ICLR 2022. First two authors contributed equally to this work; 34 pages, 11 figures

arXiv:2012.05169 [pdf, other]

Convex Regularization Behind Neural Reconstruction

Authors: Arda Sahiner, Morteza Mardani, Batu Ozturkler, Mert Pilanci, John Pauly

Abstract: Neural networks have shown tremendous potential for reconstructing high-resolution images in inverse problems. The non-convex and opaque nature of neural networks, however, hinders their utility in sensitive applications such as medical imaging. To cope with this challenge, this paper advocates a convex duality framework that makes a two-layer fully-convolutional ReLU denoising network amenable to convex optimization. The convex dual network not only offers the optimum training with convex solvers, but also facilitates interpreting training and prediction. In particular, it implies training neural networks with weight decay regularization induces path sparsity while the prediction is piecewise linear filtering. A range of experiments with MNIST and fastMRI datasets confirm the efficacy of the dual network optimization problem.

Submitted 9 December, 2020; originally announced December 2020.

arXiv:2010.13836 [pdf, other]

doi 10.1109/EMBC46164.2021.9630217

Linear Predictive Coding for Acute Stress Prediction from Computer Mouse Movements

Authors: Lawrence H. Kim, Rahul Goel, Jia Liang, Mert Pilanci, Pablo E. Paredes

Abstract: Prior work demonstrated the potential of using the Linear Predictive Coding (LPC) filter to approximate muscle stiffness and damping from computer mouse movements to predict acute stress levels of users. Theoretically, muscle stiffness and damping in the arm can be estimated using a mass-spring-damper (MSD) biomechanical model. However, the damping frequency (i.e., stiffness) and damping ratio values derived using LPC were not yet compared with those from a theoretical MSD model. This work demonstrates that the damping frequency and damping ratio from LPC are significantly correlated with those from an MSD model, thus confirming the validity of using LPC to infer muscle stiffness and damping. We also compare the stress level binary classification performance using the values from LPC and MSD with each other and with neural network-based baselines. We found comparable performance across all conditions demonstrating LPC and MSD model-based stress prediction efficacy, especially for longer mouse trajectories. Clinical relevance: This work demonstrates the validity of the LPC filter to approximate muscle stiffness and damping and predict acute stress from computer mouse movements.

Submitted 15 December, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

Comments: The first three authors contributed equally. 5 pages, 6 figures, 2 tables, published at EMBC'21

Journal ref: 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

arXiv:2002.10674 [pdf, other]

Separating the Effects of Batch Normalization on CNN Training Speed and Stability Using Classical Adaptive Filter Theory

Authors: Elaina Chai, Mert Pilanci, Boris Murmann

Abstract: Batch Normalization (BatchNorm) is commonly used in Convolutional Neural Networks (CNNs) to improve training speed and stability. However, there is still limited consensus on why this technique is effective. This paper uses concepts from the traditional adaptive filter domain to provide insight into the dynamics and inner workings of BatchNorm. First, we show that the convolution weight updates have natural modes whose stability and convergence speed are tied to the eigenvalues of the input autocorrelation matrices, which are controlled by BatchNorm through the convolution layers' channel-wise structure. Furthermore, our experiments demonstrate that the speed and stability benefits are distinct effects. At low learning rates, it is BatchNorm's amplification of the smallest eigenvalues that improves convergence speed, while at high learning rates, it is BatchNorm's suppression of the largest eigenvalues that ensures stability. Lastly, we prove that in the first training step, when normalization is needed most, BatchNorm satisfies the same optimization as Normalized Least Mean Square (NLMS), while it continues to approximate this condition in subsequent steps. The analyses provided in this paper lay the groundwork for gaining further insight into the operation of modern neural network structures using adaptive filter theory.

Submitted 1 June, 2021; v1 submitted 25 February, 2020; originally announced February 2020.

Comments: Presented at Asilomar Conference on Signals, Systems, and Computers, 2020

Showing 1–14 of 14 results for author: Pilanci, M