
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

Authors: Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu

Abstract: The computational challenges of Large Language Model (LLM) inference remain a significant barrier to their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention computation, it takes 30 minutes for an 8B LLM to process a prompt of 1M tokens (i.e., the pre-filling stage) on a single A100 GPU. Existing methods for speeding up pre-filling often fail to maintain acceptable accuracy or efficiency when applied to long-context LLMs. To address this gap, we introduce MInference (Milliontokens Inference), a sparse calculation method designed to accelerate pre-filling of long-sequence processing. Specifically, we identify three unique patterns in long-context attention matrices (the A-shape, Vertical-Slash, and Block-Sparse) that can be leveraged for efficient sparse computation on GPUs. We determine the optimal pattern for each attention head offline and dynamically build sparse indices based on the assigned pattern during inference. With the pattern and sparse indices, we perform efficient sparse attention calculations via our optimized GPU kernels to significantly reduce the latency in the pre-filling stage of long-context LLMs. Our proposed technique can be directly applied to existing LLMs without any modifications to the pre-training setup or additional fine-tuning. By evaluating on a wide range of downstream tasks, including InfiniteBench, RULER, PG-19, and Needle In A Haystack, and models including LLaMA-3-1M, GLM4-1M, Yi-200K, Phi-3-128K, and Qwen2-128K, we demonstrate that MInference effectively reduces inference latency by up to 10x for pre-filling on an A100, while maintaining accuracy. Our code is available at https://aka.ms/MInference.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2404.02933 [pdf, other]

NL2KQL: From Natural Language to Kusto Query

Authors: Amir H. Abdi, Xinye Tang, Jeremias Eichelbaum, Mahan Das, Alex Klein, Nihal Irmak Pakis, William Blum, Daniel L Mace, Tanvi Raja, Namrata Padmanabhan, Ye Xing

Abstract: Data is growing rapidly in volume and complexity. Proficiency in database query languages is pivotal for crafting effective queries. As coding assistants become more prevalent, there is significant opportunity to enhance database query languages. The Kusto Query Language (KQL) is a widely used query language for large semi-structured data such as logs, telemetries, and time-series for big data analytics platforms. This paper introduces NL2KQL, an innovative framework that uses large language models (LLMs) to convert natural language queries (NLQs) to KQL queries. The proposed NL2KQL framework includes several key components: the Schema Refiner, which narrows down the schema to its most pertinent elements; the Few-shot Selector, which dynamically selects relevant examples from a few-shot dataset; and the Query Refiner, which repairs syntactic and semantic errors in KQL queries. Additionally, this study outlines a method for generating large datasets of synthetic NLQ-KQL pairs that are valid within a specific database context. To validate NL2KQL's performance, we utilize an array of online (based on query execution) and offline (based on query parsing) metrics. Through ablation studies, the significance of each framework component is examined, and the datasets used for benchmarking are made publicly available. This work is the first of its kind and is compared with available baselines to demonstrate its effectiveness.

Submitted 15 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2212.03178 [pdf, ps, other]

doi 10.1016/j.compbiolchem.2023.107882

Longest Common Substring in Longest Common Subsequence's Solution Service: A Novel Hyper-Heuristic

Authors: Alireza Abdi, Masih Hajsaeedi, Mohsen Hooshmand

Abstract: The Longest Common Subsequence (LCS) is the problem of finding a subsequence among a set of strings that is common to all of them and is the longest. The LCS has applications in computational biology and text editing, among many others. Due to the NP-hardness of the general longest common subsequence, numerous heuristic algorithms and solvers have been proposed to give the best possible solution for different sets of strings. None of them has the best performance for all types of sets. In addition, there is no method to specify the type of a given set of strings. Besides that, the available hyper-heuristic is not efficient and fast enough to solve this problem in real-world applications. This paper proposes a novel hyper-heuristic to solve the longest common subsequence problem using a novel criterion to classify a set of strings based on their similarity. To do this, we offer a general stochastic framework to identify the type of a given set of strings. Following that, we introduce the set similarity dichotomizer ($S^2D$) algorithm based on the framework that divides the type of sets into two. This algorithm is introduced for the first time in this paper and opens a new way to go beyond the current LCS solvers. Then, we present a novel hyper-heuristic that exploits the $S^2D$ and one of the internal properties of the set to choose the best matching heuristic among a set of heuristics. We compare the results on benchmark datasets with the best heuristics and hyper-heuristics. The results show a higher performance of our proposed hyper-heuristic in both quality of solutions and run time factors.

Submitted 3 December, 2022; originally announced December 2022.
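
To make the problem in the entry above concrete, here is a minimal sketch of the textbook dynamic program for the LCS length of just two strings. It is not the hyper-heuristic or the $S^2D$ algorithm from the paper, which target the NP-hard case of many strings; the function name `lcs_length` is a hypothetical choice for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings.

    Classic O(len(a) * len(b)) dynamic program that keeps only one
    previous row of the table. Illustration of the problem only, not
    the method proposed in the paper.
    """
    prev = [0] * (len(b) + 1)  # row for the empty prefix of `a`
    for ch_a in a:
        curr = [0]  # column 0: empty prefix of `b`
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence ending before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop one character from either string, keep the better.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(lcs_length("ABCBDAB", "BDCABA"))
```

For more than two strings the table grows exponentially with the number of strings, which is why heuristic and hyper-heuristic solvers such as the one described above are used in practice.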

arXiv:2206.11726 [pdf, other]

Longest Common Subsequence: Tabular vs. Closed-Form Equation Computation of Subsequence Probability

Authors: Alireza Abdi, Mohsen Hooshmand

Abstract: The Longest Common Subsequence Problem (LCS) deals with finding the longest subsequence among a given set of strings. The LCS problem is NP-hard, which makes it a target for many efforts to find better solutions with heuristic methods. The baseline for the most famous heuristic functions is a tabular, random, probabilistic approach. This approach approximates the length of the LCS in each iteration. The combination of beam search and tabular probabilistic-based heuristics has led to a large number of proposals and achievements in algorithms for solving the LCS problem. In this work, we introduce a closed-form equation of the probabilistic table calculation for the first time. Moreover, we present other corresponding forms of the closed-form equation and prove all of them. The closed-form equation opens new ways for analysis and further approximations. Using the theorems and beam search, we propose an analytic method for estimating the length of the LCS of the remaining subsequence. Furthermore, we present another heuristic function based on the Coefficient of Variation. The results show that our proposed methods outperform the state-of-the-art methods on the LCS problem.

Submitted 23 June, 2022; originally announced June 2022.

arXiv:2206.09034 [pdf, other]

Towards Better Selective Classification

Authors: Leo Feng, Mohamed Osama Ahmed, Hossein Hajimirsadeghi, Amir Abdi

Abstract: We tackle the problem of Selective Classification where the objective is to achieve the best performance on a predetermined ratio (coverage) of the dataset. Recent state-of-the-art selective methods come with architectural changes either via introducing a separate selection head or an extra abstention logit. In this paper, we challenge the aforementioned methods. The results suggest that the superior performance of state-of-the-art methods is owed to training a more generalizable classifier rather than their proposed selection mechanisms. We argue that the best performing selection mechanism should instead be rooted in the classifier itself. Our proposed selection strategy uses the classification scores and achieves better results by a significant margin, consistently, across all coverages and all datasets, without any added compute cost. Furthermore, inspired by semi-supervised learning, we propose an entropy-based regularizer that improves the performance of selective classification methods. Our proposed selection mechanism with the proposed entropy-based regularizer achieves new state-of-the-art results.

Submitted 1 March, 2023; v1 submitted 17 June, 2022; originally announced June 2022.
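
The abstract above says the selection strategy "uses the classification scores" but does not spell out the rule. As a rough sketch under the assumption that confidence is the maximum class score, the following keeps the most confident fraction of samples (the coverage) and reports accuracy on that subset. The function name `selective_accuracy` and the toy scores are hypothetical.

```python
def selective_accuracy(scores, labels, coverage):
    """Accuracy on the `coverage` fraction of samples with highest confidence.

    scores:   list of per-class score lists (e.g. softmax outputs)
    labels:   list of integer class labels
    coverage: fraction in (0, 1] of samples the classifier must answer

    Assumption: confidence = max class score. The paper's exact rule
    may differ; this only illustrates selection by classifier scores.
    """
    n_keep = max(1, round(coverage * len(scores)))
    # Rank sample indices from most to least confident.
    ranked = sorted(range(len(scores)), key=lambda i: max(scores[i]), reverse=True)
    kept = ranked[:n_keep]
    # Prediction for a sample is the index of its largest score.
    correct = sum(1 for i in kept if scores[i].index(max(scores[i])) == labels[i])
    return correct / n_keep


if __name__ == "__main__":
    toy_scores = [[0.9, 0.1], [0.6, 0.4], [0.55, 0.45], [0.2, 0.8]]
    toy_labels = [0, 1, 0, 1]
    for cov in (0.5, 0.75, 1.0):
        print(cov, selective_accuracy(toy_scores, toy_labels, cov))
```

No extra selection head or abstention logit is involved here, which matches the abstract's point that the selection signal can come from the classifier itself at no added compute cost.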

arXiv:2206.04038 [pdf, other]

Scaleformer: Iterative Multi-scale Refining Transformers for Time Series Forecasting

Authors: Amin Shabani, Amir Abdi, Lili Meng, Tristan Sylvain

Abstract: The performance of time series forecasting has recently been greatly improved by the introduction of transformers. In this paper, we propose a general multi-scale framework that can be applied to the state-of-the-art transformer-based time series forecasting models (FEDformer, Autoformer, etc.). By iteratively refining a forecasted time series at multiple scales with shared weights, introducing architecture adaptations, and a specially-designed normalization scheme, we are able to achieve significant performance improvements, from 5.5% to 38.5% across datasets and transformer architectures, with minimal additional computational overhead. Via detailed ablation studies, we demonstrate the effectiveness of each of our contributions across the architecture and methodology. Furthermore, our experiments on various public datasets demonstrate that the proposed improvements outperform their corresponding baseline counterparts. Our code is publicly available at https://github.com/BorealisAI/scaleformer.

Submitted 6 February, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

Comments: ICLR 2023

arXiv:2205.03454 [pdf, ps, other]

Structure Learning in Graphical Models from Indirect Observations

Authors: Hang Zhang, Afshin Abdi, Faramarz Fekri

Abstract: This paper considers learning of the graphical structure of a $p$-dimensional random vector $X \in R^p$ using both parametric and non-parametric methods. Unlike the previous works which observe $x$ directly, we consider the indirect observation scenario in which samples $y$ are collected via a sensing matrix $A \in R^{d\times p}$, and corrupted with some additive noise $w$, i.e., $Y = AX + W$. For the parametric method, we assume $X$ to be Gaussian, i.e., $x\in R^p\sim N(\mu, \Sigma)$ and $\Sigma\in R^{p\times p}$. For the first time, we show that the correct graphical structure can be recovered under the indefinite sensing system ($d < p$) using insufficient samples ($n < p$). In particular, we show that for the exact recovery, we require dimension $d = \Omega(p^{0.8})$ and sample number $n = \Omega(p^{0.8}\log^3 p)$. For the nonparametric method, we assume a nonparanormal distribution for $X$ rather than Gaussian. Under mild conditions, we show that our graph-structure estimator can obtain the correct structure. We derive the minimum sample number $n$ and dimension $d$ as $n\gtrsim (deg)^4 \log^4 n$ and $d \gtrsim p + (deg\cdot\log(d-p))^{\beta/4}$, respectively, where deg is the maximum Markov blanket in the graphical model and $\beta > 0$ is some fixed positive constant. Additionally, we obtain a non-asymptotic uniform bound on the estimation error of the CDF of $X$ from indirect observations with inexact knowledge of the noise distribution. To the best of our knowledge, this bound is derived for the first time and may be of independent interest. Numerical experiments on both real-world and synthetic data are provided to confirm the theoretical results.

Submitted 6 May, 2022; originally announced May 2022.

arXiv:2204.01262 [pdf, ps, other]

FT-EALU: Fault Tolerant Arithmetic and Logic Unit for Critical Embedded and Real time Systems

Authors: Athena Abdi, Sina Shahoveisi

Abstract: In this paper, a fault-tolerant approach to mitigate transient and permanent faults of arithmetic and logic operations of embedded processors, called FT-EALU, is proposed. In this method, each operation is replicated in time and the derived results are voted on to generate the final output. To account for the effect of permanent faults, replicating identical operations in time is not sufficient, and diversifying the operands is required. To this end, in FT-EALU we consider three distinct versions of input data and apply the target operation to them serially in time. To avoid high time overhead, we employ simple operators such as shift and swap to create appropriate diversity in the input data. Our proposed fault tolerance approach passes the replicated and diverse results to a novel weighted voter that is designed based on a reward/punishment strategy. For each version of execution, the proposed weighting approach defines a corresponding weight according to its correction capability under several faulty scenarios. This weight defines the reliability of the result of each version of execution and determines its effect on the final result. The final result is generated bit by bit based on the weight of each execution and its computed result. These weights are determined statically through a design-time learning scheme by applying several types of faults to various data bits. Based on the capability of execution versions to mitigate the permanent faults, positive or negative scores are assigned to them. These scores are integrated over several cases and normalized to derive the appropriate weight of each execution at bit level. Several experiments are performed to show the efficiency of our proposed approach; based on them, FT-EALU is capable of correcting about 84.93% and 69.71% of permanent injected faults on single and double bits of input data, respectively.

Submitted 4 April, 2022; originally announced April 2022.
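
The bit-by-bit weighted voting described in the entry above can be pictured with a small sketch. Everything concrete here is an assumption for illustration: the function name `weighted_bitwise_vote`, the weight values, the 4-bit width, and the tie-break toward 0. The paper derives its weights from a design-time learning scheme that is not reproduced here.

```python
def weighted_bitwise_vote(results, weights, width):
    """Combine several execution results into one value, bit by bit.

    results: list of ints, one per execution version
    weights: weights[v][b] is the (hypothetical) trust in bit b of version v
    width:   number of bits to vote on

    For each bit, the versions voting 1 and the versions voting 0 pool
    their weights; the heavier side wins. Ties resolve to 0 (assumption).
    """
    out = 0
    for bit in range(width):
        score_one = 0.0
        score_zero = 0.0
        for version, value in enumerate(results):
            if (value >> bit) & 1:
                score_one += weights[version][bit]
            else:
                score_zero += weights[version][bit]
        if score_one > score_zero:
            out |= 1 << bit
    return out


if __name__ == "__main__":
    results = [0b1010, 0b1010, 0b1110]          # third version disagrees on bit 2
    uniform = [[1.0] * 4 for _ in results]      # plain majority voting
    print(bin(weighted_bitwise_vote(results, uniform, 4)))
    trusted = [[1.0] * 4, [1.0] * 4, [1.0, 1.0, 3.0, 1.0]]  # trust version 3 on bit 2
    print(bin(weighted_bitwise_vote(results, trusted, 4)))
```

With uniform weights this reduces to ordinary majority voting; per-bit weights let a version that is known to be reliable at a given bit position override the other two.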

arXiv:2203.14717 [pdf, other]

doi 10.1109/TSUSC.2023.3244081

A novel evolutionary-based neuro-fuzzy task scheduling approach to jointly optimize the main design challenges of heterogeneous MPSoCs

Authors: Athena Abdi, Armin Salimi-Badr

Abstract: In this paper, an online task scheduling and mapping method based on a fuzzy neural network (FNN) learned by an evolutionary multi-objective algorithm (NSGA-II) to jointly optimize the main design challenges of heterogeneous MPSoCs is proposed. In this approach, first, the FNN parameters are trained using an NSGA-II-based optimization engine by considering the main design challenges of MPSoCs including temperature, power consumption, failure rate, and execution time on a training dataset consisting of different application graphs of various sizes. Next, the trained FNN is employed as an online task scheduler to jointly optimize the main design challenges in heterogeneous MPSoCs. Due to the uncertainty in sensor measurements and the difference between computational models and reality, applying the fuzzy neural network is advantageous in online scheduling procedures. The performance of the method is compared with some previous heuristic, meta-heuristic, and rule-based approaches in several experiments. Based on these experiments, our proposed method outperforms the related studies in optimizing all design criteria. Its average improvement over related heuristic and meta-heuristic approaches is estimated at 10.58% in temperature, 9.22% in power consumption, 39.14% in failure rate, and 12.06% in execution time. Moreover, considering the interpretable nature of the FNN, the frequently fired extracted fuzzy rules of the proposed approach are demonstrated.

Submitted 14 March, 2022; originally announced March 2022.

Comments: in IEEE Transactions on Sustainable Computing

arXiv:2201.09483 [pdf, other]

A Machine Learning Framework for Distributed Functional Compression over Wireless Channels in IoT

Authors: Yashas Malur Saidutta, Afshin Abdi, Faramarz Fekri

Abstract: IoT devices generating enormous data and state-of-the-art machine learning techniques together will revolutionize cyber-physical systems. In many diverse fields, from autonomous driving to augmented reality, distributed IoT devices compute specific target functions without simple forms like obstacle detection, object recognition, etc. Traditional cloud-based methods that focus on transferring data to a central location either for training or inference place enormous strain on network resources. To address this, we develop, to the best of our knowledge, the first machine learning framework for distributed functional compression over both the Gaussian Multiple Access Channel (GMAC) and orthogonal AWGN channels. Due to the Kolmogorov-Arnold representation theorem, our machine learning framework can, by design, compute any arbitrary function for the desired functional compression task in IoT. Importantly, the raw sensory data are never transferred to a central node for training or inference, thus reducing communication. For these algorithms, we provide theoretical convergence guarantees and upper bounds on communication. Our simulations show that the learned encoders and decoders for functional compression perform significantly better than traditional approaches and are robust to channel condition changes and sensor outages. Compared to the cloud-based scenario, our algorithms reduce channel use by two orders of magnitude.

Submitted 30 April, 2023; v1 submitted 24 January, 2022; originally announced January 2022.

arXiv:2106.10656 [pdf, other]

TD-GEN: Graph Generation With Tree Decomposition

Authors: Hamed Shirzad, Hossein Hajimirsadeghi, Amir H. Abdi, Greg Mori

Abstract: We propose TD-GEN, a graph generation framework based on tree decomposition, and introduce a reduced upper bound on the maximum number of decisions needed for graph generation. The framework includes a permutation invariant tree generation model which forms the backbone of graph generation. Tree nodes are supernodes, each representing a cluster of nodes in the graph. Graph nodes and edges are incrementally generated inside the clusters by traversing the tree supernodes, respecting the structure of the tree decomposition, and following node sharing decisions between the clusters. Finally, we discuss the shortcomings of standard evaluation criteria based on statistical properties of the generated graphs as performance measures. We propose to compare the performance of models based on likelihood. Empirical results on a variety of standard graph generation datasets demonstrate the superior performance of our method.

Submitted 23 February, 2022; v1 submitted 20 June, 2021; originally announced June 2021.

arXiv:2011.10529 [pdf]

doi 10.1088/1478-3975/ab4345

Computation capacities of a broad class of signaling networks are higher than their communication capacities

Authors: Iman Habibi, Effat S Emamian, Osvaldo Simeone, Ali Abdi

Abstract: Due to structural and functional abnormalities or genetic variations and mutations, there may be dysfunctional molecules within an intracellular signaling network that do not allow the network to correctly regulate its output molecules, such as transcription factors. This disruption in signaling interrupts normal cellular functions and may eventually develop some pathological conditions. In this paper, computation capacity of signaling networks is introduced as a fundamental limit on signaling capability and performance of such networks. The computation capacity measures the maximum number of computable inputs, that is, the maximum number of input values for which the correct functional output values can be recovered from the erroneous network outputs, when the network contains some dysfunctional molecules. This contrasts with the conventional communication capacity that measures instead the maximum number of input values that can be correctly distinguished based on the erroneous network outputs. The computation capacity is higher than the communication capacity, if the network response function is not a one-to-one function of the input signals. By explicitly incorporating the effect of signaling errors that result in the network dysfunction, the computation capacity provides more information about the network and its malfunction. Two examples of signaling networks are studied here, one regulating caspase3 and another regulating NFkB, for which computation and communication capacities are analyzed. Higher computation capacities are observed for both networks. One biological implication of this finding is that signaling networks may have more capacity than that specified by the conventional communication capacity metric. The effect of feedback is also studied. In summary, this paper reports findings on a new fundamental feature of the signaling capability of cell signaling networks.

Submitted 20 November, 2020; originally announced November 2020.

Comments: 51 pages, 8 figures

Journal ref: Phys. Biol. 16 064001 (2019)

arXiv:2008.08289 [pdf, other]

Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference

Authors: Afshin Abdi, Saeed Rashidi, Faramarz Fekri, Tushar Krishna

Abstract: Using multiple nodes and parallel computing algorithms has become a principal tool to improve training and execution times of deep neural networks as well as effective collective intelligence in sensor networks. In this paper, we consider the parallel implementation of an already-trained deep model on multiple processing nodes (a.k.a. workers) where the deep model is divided into several parallel sub-models, each of which is executed by a worker. Since latency due to synchronization and data transfer among workers negatively impacts the performance of the parallel implementation, it is desirable to have minimum interdependency among parallel sub-models. To achieve this goal, we propose to rearrange the neurons in the neural network and partition them (without changing the general topology of the neural network), such that the interdependency among sub-models is minimized under the computations and communications constraints of the workers. We propose RePurpose, a layer-wise model restructuring and pruning technique that guarantees the performance of the overall parallelized model. To efficiently apply RePurpose, we propose an approach based on $\ell_0$ optimization and the Munkres assignment algorithm. We show that, compared to the existing methods, RePurpose significantly improves the efficiency of the distributed inference via parallel implementation, both in terms of communication and computational complexity.

Submitted 19 August, 2020; originally announced August 2020.

arXiv:1912.05184 [pdf, other]

Variational Learning with Disentanglement-PyTorch

Authors: Amir H. Abdi, Purang Abolmaesumi, Sidney Fels

Abstract: Unsupervised learning of disentangled representations is an open problem in machine learning. The Disentanglement-PyTorch library is developed to facilitate research, implementation, and testing of new variational algorithms. In this modular library, neural architectures, dimensionality of the latent space, and the training algorithms are fully decoupled, allowing for independent and consistent experiments across variational methods. The library handles the training scheduling, logging, and visualizations of reconstructions and latent space traversals. It also evaluates the encodings based on various disentanglement metrics. The library, so far, includes implementations of the following unsupervised algorithms: VAE, Beta-VAE, Factor-VAE, DIP-I-VAE, DIP-II-VAE, Info-VAE, and Beta-TCVAE, as well as conditional approaches such as CVAE and IFCVAE. The library is compatible with the Disentanglement Challenge of NeurIPS 2019, hosted on AICrowd, and achieved the 3rd rank in both the first and second stages of the challenge.

Submitted 11 December, 2019; originally announced December 2019.

Comments: Disentanglement Challenge - 33rd Conference on Neural Information Processing Systems (NeurIPS) - NeurIPS 2019

arXiv:1912.03120 [pdf, other]

A Study into Echocardiography View Conversion

Authors: Amir H. Abdi, Mohammad H. Jafari, Sidney Fels, Theresa Tsang, Purang Abolmaesumi

Abstract: Transthoracic echo is one of the most common means of cardiac studies in clinical routine. During the echo exam, the sonographer captures a set of standard cross sections (echo views) of the heart. Each 2D echo view cuts through the 3D cardiac geometry via a unique plane. Consequently, different views share some limited information. In this work, we investigate the feasibility of generating a 2D echo view using another view based on adversarial generative models. The objective optimized to train the view-conversion model is based on the ideas introduced by LSGAN, PatchGAN and Conditional GAN (cGAN). The size and length of the left ventricle (LV) in the generated target echo view are compared against those of the target ground-truth to assess the validity of the echo view conversion. Results show that there is a correlation of 0.50 between the LV areas and 0.49 between the LV lengths of the generated target frames and the real target frames.

Submitted 5 December, 2019; originally announced December 2019.

Comments: Workshop of Medical Imaging Meets NeurIPS, NeurIPS 2019

arXiv:1912.00614 [pdf, other]

Idealness of $k$-wise intersecting families

Authors: Ahmad Abdi, Gérard Cornuéjols, Tony Huynh, Dabeen Lee

Abstract: A clutter is $k$-wise intersecting if every $k$ members have a common element, yet no element belongs to all members. We conjecture that, for some integer $k\geq 4$, every $k$-wise intersecting clutter is non-ideal. As evidence for our conjecture, we prove it for $k=4$ for the class of binary clutters. Two key ingredients for our proof are Jaeger's $8$-flow theorem for graphs, and Seymour's characterization of the binary matroids with the sums of circuits property. As further evidence for our conjecture, we also note that it follows from an unpublished conjecture of Seymour from 1975. We also discuss connections to the chromatic number of a clutter, projective geometries over the two-element field, uniform cycle covers in graphs, and quarter-integral packings of value two in ideal clutters.

Submitted 3 October, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

Comments: 20 pages, 2 figures. An extended abstract under the same title appeared in the 21st Conference in Integer Programming and Combinatorial Optimization

MSC Class: 90C10; 90C27; 05C21; 52B40

arXiv:1911.11791 [pdf, other]

A Preliminary Study of Disentanglement With Insights on the Inadequacy of Metrics

Authors: Amir H. Abdi, Purang Abolmaesumi, Sidney Fels

Abstract: Disentangled encoding is an important step towards better representation learning. However, despite numerous efforts, there is still no clear winner that captures the independent features of the data in an unsupervised fashion. In this work we empirically evaluate the performance of six unsupervised disentanglement approaches on the mpi3d toy dataset curated and released for the NeurIPS 2019 Disentanglement Challenge. The methods investigated in this work are Beta-VAE, Factor-VAE, DIP-I-VAE, DIP-II-VAE, Info-VAE, and Beta-TCVAE. The capacities of all models were progressively increased throughout the training and the hyper-parameters were kept intact across experiments. The methods were evaluated based on five disentanglement metrics, namely, DCI, Factor-VAE, IRS, MIG, and SAP-Score. Within the limitations of this study, the Beta-TCVAE approach was found to outperform its alternatives with respect to the normalized sum of metrics. However, a qualitative study of the encoded latents reveals that there is not a consistent correlation between the reported metrics and the disentanglement potential of the model.

Submitted 26 November, 2019; originally announced November 2019.

Comments: Disentanglement Challenge - NeurIPS 2019

arXiv:1911.02121 [pdf, other]

GAN-enhanced Conditional Echocardiogram Generation

Authors: Amir H. Abdi, Teresa Tsang, Purang Abolmaesumi

Abstract: Echocardiography (echo) is a common means of evaluating cardiac conditions. Due to label scarcity, semi-supervised paradigms in automated echo analysis are gaining traction. One of the most sought-after problems in echo is the segmentation of cardiac structures (e.g. chambers). Accordingly, we propose an echocardiogram generation approach using generative adversarial networks with a conditional patch-based discriminator. In this work, we validate the feasibility of GAN-enhanced echo generation with different conditions (segmentation masks), namely, the left ventricle, ventricular myocardium, and atrium. Results show that the proposed adversarial algorithm can generate high-quality echo frames whose cardiac structures match the given segmentation masks. This method is expected to facilitate the training of other machine learning models in a semi-supervised fashion, as suggested in similar research.

Submitted 23 November, 2019; v1 submitted 5 November, 2019; originally announced November 2019.

Comments: Workshop of Medical Imaging Meets NeurIPS, NeurIPS 2019

arXiv:1911.00674 [pdf, other]

On Modelling Label Uncertainty in Deep Neural Networks: Automatic Estimation of Intra-observer Variability in 2D Echocardiography Quality Assessment

Authors: Zhibin Liao, Hany Girgis, Amir Abdi, Hooman Vaseli, Jorden Hetherington, Robert Rohling, Ken Gin, Teresa Tsang, Purang Abolmaesumi

Abstract: Uncertainty of labels in clinical data resulting from intra-observer variability can have a direct impact on the reliability of assessments made by deep neural networks. In this paper, we propose a method for modelling such uncertainty in the context of 2D echocardiography (echo), which is a routine procedure for detecting cardiovascular disease at point-of-care. Echo imaging quality and acquisition time are highly dependent on the operator's experience level. Recent developments have shown the possibility of automating echo image quality quantification by mapping an expert's assessment of quality to the echo image via deep learning techniques. Nevertheless, the observer variability in the expert's assessment can impact the quality quantification accuracy. Here, we aim to model the intra-observer variability in echo quality assessment as an aleatoric uncertainty modelling regression problem with the introduction of a novel method that handles the regression problem with categorical labels. A key feature of our design is that only a single forward pass is sufficient to estimate the level of uncertainty for the network output. Compared to the $0.11 \pm 0.09$ absolute error (on a scale from 0 to 1) achieved by the conventional regression method, the proposed method brings the error down to $0.09 \pm 0.08$, where the improvement is statistically significant and is equivalent to a $5.7\%$ test accuracy improvement. The simplicity of the proposed approach means that it could be generalized to other applications of deep learning in medical imaging, where there is often uncertainty in clinical labels.

Submitted 2 November, 2019; originally announced November 2019.

arXiv:1906.11957 [pdf, other]

doi 10.1007/978-3-030-32254-0_26

Variational Shape Completion for Virtual Planning of Jaw Reconstructive Surgery

Authors: Amir H. Abdi, Mehran Pesteie, Eitan Prisman, Purang Abolmaesumi, Sidney Fels

Abstract: The premorbid geometry of the mandible is of significant relevance in jaw reconstructive surgeries and occasionally unknown to the surgical team. In this paper, an optimization framework is introduced to train deep models for completion (reconstruction) of the missing segments of the bone based on the remaining healthy structure. To leverage the contextual information of the surroundings of the dissected region, the voxel-weighted Dice loss is introduced. To address the non-deterministic nature of the shape completion problem, we leverage a weighted multi-target probabilistic solution which is an extension to the conditional variational autoencoder (CVAE). This approach considers multiple targets as acceptable reconstructions, each weighted according to its conformity with the original shape. We quantify the performance gain of the proposed method against similar algorithms, including CVAE, where we report statistically significant improvements in both deterministic and probabilistic paradigms. The probabilistic model is also evaluated on its ability to generate anatomically relevant variations for the missing bone. As a unique aspect of this work, the model is tested on real surgical cases where the clinical relevancy of its reconstructions and their compliance with the surgeon's virtual plan are demonstrated as necessary steps towards clinical adoption.

Submitted 15 July, 2019; v1 submitted 27 June, 2019; originally announced June 2019.

Comments: Proceedings of Medical Image Computing and Computer Assisted Intervention - MICCAI 2019

arXiv:1905.03567 [pdf, other]

Stochastic Fading Channel Models with Multiple Dominant Specular Components for 5G and Beyond

Authors: Juan M. Romero-Jerez, F. Javier Lopez-Martinez, Juan P. Peña-Martin, Ali Abdi

Abstract: We introduce a comprehensive statistical characterization of the multipath wireless channel built as a superposition of a number of scattered waves with random phases. We consider an arbitrary number $N$ of specular (dominant) components plus other diffusely propagating waves. Our approach covers the cases in which the specular components have constant amplitudes, as well as those in which these components experience random fluctuations. These propagation scenarios are found in different use cases of 5G networks, as well as in the context of large intelligent surface based communications. We show that this class of fading models can be expressed in terms of a continuous mixture of an underlying Rician (or Rician shadowed) fading model, averaged over the phase distributions of the specular waves. It is shown that the fluctuations of the specular components have a detrimental impact on performance, and the best performance is obtained when there is only one specular component.

Submitted 9 May, 2019; originally announced May 2019.

Comments: This work has been submitted to the IEEE for publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:1904.01197 [pdf, other]

Nested Dithered Quantization for Communication Reduction in Distributed Training

Authors: Afshin Abdi, Faramarz Fekri

Abstract: In distributed training, the communication cost due to the transmission of gradients or the parameters of the deep model is a major bottleneck in scaling up the number of processing nodes. To address this issue, we propose \emph{dithered quantization} for the transmission of the stochastic gradients and show that training with \emph{Dithered Quantized Stochastic Gradients (DQSG)} is similar to training with unquantized SGs perturbed by an independent bounded uniform noise, in contrast to other quantization methods where the perturbation depends on the gradients and hence complicates the convergence analysis. We study the convergence of training algorithms using DQSG and the trade-off between the number of quantization levels and the training time. Next, we observe that there is a correlation among the SGs computed by workers that can be utilized to further reduce the communication overhead without any performance loss. Hence, we develop a simple yet effective quantization scheme, nested dithered quantized SG (NDQSG), that can reduce the communication significantly \emph{without requiring the workers to communicate extra information to each other}. We prove that although NDQSG requires significantly fewer bits, it can achieve the same quantization variance bound as DQSG. Our simulation results confirm the effectiveness of training using DQSG and NDQSG in reducing the communication bits or the convergence time compared to the existing methods without sacrificing the accuracy of the trained model.

Submitted 1 April, 2019; originally announced April 2019.

arXiv:1809.06121 [pdf, other]

doi 10.1007/978-3-030-15923-8_11

Muscle Excitation Estimation in Biomechanical Simulation Using NAF Reinforcement Learning

Authors: Amir H. Abdi, Pramit Saha, Praneeth Srungarapu, Sidney Fels

Abstract: Motor control is a set of time-varying muscle excitations which generate desired motions for a biomechanical system. Muscle excitations cannot be directly measured from live subjects. An alternative approach is to estimate muscle activations using inverse motion-driven simulation. In this article, we propose a deep reinforcement learning method to estimate the muscle excitations in simulated biomechanical systems. Here, we introduce a custom-made reward function which incentivizes faster point-to-point tracking of target motion. Moreover, we deploy two new techniques, namely, episode-based hard update and dual buffer experience replay, to avoid feedback training loops. The proposed method is tested in four simulated 2D and 3D environments with 6 to 24 axial muscles. The results show that the models were able to learn muscle excitations for given motions after nearly 100,000 simulated steps. Moreover, the root mean square error in point-to-point reaching of the target across experiments was less than 1% of the length of the domain of motion. Our reinforcement learning method is far from the conventional dynamic approaches as the muscle control is derived functionally by a set of distributed neurons. This can open paths for neural activity interpretation of this phenomenon.

Submitted 3 May, 2019; v1 submitted 17 September, 2018; originally announced September 2018.

Comments: 9 pages, 3 figures. Computational Biomechanics for Medicine. MICCAI 2019. Springer, Cham

arXiv:1806.06457 [pdf, other]

Fast Convex Pruning of Deep Neural Networks

Authors: Alireza Aghasi, Afshin Abdi, Justin Romberg

Abstract: We develop a fast, tractable technique called Net-Trim for simplifying a trained neural network. The method is a convex post-processing module, which prunes (sparsifies) a trained network layer by layer, while preserving the internal responses. We present a comprehensive analysis of Net-Trim from both the algorithmic and sample complexity standpoints, centered on a fast, scalable convex optimization program. Our analysis includes consistency results between the initial and retrained models before and after Net-Trim application and guarantees on the number of training samples needed to discover a network that can be expressed using a certain number of nonzero terms. Specifically, if there is a set of weights that uses at most $s$ terms that can re-create the layer outputs from the layer inputs, we can find these weights from $\mathcal{O}(s\log N/s)$ samples, where $N$ is the input size. These theoretical results are similar to those for sparse regression using the Lasso, and our analysis uses some of the same recently-developed tools (namely recent results on the concentration of measure and convex analysis). Finally, we propose an algorithmic framework based on the alternating direction method of multipliers (ADMM), which allows a fast and simple implementation of Net-Trim for network pruning and compression.

Submitted 25 February, 2019; v1 submitted 17 June, 2018; originally announced June 2018.

arXiv:1611.05162 [pdf, other]

Net-Trim: Convex Pruning of Deep Neural Networks with Performance Guarantee

Authors: Alireza Aghasi, Afshin Abdi, Nam Nguyen, Justin Romberg

Abstract: We introduce and analyze a new technique for model reduction for deep neural networks. While large networks are theoretically capable of learning arbitrarily complex models, overfitting and model redundancy negatively affect the prediction accuracy and model variance. Our Net-Trim algorithm prunes (sparsifies) a trained network layer-wise, removing connections at each layer by solving a convex optimization program. This program seeks a sparse set of weights at each layer that keeps the layer inputs and outputs consistent with the originally trained model. The algorithms and associated analysis are applicable to neural networks operating with the rectified linear unit (ReLU) as the nonlinear activation. We present both parallel and cascade versions of the algorithm. While the latter can achieve slightly simpler models with the same generalization performance, the former can be computed in a distributed manner. In both cases, Net-Trim significantly reduces the number of connections in the network, while also providing enough regularization to slightly reduce the generalization error. We also provide a mathematical analysis of the consistency between the initial network and the retrained model. To analyze the model sample complexity, we derive the general sufficient conditions for the recovery of a sparse transform matrix. For a single layer taking independent Gaussian random vectors of length $N$ as inputs, we show that if the network response can be described using a maximum number of $s$ non-zero weights per node, these weights can be learned from $\mathcal{O}(s\log N)$ samples.

Submitted 23 November, 2017; v1 submitted 16 November, 2016; originally announced November 2016.

arXiv:1502.03578 [pdf, ps, other]

Lower Bounds for Cover-Free Families

Authors: Ali Z. Abdi, Nader H. Bshouty

Abstract: Let ${\cal F}$ be a set of blocks of a $t$-set $X$. $(X,{\cal F})$ is called a $(w,r)$-cover-free family ($(w,r)$-CFF) provided that the intersection of any $w$ blocks in ${\cal F}$ is not contained in the union of any other $r$ blocks in ${\cal F}$. We give new asymptotic lower bounds for the minimum number of points $t$ in a $(w,r)$-CFF when $w\le r=|{\cal F}|^{\epsilon}$ for some constant $\epsilon\ge 1/2$.

Submitted 31 March, 2015; v1 submitted 12 February, 2015; originally announced February 2015.

arXiv:1405.1535 [pdf, ps, other]

Learning Boolean Halfspaces with Small Weights from Membership Queries

Authors: Hasan Abasi, Ali Z. Abdi, Nader H. Bshouty

Abstract: We consider the problem of properly learning a Boolean Halfspace with integer weights $\{0,1,\ldots,t\}$ from membership queries only. The best known algorithm for this problem is an adaptive algorithm that asks $n^{O(t^5)}$ membership queries, whereas the best lower bound for the number of membership queries is $n^t$ [Learning Threshold Functions with Small Weights Using Membership Queries. COLT 1999]. In this paper we close this gap and give an adaptive proper learning algorithm with two rounds that asks $n^{O(t)}$ membership queries. We also give a non-adaptive proper learning algorithm that asks $n^{O(t^3)}$ membership queries.

Submitted 7 May, 2014; originally announced May 2014.

arXiv:cs/0604033 [pdf, ps, other]

Statistical Properties of Eigen-Modes and Instantaneous Mutual Information in MIMO Time-Varying Rayleigh Channels

Authors: Shuangquan Wang, Ali Abdi

Abstract: In this paper, we study two important metrics in multiple-input multiple-output (MIMO) time-varying Rayleigh flat fading channels. One is the eigen-mode, and the other is the instantaneous mutual information (IMI). Their second-order statistics, such as the correlation coefficient, level crossing rate (LCR), and average fade/outage duration, are investigated, assuming a general nonisotropic scattering environment. Exact closed-form expressions are derived and Monte Carlo simulations are provided to verify the accuracy of the analytical results. For the eigen-modes, we found they tend to be spatio-temporally uncorrelated in large MIMO systems. For the IMI, the results show that its correlation coefficient can be well approximated by the squared amplitude of the correlation coefficient of the channel, under certain conditions. Moreover, we also found the LCR of IMI is much more sensitive to the scattering environment than that of each eigen-mode.

Submitted 8 April, 2006; originally announced April 2006.

Comments: 25 pages, 7 figures, 1 table, submitted to IEEE Trans. Inform. Theory, Apr., 2006

arXiv:cs/0603027 [pdf, ps, other]

On the Second-Order Statistics of the Instantaneous Mutual Information in Rayleigh Fading Channels

Authors: Shuangquan Wang, Ali Abdi

Abstract: In this paper, the second-order statistics of the instantaneous mutual information are studied, in time-varying Rayleigh fading channels, assuming general non-isotropic scattering environments. Specifically, first the autocorrelation function, correlation coefficient, level crossing rate, and the average outage duration of the instantaneous mutual information are investigated in single-input single-output (SISO) systems. Closed-form exact expressions are derived, as well as accurate approximations in low- and high-SNR regimes. Then, the results are extended to multiple-input single-output and single-input multiple-output systems, as well as multiple-input multiple-output systems with orthogonal space-time block code transmission. Monte Carlo simulations are provided to verify the accuracy of the analytical results. The results shed more light on the dynamic behavior of the instantaneous mutual information in mobile fading channels.

Submitted 7 March, 2006; originally announced March 2006.

Comments: 11 pages, 6 figures, submitted to IEEE Trans. Inform. Theory, Dec. 2005

Showing 1–29 of 29 results for author: Abdi, A