-
$\nabla$SD: Differentiable Programming for Sparse Tensors
Authors:
Amir Shaikhha,
Mathieu Huot,
Shideh Hashemian
Abstract:
Sparse tensors are prevalent in many data-intensive applications, yet existing differentiable programming frameworks are tailored towards dense tensors. This presents a significant challenge for efficiently computing gradients through sparse tensor operations, as their irregular sparsity patterns can result in substantial memory and computational overheads. In this work, we introduce a novel framework that enables the efficient and automatic differentiation of sparse tensors, addressing this fundamental issue. Our experiments demonstrate the effectiveness of the proposed framework in terms of performance and scalability, outperforming state-of-the-art frameworks across a range of synthetic and real-world datasets. Our approach offers a promising direction for enabling efficient and scalable differentiable programming with sparse tensors, which has significant implications for numerous applications in machine learning, natural language processing, and scientific computing.
Submitted 13 March, 2023;
originally announced March 2023.
-
Efficient and Sound Differentiable Programming in a Functional Array-Processing Language
Authors:
Amir Shaikhha,
Mathieu Huot,
Shabnam Ghasemirad,
Andrew Fitzgibbon,
Simon Peyton Jones,
Dimitrios Vytiniotis
Abstract:
Automatic differentiation (AD) is a technique for computing the derivative of a function represented by a program. This technique is considered as the de-facto standard for computing the differentiation in many machine learning and optimisation software tools. Despite the practicality of this technique, the performance of the differentiated programs, especially for functional languages and in the presence of vectors, is suboptimal. We present an AD system for a higher-order functional array-processing language. The core functional language underlying this system simultaneously supports both source-to-source forward-mode AD and global optimisations such as loop transformations. In combination, gradient computation with forward-mode AD can be as efficient as reverse mode, and the Jacobian matrices required for numerical algorithms such as Gauss-Newton and Levenberg-Marquardt can be efficiently computed.
Submitted 20 December, 2022;
originally announced December 2022.
-
Denotationally Correct, Purely Functional, Efficient Reverse-mode Automatic Differentiation
Authors:
Mathieu Huot,
Amir Shaikhha
Abstract:
Reverse-mode differentiation is used for optimization, but it introduces references, which break the purity of the underlying programs, making them notoriously harder to optimize. We present a reverse-mode differentiation on a purely functional language with array operations. It is the first one to deliver a provably efficient, purely functional, and denotationally correct reverse-mode differentiation. We show that our transformation is semantically correct and verifies the cheap gradient principle. Inspired by PROPs and compilation to categories, we introduce a novel intermediate representation that we call 'unary form'. Our reverse-mode transformation is factored as a compilation scheme through this intermediate representation. We obtain provably efficient gradients by performing general partial evaluation optimizations after our reverse-mode transformation, as opposed to manually derived ones. For simple first-order programs, the obtained output programs resemble static-single-assignment (SSA) code. We emphasize the modularity of our approach and show how our language can easily be enriched with more optimized primitives, as required for some speed-ups in practice.
Submitted 26 April, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Compiling Structured Tensor Algebra
Authors:
Mahdi Ghorbani,
Mathieu Huot,
Shideh Hashemian,
Amir Shaikhha
Abstract:
Tensor algebra is essential for data-intensive workloads in various computational domains. Computational scientists face a trade-off between the specialization degree provided by dense tensor algebra and the algorithmic efficiency that leverages the structure provided by sparse tensors. This paper presents StructTensor, a framework that symbolically computes structure at compilation time. This is enabled by Structured Tensor Unified Representation (STUR), an intermediate language that can capture tensor computations as well as their sparsity and redundancy structures. Through a mathematical view of lossless tensor computations, we show that our symbolic structure computation and the related optimizations are sound. Finally, for different tensor computation workloads and structures, we experimentally show how capturing the symbolic structure can result in outperforming state-of-the-art frameworks for both dense and sparse tensor algebra.
Submitted 18 November, 2022;
originally announced November 2022.
-
Optimizing Tensor Programs on Flexible Storage
Authors:
Maximilian Schleich,
Amir Shaikhha,
Dan Suciu
Abstract:
Tensor programs often need to process large tensors (vectors, matrices, or higher order tensors) that require a specialized storage format for their memory layout. Several such layouts have been proposed in the literature, such as the Coordinate Format, the Compressed Sparse Row format, and many others, that were especially designed to optimally store tensors with specific sparsity properties. However, existing tensor processing systems require specialized extensions in order to take advantage of every new storage format. In this paper we describe a system that allows users to define flexible storage formats in a declarative tensor query language, similar to the language used by the tensor program. The programmer only needs to write storage mappings, which describe, in a declarative way, how the tensors are laid out in main memory. Then, we describe a cost-based optimizer that optimizes the tensor program for the specific memory layout. We demonstrate empirically significant performance improvements compared to state-of-the-art tensor processing systems.
Submitted 12 October, 2022;
originally announced October 2022.
-
Hinted Dictionaries: Efficient Functional Ordered Sets and Maps
Authors:
Amir Shaikhha,
Mahdi Ghorbani,
Hesam Shahrokhi
Abstract:
This article introduces hinted dictionaries for expressing efficient ordered sets and maps functionally. As opposed to the traditional ordered dictionaries with logarithmic operations, hinted dictionaries can achieve better performance by using cursor-like objects referred to as hints. Hinted dictionaries unify the interfaces of imperative ordered dictionaries (e.g., C++ maps) and functional ones (e.g., Adams' sets). We show that such dictionaries can use sorted arrays, unbalanced trees, and balanced trees as their underlying representations. Throughout the article, we use Scala to present the different components of hinted dictionaries. We also provide a C++ implementation to evaluate the effectiveness of hinted dictionaries. Hinted dictionaries provide superior performance for set-set operations in comparison with the standard library of C++. Also, they show a competitive performance in comparison with the SciPy library for sparse vector operations.
Submitted 9 June, 2022;
originally announced June 2022.
-
Fine-Tuning Data Structures for Analytical Query Processing
Authors:
Amir Shaikhha,
Marios Kelepeshis,
Mahdi Ghorbani
Abstract:
We introduce a framework for automatically choosing data structures to support efficient computation of analytical workloads. Our contributions are twofold. First, we introduce a novel low-level intermediate language that can express the algorithms behind various query processing paradigms such as classical joins, groupjoin, and in-database machine learning engines. This language is designed around the notion of dictionaries, and allows for a more fine-grained choice of its low-level implementation. Second, the cost model for alternative implementations is automatically inferred by combining machine learning and program reasoning. The dictionary cost model is learned using a regression model trained over the profiling dataset of dictionary operations on a given hardware architecture. The program cost model is inferred using static program analysis.
Our experimental results show the effectiveness of the trained cost model on micro benchmarks. Furthermore, we show that the performance of the code generated by our framework either outperforms or is on par with the state-of-the-art analytical query engines and a recent in-database machine learning framework.
Submitted 24 December, 2021;
originally announced December 2021.
-
Functional Collection Programming with Semi-Ring Dictionaries
Authors:
Amir Shaikhha,
Mathieu Huot,
Jaclyn Smith,
Dan Olteanu
Abstract:
This paper introduces semi-ring dictionaries, a powerful class of compositional and purely functional collections that subsume other collection types such as sets, multisets, arrays, vectors, and matrices. We developed SDQL, a statically typed language that can express relational algebra with aggregations, linear algebra, and functional collections over data such as relations and matrices using semi-ring dictionaries. Furthermore, thanks to the algebraic structure behind these dictionaries, SDQL unifies a wide range of optimizations commonly used in databases (DB) and linear algebra (LA). As a result, SDQL enables efficient processing of hybrid DB and LA workloads, by putting together optimizations that are otherwise confined to either DB systems or LA frameworks. We show experimentally that a handful of DB and LA workloads can take advantage of the SDQL language and optimizations. SDQL can be competitive with or outperforms a host of systems that are state of the art in their own domain: in-memory DB systems Typer and Tectorwise for (flat, not nested) relational data; SciPy for LA workloads; sparse tensor compiler taco; the Trance nested relational engine; and the in-database machine learning engines LMFAO and Morpheus for hybrid DB/LA workloads over relational data.
Submitted 22 March, 2022; v1 submitted 10 March, 2021;
originally announced March 2021.
-
BayesCard: Revitilizing Bayesian Frameworks for Cardinality Estimation
Authors:
Ziniu Wu,
Amir Shaikhha,
Rong Zhu,
Kai Zeng,
Yuxing Han,
Jingren Zhou
Abstract:
Cardinality estimation (CardEst) is an essential component in query optimizers and a fundamental problem in DBMS. A desired CardEst method should attain good algorithm performance, be stable to varied data settings, and be friendly to system deployment. However, no existing CardEst method can fulfill the three criteria at the same time. Traditional methods often have significant algorithm drawbacks such as large estimation errors. Recently proposed deep learning based methods largely improve the estimation accuracy but their performance can be greatly affected by data and often difficult for system deployment.
In this paper, we revitalize the Bayesian networks (BN) for CardEst by incorporating the techniques of probabilistic programming languages. We present BayesCard, the first framework that inherits the advantages of BNs, i.e., high estimation accuracy and interpretability, while overcomes their drawbacks, i.e. low structure learning and inference efficiency. This makes BayesCard a perfect candidate for commercial DBMS deployment. Our experimental results on several single-table and multi-table benchmarks indicate BayesCard's superiority over existing state-of-the-art CardEst methods: BayesCard achieves comparable or better accuracy, 1-2 orders of magnitude faster inference time, 1-3 orders faster training time, 1-3 orders smaller model size, and 1-2 orders faster updates. Meanwhile, BayesCard keeps stable performance when varying data with different settings. We also deploy BayesCard into PostgreSQL. On the IMDB benchmark workload, it improves the end-to-end query time by 13.3%, which is very close to the optimal result of 14.2% using an oracle of true cardinality.
Submitted 2 February, 2021; v1 submitted 29 December, 2020;
originally announced December 2020.
-
Scalable Querying of Nested Data
Authors:
Jaclyn Smith,
Michael Benedikt,
Milos Nikolic,
Amir Shaikhha
Abstract:
While large-scale distributed data processing platforms have become an attractive target for query processing, these systems are problematic for applications that deal with nested collections. Programmers are forced either to perform non-trivial translations of collection programs or to employ automated flattening procedures, both of which lead to performance problems. These challenges only worsen for nested collections with skewed cardinalities, where both handcrafted rewriting and automated flattening are unable to enforce load balancing across partitions.
In this work, we propose a framework that translates a program manipulating nested collections into a set of semantically equivalent shredded queries that can be efficiently evaluated. The framework employs a combination of query compilation techniques, an efficient data representation for nested collections, and automated skew-handling. We provide an extensive experimental evaluation, demonstrating significant improvements provided by the framework in diverse scenarios for nested collection programs.
Submitted 12 November, 2020;
originally announced November 2020.
-
Multi-layer Optimizations for End-to-End Data Analytics
Authors:
Amir Shaikhha,
Maximilian Schleich,
Alexandru Ghita,
Dan Olteanu
Abstract:
We consider the problem of training machine learning models over multi-relational data. The mainstream approach is to first construct the training dataset using a feature extraction query over input database and then use a statistical software package of choice to train the model. In this paper we introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach. IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language, which captures a subset of Python commonly used in Jupyter notebooks for rapid prototyping of machine learning applications. The program is subject to several layers of IFAQ optimizations, such as algebraic transformations, loop transformations, schema specialization, data layout optimizations, and finally compilation into efficient low-level C++ code specialized for the given workload and data.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and TensorFlow by several orders of magnitude for linear regression and regression tree models over several relational datasets.
Submitted 10 January, 2020;
originally announced January 2020.
-
A Compiler-Compiler for DSL Embedding
Authors:
Amir Shaikhha,
Vojin Jovanovic,
Christoph Koch
Abstract:
In this paper, we present a framework to generate compilers for embedded domain-specific languages (EDSLs). This framework provides facilities to automatically generate the boilerplate code required for building DSL compilers on top of extensible optimizing compilers. We evaluate the practicality of our framework by demonstrating several use-cases successfully built with it.
Submitted 3 August, 2018;
originally announced August 2018.
-
Compiling Database Application Programs
Authors:
Mohammad Dashti,
Sachin Basil John,
Thierry Coppey,
Amir Shaikhha,
Vojin Jovanovic,
Christoph Koch
Abstract:
There is a trend towards increased specialization of data management software for performance reasons. In this paper, we study the automatic specialization and optimization of database application programs -- sequences of queries and updates, augmented with control flow constructs as they appear in database scripts, UDFs, transactional workloads and triggers in languages such as PL/SQL. We show how to build an optimizing compiler for database application programs using generative programming and state-of-the-art compiler technology.
We evaluate a hand-optimized low-level implementation of TPC-C, and identify the key optimization techniques that account for its good performance. Our compiler fully automates these optimizations and, applied to this benchmark, outperforms the manually optimized baseline by a factor of two. By selectively disabling some of the optimizations in the compiler, we derive a clinical and precise way of obtaining insight into their individual performance contributions.
Submitted 25 July, 2018;
originally announced July 2018.
-
Efficient Differentiable Programming in a Functional Array-Processing Language
Authors:
Amir Shaikhha,
Andrew Fitzgibbon,
Dimitrios Vytiniotis,
Simon Peyton Jones,
Christoph Koch
Abstract:
We present a system for the automatic differentiation of a higher-order functional array-processing language. The core functional language underlying this system simultaneously supports both source-to-source automatic differentiation and global optimizations such as loop transformations. Thanks to this feature, we demonstrate how for some real-world machine learning and computer vision benchmarks, the system outperforms the state-of-the-art automatic differentiation tools.
Submitted 6 June, 2018;
originally announced June 2018.
-
Building Efficient Query Engines in a High-Level Language
Authors:
Amir Shaikhha,
Yannis Klonatos,
Christoph Koch
Abstract:
Abstraction without regret refers to the vision of using high-level programming languages for systems development without experiencing a negative impact on performance. A database system designed according to this vision offers both increased productivity and high performance, instead of sacrificing the former for the latter as is the case with existing, monolithic implementations that are hard to maintain and extend. In this article, we realize this vision in the domain of analytical query processing. We present LegoBase, a query engine written in the high-level language Scala. The key technique to regain efficiency is to apply generative programming: LegoBase performs source-to-source compilation and optimizes the entire query engine by converting the high-level Scala code to specialized, low-level C code. We show how generative programming allows to easily implement a wide spectrum of optimizations, such as introducing data partitioning or switching from a row to a column data layout, which are difficult to achieve with existing low-level query compilers that handle only queries. We demonstrate that sufficiently powerful abstractions are essential for dealing with the complexity of the optimization effort, shielding developers from compiler internals and decoupling individual optimizations from each other. We evaluate our approach with the TPC-H benchmark and show that: (a) With all optimizations enabled, LegoBase significantly outperforms a commercial database and an existing query compiler. (b) Programmers need to provide just a few hundred lines of high-level code for implementing the optimizations, instead of complicated low-level code that is required by existing query compilation approaches. (c) The compilation overhead is low compared to the overall execution time, thus making our approach usable in practice for compiling query engines.
Submitted 16 December, 2016;
originally announced December 2016.
-
Push vs. Pull-Based Loop Fusion in Query Engines
Authors:
Amir Shaikhha,
Mohammad Dashti,
Christoph Koch
Abstract:
Database query engines use pull-based or push-based approaches to avoid the materialization of data across query operators. In this paper, we study these two types of query engines in depth and present the limitations and advantages of each engine. Similarly, the programming languages community has developed loop fusion techniques to remove intermediate collections in the context of collection programming. We draw parallels between the DB and PL communities by demonstrating the connection between pipelined query engines and loop fusion techniques. Based on this connection, we propose a new type of pull-based engine, inspired by a loop fusion technique, which combines the benefits of both approaches. Then we experimentally evaluate the various engines, in the context of query compilation, for the first time in a fair environment, eliminating the biasing impact of ancillary optimizations that have traditionally only been used with one of the approaches. We show that for realistic analytical workloads, there is no considerable advantage for either form of pipelined query engine, as opposed to what recent research suggests. Also, by using microbenchmarks we show that our proposed engine dominates the existing engines by combining the benefits of both.
△ Less
Submitted 28 October, 2016;
originally announced October 2016.
-
Repairing Conflicts among MVCC Transactions
Authors:
Mohammad Dashti,
Sachin Basil John,
Amir Shaikhha,
Christoph Koch
Abstract:
The optimistic variants of MVCC (Multi-Version Concurrency Control) avoid blocking concurrent transactions at the cost of having a validation phase. Upon failure in the validation phase, the transaction is usually aborted and restarted from scratch. The "abort and restart" approach becomes a performance bottleneck for the use cases with high contention objects or long running transactions. In addition, restarting from scratch creates a negative feedback loop in the system, because the system incurs additional overhead that may create even further conflicts.
In this paper, we propose a novel approach for conflict resolution in MVCC for in-memory databases. This low overhead approach summarizes the transaction programs in the form of a dependency graph. The dependency graph also contains the constructs used in the validation phase of the MVCC algorithm. Then, in the case of encountering conflicts among transactions, the conflict locations in the program are quickly detected, and the conflicting transactions are partially re-executed. This approach maximizes the reuse of the computations done in the initial execution round, and increases the transaction processing throughput.
Submitted 1 March, 2016;
originally announced March 2016.