-
How Lexical is Bilingual Lexicon Induction?
Authors:
Harsh Kohli,
Helian Feng,
Nicholas Dronen,
Calvin McCarter,
Sina Moeini,
Ali Kebarighotbi
Abstract:
In contemporary machine learning approaches to bilingual lexicon induction (BLI), a model learns a mapping between the embedding spaces of a language pair. Recently, the retrieve-and-rank approach to BLI has achieved state-of-the-art results on the task. However, the problem remains challenging in low-resource settings due to the paucity of data. The task is further complicated by factors such as lexical variation across languages. We argue that incorporating additional lexical information into the recent retrieve-and-rank approach should improve lexicon induction. We demonstrate the efficacy of our proposed approach on XLING, improving over the previous state of the art by an average of 2% across all language pairs.
Submitted 5 April, 2024;
originally announced April 2024.
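The retrieve-and-rank idea, plus an added lexical signal, can be illustrated with a toy sketch. It assumes the source word's embedding has already been mapped into the target space; retrieval takes the top-k target words by cosine similarity, and reranking mixes cosine with a character-bigram Dice score weighted by a hypothetical `alpha`. The function names, toy vectors, and the Dice feature are illustrative assumptions, not the authors' model or features.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def char_bigrams(word):
    padded = f"#{word}#"  # pad so word boundaries count as bigrams
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def lexical_sim(a, b):
    """Dice coefficient over character bigrams (an assumed stand-in lexical feature)."""
    x, y = char_bigrams(a), char_bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))

def retrieve_and_rerank(src_word, src_vec, tgt_vocab, k=3, alpha=0.5):
    """Hypothetical two-stage BLI sketch: cosine retrieval, then lexical-aware reranking."""
    # Stage 1: retrieve the k target words closest to the mapped source vector.
    retrieved = sorted(tgt_vocab.items(), key=lambda kv: cosine(src_vec, kv[1]), reverse=True)[:k]

    # Stage 2: rerank by mixing embedding similarity with lexical similarity.
    def score(item):
        word, vec = item
        return (1 - alpha) * cosine(src_vec, vec) + alpha * lexical_sim(src_word, word)

    return [word for word, _ in sorted(retrieved, key=score, reverse=True)]

# Toy target space (made-up vectors); the source vector is assumed already mapped.
tgt = {"nation": [0.9, 0.1, 0.0], "pays": [0.95, 0.05, 0.0],
       "chat": [0.0, 1.0, 0.0], "état": [0.8, 0.2, 0.1]}
print(retrieve_and_rerank("nation", [1.0, 0.0, 0.0], tgt))
print(retrieve_and_rerank("nation", [1.0, 0.0, 0.0], tgt, alpha=0.0))
```

With `alpha=0.0` the ranking is purely embedding-based and "pays" comes first; with the lexical term switched on, the cognate "nation" is promoted. This is only meant to show how a lexical signal can reorder retrieved candidates.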
-
Look-ups are not (yet) all you need for deep learning inference
Authors:
Calvin McCarter,
Nicholas Dronen
Abstract:
Fast approximations to matrix multiplication have the potential to dramatically reduce the cost of neural network inference. Recent work on approximate matrix multiplication proposed replacing costly multiplications with table lookups by fitting a fast hash function to training data. In this work, we propose improvements to this previous work, targeted at the deep learning inference setting, where one has access to both training data and fixed (already learned) model weight matrices. We further propose a fine-tuning procedure for accelerating entire neural networks while minimizing loss in accuracy. Finally, we analyze the proposed method on a simple image classification task. While we show improvements over prior work, overall classification accuracy remains substantially diminished compared to exact matrix multiplication. Despite this negative result, our work points the way toward future efforts to accelerate inner products with fast nonlinear hashing methods.
Submitted 12 July, 2022;
originally announced July 2022.
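For intuition about replacing multiplications with table lookups, here is a minimal product-quantization-style sketch. It is not the hash-based method the abstract refers to. The prototypes are hand-picked here, though in practice they would be fit to training data, and encoding uses nearest-prototype search rather than a learned fast hash function. Because the weight vector is fixed, its dot products with every prototype can be precomputed into tables. All helper names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def build_tables(prototypes, w, sub_dim):
    """tables[c][j] = <prototype j of subspace c, matching slice of the fixed weights w>."""
    tables = []
    for c, protos in enumerate(prototypes):
        w_slice = w[c * sub_dim:(c + 1) * sub_dim]
        tables.append([dot(p, w_slice) for p in protos])
    return tables

def encode(x, prototypes, sub_dim):
    """Replace each subvector of x by the index of its nearest prototype."""
    codes = []
    for c, protos in enumerate(prototypes):
        x_slice = x[c * sub_dim:(c + 1) * sub_dim]
        codes.append(min(range(len(protos)), key=lambda j: sq_dist(x_slice, protos[j])))
    return codes

def approx_dot(codes, tables):
    """Once x is encoded, its inner product with w needs only lookups and additions."""
    return sum(tables[c][j] for c, j in enumerate(codes))

# Two 2-D subspaces with two hand-picked (hypothetical) prototypes each.
prototypes = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [-1.0, 0.0]]]
w = [2.0, 3.0, 1.0, -1.0]
x = [0.9, 0.1, 1.1, 0.9]

tables = build_tables(prototypes, w, sub_dim=2)
codes = encode(x, prototypes, sub_dim=2)
print(codes, approx_dot(codes, tables), dot(x, w))
```

The lookup estimate (2.0) differs from the exact inner product (about 2.3). This is a small-scale reminder that the approximation error comes from how well the codes represent the input.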
-
Adaptive Block Floating-Point for Analog Deep Learning Hardware
Authors:
Ayon Basumallik,
Darius Bunandar,
Nicholas Dronen,
Nicholas Harris,
Ludmila Levkova,
Calvin McCarter,
Lakshmi Nair,
David Walter,
David Widemann
Abstract:
Analog mixed-signal (AMS) devices promise faster, more energy-efficient deep neural network (DNN) inference than their digital counterparts. However, recent studies show that DNNs on AMS devices with fixed-point numbers can incur an accuracy penalty because of precision loss. To mitigate this penalty, we present a novel AMS-compatible adaptive block floating-point (ABFP) number representation. We also introduce amplification (or gain) as a method for increasing the accuracy of the number representation without increasing the bit precision of the output. We evaluate the effectiveness of ABFP on the DNNs in the MLPerf datacenter inference benchmark -- realizing less than 1% loss in accuracy compared to FLOAT32. We also propose a novel method of finetuning for AMS devices, Differential Noise Finetuning (DNF), which samples device noise to speed up finetuning compared to conventional Quantization-Aware Training.
Submitted 12 May, 2022;
originally announced May 2022.
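A generic block floating-point quantizer shows how a shared per-block exponent trades precision for range. This sketch assumes signed integer mantissas and a power-of-two scale chosen from each block's largest magnitude. It does not model the paper's adaptive scheme, the analog gain, or DNF, and `bfp_quantize` with its parameters is a hypothetical name.

```python
import math

def bfp_quantize(values, block_size=4, mantissa_bits=8):
    """Generic block floating-point sketch: one shared power-of-two scale per block."""
    qmax = 2 ** (mantissa_bits - 1) - 1  # largest signed mantissa, e.g. 127 for 8 bits
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        if max_abs == 0:
            out.extend([0.0] * len(block))
            continue
        # Smallest power-of-two scale that lets the block's largest value fit in qmax.
        scale = 2.0 ** math.ceil(math.log2(max_abs / qmax))
        for v in block:
            q = max(-qmax, min(qmax, round(v / scale)))  # clamp guards against log2 rounding
            out.append(q * scale)
    return out

vals = [0.5, -0.25, 0.125, 0.001, 100.0, 3.0, -7.5, 0.0]
print(bfp_quantize(vals, block_size=4, mantissa_bits=8))
```

In the first block, 0.001 rounds to zero because the scale is set by its much larger neighbour 0.5. This is the kind of precision loss that choosing scales carefully per block tries to keep small.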
-
Effective sampling for large-scale automated writing evaluation systems
Authors:
Nicholas Dronen,
Peter W. Foltz,
Kyle Habermehl
Abstract:
Automated writing evaluation (AWE) has been shown to be an effective mechanism for quickly providing feedback to students. It has already seen wide adoption in enterprise-scale applications and is starting to be adopted in large-scale contexts. Training an AWE model has historically required a single batch of several hundred writing examples, each with a human score. This requirement limits large-scale adoption of AWE because having humans score essays is costly. Here we evaluate algorithms for ensuring that AWE models are consistently trained using the most informative essays. Our results show how to minimize training set sizes while maximizing predictive performance, thereby reducing cost without unduly sacrificing accuracy. We conclude with a discussion of how to integrate this approach into large-scale AWE systems.
Submitted 17 December, 2014;
originally announced December 2014.
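One common way to choose which essays to send for human scoring is greedy k-center selection. It repeatedly picks the essay farthest from everything chosen so far, so the scored set covers the feature space. The abstract does not say this is among the algorithms evaluated, so treat it as an illustrative assumption. The essay feature vectors (for example, length or vocabulary statistics) and the function name are hypothetical.

```python
import math

def k_center_greedy(features, budget, first=0):
    """Hypothetical sketch: pick `budget` essay indices that spread over feature space."""
    chosen = [first]
    # Distance from every essay to its nearest already-chosen essay.
    nearest = [math.dist(f, features[first]) for f in features]
    while len(chosen) < min(budget, len(features)):
        # The essay least covered by the current selection is scored next.
        idx = max(range(len(features)), key=lambda i: nearest[i])
        chosen.append(idx)
        nearest = [min(d, math.dist(f, features[idx])) for d, f in zip(nearest, features)]
    return chosen

# Made-up 2-D essay features: two near-duplicate pairs and one outlier.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
print(k_center_greedy(features, budget=3))
```

Near-duplicates such as essays 0 and 1 are not both selected. This illustrates the intuition of spending a fixed human-scoring budget on less redundant examples.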
-
Return probability and k-step measures
Authors:
Nicholas Dronen,
Qin Lv
Abstract:
The notion of return probability -- explored most famously by George Pólya on d-dimensional lattices -- has potential as a measure for the analysis of networks. We present an efficient method for finding return probability distributions for connected undirected graphs. We argue that return probability has the same discriminatory power as existing k-step measures -- in particular, beta centrality (with negative beta), the graph-theoretical power index (GPI), and subgraph centrality. We compare the running time of our algorithm to that of beta centrality and subgraph centrality and find that it is significantly faster. When return probability is used to measure the same phenomena as beta centrality, it runs in linear time -- O(n+m), where n and m are the number of nodes and edges, respectively -- which takes much less time than either the matrix inversion or the sequence of matrix multiplications required for calculating the exact or approximate forms of beta centrality, respectively. We call this form of return probability the Pólya power index (PPI). Computing subgraph centrality requires an expensive eigendecomposition of the adjacency matrix; return probability runs in half the time of the eigendecomposition on a 2000-node network. These performance improvements are important because computationally efficient measures are necessary to analyze large networks.
Submitted 23 May, 2011;
originally announced May 2011.
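For a simple random walk, k-step return probabilities can be computed by pushing probability mass along edges one step at a time. This sketch computes the probability that a walk started at `start` is back there after exactly k steps. That is not the first-return probability. The cost is O(k(n+m)) per start node, so this is a baseline for intuition, not the authors' linear-time PPI algorithm. The function name and the adjacency-list format are assumptions.

```python
def return_probabilities(adj, start, max_steps):
    """P(walk from `start` is at `start` after exactly k steps), for k = 1..max_steps.

    Assumes `adj` maps each node of a connected undirected graph to its neighbour list.
    """
    prob = {v: 0.0 for v in adj}
    prob[start] = 1.0
    returns = []
    for _ in range(max_steps):
        nxt = {v: 0.0 for v in adj}
        for v, p in prob.items():
            if p:
                share = p / len(adj[v])  # the walker moves to a uniformly random neighbour
                for u in adj[v]:
                    nxt[u] += share
        prob = nxt
        returns.append(prob[start])
    return returns

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(return_probabilities(path, "a", 4))
```

On the path a-b-c, the walk from "a" can only be back at "a" after an even number of steps, which is why odd entries are zero.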