-
CMDS: Cross-layer Dataflow Optimization for DNN Accelerators Exploiting Multi-bank Memories
Authors:
Man Shi,
Steven Colleman,
Charlotte VanDeMieroop,
Antony Joseph,
Maurice Meijer,
Wim Dehaene,
Marian Verhelst
Abstract:
Deep neural networks (DNNs) use a wide range of network topologies to achieve high accuracy within diverse applications. This model diversity makes it impossible to identify a single "dataflow" (execution schedule) that performs optimally across all possible layers and network topologies. Several frameworks support the exploration of the best dataflow for a given DNN layer and hardware. However, switching the dataflow from one layer to the next within one DNN model can result in hardware inefficiencies stemming from memory data layout mismatch among the layers. Unfortunately, all existing frameworks treat each layer independently and typically model memories as black boxes (one large monolithic wide memory), which ignores the data layout and cannot deal with the data layout dependencies of sequential layers. These frameworks are therefore not capable of cross-layer dataflow optimization. This work, hence, aims at cross-layer dataflow optimization, taking the data dependency and data layout reshuffling overheads among layers into account. Additionally, we propose to exploit the multi-bank memories typically present in modern DNN accelerators to reshuffle data efficiently and support more dataflows at low overhead. These innovations are supported through the Cross-layer Memory-aware Dataflow Scheduler (CMDS). CMDS can model DNN execution energy/latency while considering the different data layout requirements due to the varied optimal dataflow of layers. Compared with the state-of-the-art (SOTA), which performs layer-optimized memory-unaware scheduling, CMDS achieves up to 5.5X energy reduction and 1.35X latency reduction with negligible hardware cost.
Submitted 14 June, 2024;
originally announced June 2024.
-
FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence
Authors:
Sebastian Antony Joseph,
Lily Chen,
Jan Trienes,
Hannah Louisa Göke,
Monika Coers,
Wei Xu,
Byron C Wallace,
Junyi Jessy Li
Abstract:
Plain language summarization with LLMs can be useful for improving textual accessibility of technical content. But how factual are these summaries in a high-stakes domain like medicine? This paper presents FactPICO, a factuality benchmark for plain language summarization of medical texts describing randomized controlled trials (RCTs), which are the basis of evidence-based medicine and can directly inform patient treatment. FactPICO consists of 345 plain language summaries of RCT abstracts generated from three LLMs (i.e., GPT-4, Llama-2, and Alpaca), with fine-grained evaluation and natural language rationales from experts. We assess the factuality of critical elements of RCTs in those summaries: Populations, Interventions, Comparators, Outcomes (PICO), as well as the reported findings concerning these. We also evaluate the correctness of the extra information (e.g., explanations) added by LLMs. Using FactPICO, we benchmark a range of existing factuality metrics, including the newly devised ones based on LLMs. We find that plain language summarization of medical evidence is still challenging, especially when balancing between simplicity and factuality, and that existing metrics correlate poorly with expert judgments on the instance level.
Submitted 4 June, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
CHIC: Corporate Document for Visual question Answering
Authors:
Ibrahim Souleiman Mahamoud,
Mickael Coustaty,
Aurelie Joseph,
Vincent Poulain d Andecy,
Jean-Marc Ogier
Abstract:
The massive use of digital documents, driven by the trend toward paperless initiatives, has forced some companies to find ways to process thousands of documents per day automatically. To achieve this, they use automatic information retrieval (IR), which allows them to extract useful information from large datasets quickly. Effective IR methods first require an adequate dataset. Although companies have enough data to cover their own needs, there is also a need for a public database to compare contributions between state-of-the-art methods. Public document datasets exist, such as DocVQA [2] and XFUND [10], but these do not fully satisfy the needs of companies. XFUND contains only form documents, whereas companies use several types of documents (i.e., structured documents such as forms, but also semi-structured documents such as invoices and unstructured documents such as emails). Compared to XFUND, DocVQA has several types of documents, but only 4.5% of them are corporate documents (i.e., invoices, purchase orders, etc.). Even this 4.5% does not meet the diversity of documents required by companies. We propose CHIC, a public visual question-answering dataset. This dataset contains different types of corporate documents, and the information extracted from these documents meets the expectations of companies.
Submitted 1 May, 2023;
originally announced May 2023.
-
SCL: A Secure Concurrency Layer For Paranoid Stateful Lambdas
Authors:
Kaiyuan Chen,
Alexander Thomas,
Hanming Lu,
William Mullen,
Jeffery Ichnowski,
Rahul Arya,
Nivedha Krishnakumar,
Ryan Teoh,
Willis Wang,
Anthony Joseph,
John Kubiatowicz
Abstract:
We propose a federated Function-as-a-Service (FaaS) execution model that provides secure and stateful execution in both Cloud and Edge environments. The FaaS workers, called Paranoid Stateful Lambdas (PSLs), collaborate with one another to perform large parallel computations. We exploit cryptographically hardened and mobile bundles of data, called DataCapsules, to provide persistent state for our PSLs, whose execution is protected using hardware-secured TEEs. To make PSLs easy to program and performant, we build the familiar Key-Value Store interface on top of DataCapsules in a way that allows amortization of cryptographic operations. We demonstrate PSLs functioning in an edge environment running on a group of Intel NUCs with SGXv2.
As described, our Secure Concurrency Layer (SCL) provides eventually-consistent semantics over written values using untrusted and unordered multicast. All SCL communication is encrypted, unforgeable, and private. For durability, updates are recorded in replicated DataCapsules, which are append-only, cryptographically hardened blockchains with confidentiality, integrity, and provenance guarantees. Values for inactive keys are stored in a log-structured merge-tree (LSM) in the same DataCapsule. SCL features a variety of communication optimizations, such as an efficient message passing framework that reduces latency by up to 44x compared with the Intel SGX SDK, and an actor-based cryptographic processing architecture that batches cryptographic operations and increases throughput by 81x.
Submitted 2 November, 2022; v1 submitted 20 October, 2022;
originally announced October 2022.
-
The Sky Above The Clouds
Authors:
Sarah Chasins,
Alvin Cheung,
Natacha Crooks,
Ali Ghodsi,
Ken Goldberg,
Joseph E. Gonzalez,
Joseph M. Hellerstein,
Michael I. Jordan,
Anthony D. Joseph,
Michael W. Mahoney,
Aditya Parameswaran,
David Patterson,
Raluca Ada Popa,
Koushik Sen,
Scott Shenker,
Dawn Song,
Ion Stoica
Abstract:
Technology ecosystems often undergo significant transformations as they mature. For example, telephony, the Internet, and PCs all started with a single provider, but in the United States each is now served by a competitive market that uses comprehensive and universal technology standards to provide compatibility. This white paper presents our view on how the cloud ecosystem, barely over fifteen years old, could evolve as it matures.
Submitted 14 May, 2022;
originally announced May 2022.
-
Validation and Generalizability of Self-Supervised Image Reconstruction Methods for Undersampled MRI
Authors:
Thomas Yu,
Tom Hilbert,
Gian Franco Piredda,
Arun Joseph,
Gabriele Bonanno,
Salim Zenkhri,
Patrick Omoumi,
Meritxell Bach Cuadra,
Erick Jorge Canales-Rodríguez,
Tobias Kober,
Jean-Philippe Thiran
Abstract:
Deep learning methods have become the state of the art for undersampled MR reconstruction. Particularly for cases where it is infeasible or impossible to acquire ground-truth, fully sampled data, self-supervised machine learning methods for reconstruction are becoming increasingly used. However, potential issues in the validation of such methods, as well as their generalizability, remain underexplored. In this paper, we investigate important aspects of the validation of self-supervised algorithms for reconstruction of undersampled MR images: quantitative evaluation of prospective reconstructions, potential differences between prospective and retrospective reconstructions, suitability of commonly used quantitative metrics, and generalizability. Two self-supervised algorithms based on self-supervised denoising and the deep image prior were investigated. These methods are compared to a least squares fitting and a compressed sensing reconstruction using in-vivo and phantom data. Their generalizability was tested with prospectively undersampled data from experimental conditions different from those used in training. We show that prospective reconstructions can exhibit significant distortion relative to retrospective reconstructions/ground truth. Furthermore, pixel-wise quantitative metrics may not capture differences in perceptual quality accurately, in contrast to a perceptual metric. In addition, all methods showed potential for generalization; however, generalizability is more affected by changes in anatomy/contrast than other changes. We further showed that no-reference image metrics correspond well with human rating of image quality for studying generalizability. Finally, we showed that a well-tuned compressed sensing reconstruction and learned denoising perform similarly on all data.
Submitted 12 September, 2022; v1 submitted 29 January, 2022;
originally announced January 2022.
-
Enhancing the Interactivity of Dataframe Queries by Leveraging Think Time
Authors:
Doris Xin,
Devin Petersohn,
Dixin Tang,
Yifan Wu,
Joseph E. Gonzalez,
Joseph M. Hellerstein,
Anthony D. Joseph,
Aditya G. Parameswaran
Abstract:
We propose opportunistic evaluation, a framework for accelerating interactions with dataframes. Interactive latency is critical for iterative, human-in-the-loop dataframe workloads for supporting exploratory data analysis. Opportunistic evaluation significantly reduces interactive latency by 1) prioritizing computation directly relevant to the interactions and 2) leveraging think time for asynchronous background computation for non-critical operators that might be relevant to future interactions. We show, through empirical analysis, that current user behavior presents ample opportunities for optimization, and the solutions we propose effectively harness such opportunities.
Submitted 2 March, 2021;
originally announced March 2021.
-
Eye Tracking to Understand Impact of Aging on Mobile Phone Applications
Authors:
Antony William Joseph,
Jeevitha Shree DV,
Kamal Preet Singh Saluja,
Abhishek Mukhopadhyay,
Ramaswami Murugesh,
Pradipta Biswas
Abstract:
Usage of smartphones and tablets has been increasing rapidly with multi-touch interaction and powerful configurations. Performing tasks on mobile phones becomes more complex as people age, thereby increasing their cognitive workload. In this context, we conducted an eye tracking study with 50 participants aged from 20 to over 60 years, living in Bangalore, India. This paper focuses on the visual nature of interaction with mobile user interfaces. The study aims to investigate how aging affects user experience on mobile phones while performing complex tasks, and to estimate cognitive workload using eye tracking metrics. The study consisted of five tasks that were performed on an Android mobile phone under naturalistic scenarios using eye tracking glasses. For each participant, we recorded ocular parameters such as fixation rate, saccadic rate, average fixation duration, maximum fixation duration, and standard deviation of pupil dilation for the left and right eyes. Results from our study show that aging has a marked effect on performance when using mobile phones, irrespective of the complexity of the task given. We noted that participants aged 50 to 60+ years had difficulties in completing tasks and showed increased cognitive workload. They showed longer fixation durations when completing tasks that involved copy-paste operations. Further, we identified design implications and provided design recommendations for designers and manufacturers.
Submitted 4 January, 2021;
originally announced January 2021.
-
Evaluation of Neural Network Classification Systems on Document Stream
Authors:
Joris Voerman,
Aurelie Joseph,
Mickael Coustaty,
Vincent Poulain d Andecy,
Jean-Marc Ogier
Abstract:
One major drawback of state-of-the-art neural network (NN)-based approaches to document classification is the large number of training samples required to obtain an efficient classification. The minimum required number is around one thousand annotated documents for each class. In many cases it is very difficult, if not impossible, to gather this number of samples in real industrial processes. In this paper, we analyse the efficiency of NN-based document classification systems in a sub-optimal training case, based on the situation of a company document stream. We evaluated three different approaches, one based on image content and two on textual content. The evaluation was divided into four parts: a reference case, to assess the performance of the system in the lab; two cases that each simulate a specific difficulty linked to document stream processing; and a realistic case that combined all of these difficulties. The realistic case highlighted a significant drop in the efficiency of NN-based document classification systems. Although they remain efficient for well-represented classes (with an over-fitting of the system for those classes), they cannot appropriately handle less well-represented classes. NN-based document classification systems need to be adapted to resolve these two problems before they can be considered for use in a company document stream.
Submitted 15 July, 2020;
originally announced July 2020.
-
Understanding the Use of Crisis Informatics Technology among Older Adults
Authors:
Yixuan Zhang,
Nurul Suhaimi,
Rana Azghandi,
Mary Amulya Joseph,
Miso Kim,
Jacqueline Griffin,
Andrea G. Parker
Abstract:
Mass emergencies increasingly pose significant threats to human life, with a disproportionate burden being incurred by older adults. Research has explored how mobile technology can mitigate the effects of mass emergencies. However, less work has examined how mobile technologies support older adults during emergencies, considering their unique needs. To address this research gap, we interviewed 16 older adults who had recent experience with an emergency evacuation to understand the perceived value of using mobile technology during emergencies. We found that there was a lack of awareness and engagement with existing crisis apps. Our findings characterize the ways in which our participants did and did not feel crisis informatics tools address human values, including basic needs and esteem needs. We contribute an understanding of how older adults used mobile technology during emergencies and their perspectives on how well such tools address human values.
Submitted 21 January, 2020; v1 submitted 8 January, 2020;
originally announced January 2020.
-
Towards Scalable Dataframe Systems
Authors:
Devin Petersohn,
Stephen Macke,
Doris Xin,
William Ma,
Doris Lee,
Xiangxi Mo,
Joseph E. Gonzalez,
Joseph M. Hellerstein,
Anthony D. Joseph,
Aditya Parameswaran
Abstract:
Dataframes are a popular abstraction to represent, prepare, and analyze data. Despite the remarkable success of dataframe libraries in R and Python, dataframes face performance issues even on moderately large datasets. Moreover, there is significant ambiguity regarding dataframe semantics. In this paper we lay out a vision and roadmap for scalable dataframe systems. To demonstrate the potential in this area, we report on our experience building MODIN, a scaled-up implementation of the most widely-used and complex dataframe API today, Python's pandas. With pandas as a reference, we propose a simple data model and algebra for dataframes to ground discussion in the field. Given this foundation, we lay out an agenda of open research opportunities where the distinct features of dataframes will require extending the state of the art in many dimensions of data management. We discuss the implications of signature dataframe features including flexible schemas, ordering, row/column equivalence, and data/metadata fluidity, as well as the piecemeal, trial-and-error-based approach to interacting with dataframes.
Submitted 2 June, 2020; v1 submitted 3 January, 2020;
originally announced January 2020.
-
Parametric inference with universal function approximators
Authors:
Andreas Joseph
Abstract:
Universal function approximators, such as artificial neural networks, can learn a large variety of target functions arbitrarily well given sufficient training data. This flexibility comes at the cost of the ability to perform parametric inference. We address this gap by proposing a generic framework based on the Shapley-Taylor decomposition of a model. A surrogate parametric regression analysis is performed in the space spanned by the Shapley value expansion of a model. This allows for the testing of standard hypotheses of interest. At the same time, the proposed approach provides novel insights into statistical learning processes themselves derived from the consistency and bias properties of the nonparametric estimators. We apply the framework to the estimation of heterogeneous treatment effects in simulated and real-world randomised experiments. We introduce an explicit treatment function based on higher-order Shapley-Taylor indices. This can be used to identify potentially complex treatment channels and help the generalisation of findings from experimental settings. More generally, the presented approach allows for a standardised use and communication of results from machine learning models.
Submitted 4 October, 2020; v1 submitted 11 March, 2019;
originally announced March 2019.
-
Using Multitask Learning to Improve 12-Lead Electrocardiogram Classification
Authors:
J. Weston Hughes,
Taylor Sittler,
Anthony D. Joseph,
Jeffrey E. Olgin,
Joseph E. Gonzalez,
Geoffrey H. Tison
Abstract:
We develop a multi-task convolutional neural network (CNN) to classify multiple diagnoses from 12-lead electrocardiograms (ECGs) using a dataset comprised of over 40,000 ECGs, with labels derived from cardiologist clinical interpretations. Since many clinically important classes can occur in low frequencies, approaches are needed to improve performance on rare classes. We compare the performance of several single-class classifiers on rare classes to a multi-headed classifier across all available classes. We demonstrate that the addition of common classes can significantly improve CNN performance on rarer classes when compared to a model trained on the rarer class in isolation. Using this method, we develop a model with high performance as measured by F1 score on multiple clinically relevant classes compared against the gold-standard cardiologist interpretation.
Submitted 4 December, 2018; v1 submitted 2 December, 2018;
originally announced December 2018.
-
Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement
Authors:
Samuel Neumann,
Sungsu Lim,
A** Joseph,
Yangchen Pan,
Adam White,
Martha White
Abstract:
Many policy gradient methods are variants of Actor-Critic (AC), where a value function (critic) is learned to facilitate updating the parameterized policy (actor). The update to the actor involves a log-likelihood update weighted by the action-values, with the addition of entropy regularization for soft variants. In this work, we explore an alternative update for the actor, based on an extension of the cross entropy method (CEM) to condition on inputs (states). The idea is to start with a broader policy and slowly concentrate around maximal actions, using a maximum likelihood update towards actions in the top percentile per state. The speed of this concentration is controlled by a proposal policy, that concentrates at a slower rate than the actor. We first provide a policy improvement result in an idealized setting, and then prove that our conditional CEM (CCEM) strategy tracks a CEM update per state, even with changing action-values. We empirically show that our Greedy AC algorithm, that uses CCEM for the actor update, performs better than Soft Actor-Critic and is much less sensitive to entropy-regularization.
Submitted 28 February, 2023; v1 submitted 22 October, 2018;
originally announced October 2018.
-
An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method
Authors:
A** George Joseph,
Shalabh Bhatnagar
Abstract:
In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, i.e., estimating the value function of a model-free Markov reward process using the linear function approximation architecture, with memory and computation costs scaling quadratically in the size of the feature set. The algorithms employ the multi-timescale stochastic approximation variant of the very popular cross entropy (CE) optimization method, which is a model-based search method to find the global optimum of a real-valued function. A proof of convergence of the algorithms using the ODE method is provided. We supplement our theoretical results with experimental comparisons. The algorithms achieve good performance fairly consistently on many RL benchmark problems with regard to computational efficiency, accuracy and stability.
Submitted 15 June, 2018;
originally announced June 2018.
-
A Cross Entropy based Optimization Algorithm with Global Convergence Guarantees
Authors:
A** George Joseph,
Shalabh Bhatnagar
Abstract:
The cross entropy (CE) method is a model based search method to solve optimization problems where the objective function has minimal structure. The Monte-Carlo version of the CE method employs the naive sample averaging technique which is inefficient, both computationally and space wise. We provide a novel stochastic approximation version of the CE method, where the sample averaging is replaced with incremental geometric averaging. This approach can save considerable computational and storage costs. Our algorithm is incremental in nature and possesses additional attractive features such as accuracy, stability, robustness and convergence to the global optimum for a particular class of objective functions. We evaluate the algorithm on a variety of global optimization benchmark problems and the results obtained corroborate our theoretical findings.
Submitted 30 January, 2018;
originally announced January 2018.
-
An Incremental Off-policy Search in a Model-free Markov Decision Process Using a Single Sample Path
Authors:
A** George Joseph,
Shalabh Bhatnagar
Abstract:
In this paper, we consider a modified version of the control problem in a model-free Markov decision process (MDP) setting with large state and action spaces. The control problem most commonly addressed in the contemporary literature is to find an optimal policy which maximizes the value function, i.e., the long run discounted reward of the MDP. The current settings also assume access to a generative model of the MDP, with the hidden premise that observations of the system behaviour in the form of sample trajectories can be obtained with ease from the model. In this paper, we consider a modified version, where the cost function is the expectation of a non-convex function of the value function, without access to the generative model. Rather, we assume that a sample trajectory generated using an a priori chosen behaviour policy is made available. In this restricted setting, we solve the modified control problem in its true sense, i.e., to find the best possible policy given this limited information. We propose a stochastic approximation algorithm based on the well-known cross entropy method which is data (sample trajectory) efficient, stable, robust as well as computationally and storage efficient. We provide a proof of convergence of our algorithm to a policy which is globally optimal relative to the behaviour policy. We also present experimental results to corroborate our claims, and we demonstrate the superiority of the solution produced by our algorithm compared to the state-of-the-art algorithms under an appropriately chosen behaviour policy.
Submitted 30 January, 2018;
originally announced January 2018.
-
A Berkeley View of Systems Challenges for AI
Authors:
Ion Stoica,
Dawn Song,
Raluca Ada Popa,
David Patterson,
Michael W. Mahoney,
Randy Katz,
Anthony D. Joseph,
Michael Jordan,
Joseph M. Hellerstein,
Joseph E. Gonzalez,
Ken Goldberg,
Ali Ghodsi,
David Culler,
Pieter Abbeel
Abstract:
With the increasing commoditization of computer vision, speech recognition and machine translation systems and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production. These changes have been made possible by unprecedented levels of data and computation, by methodological advances in machine learning, by innovations in systems software and architectures, and by the broad accessibility of these technologies.
The next generation of AI systems promises to accelerate these developments and increasingly impact our lives via frequent interactions and making (often mission-critical) decisions on our behalf, often in highly personalized contexts. Realizing this promise, however, raises daunting challenges. In particular, we need AI systems that make timely and safe decisions in unpredictable environments, that are robust against sophisticated adversaries, and that can process ever increasing amounts of data across organizations and individuals without compromising confidentiality. These challenges will be exacerbated by the end of Moore's Law, which will constrain the amount of data these technologies can store and process. In this paper, we propose several open research directions in systems, architectures, and security that can address these challenges and help unlock AI's potential to improve lives and society.
Submitted 15 December, 2017;
originally announced December 2017.
-
Reviewer Integration and Performance Measurement for Malware Detection
Authors:
Brad Miller,
Alex Kantchelian,
Michael Carl Tschantz,
Sadia Afroz,
Rekha Bachwani,
Riyaz Faizullabhoy,
Ling Huang,
Vaishaal Shankar,
Tony Wu,
George Yiu,
Anthony D. Joseph,
J. D. Tygar
Abstract:
We present and evaluate a large-scale malware detection system integrating machine learning with expert reviewers, treating reviewers as a limited labeling resource. We demonstrate that even in small numbers, reviewers can vastly improve the system's ability to keep pace with evolving threats. We conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years and containing 1.1 million binaries with 778GB of raw feature data. Without reviewer assistance, we achieve 72% detection at a 0.5% false positive rate, performing comparably to the best vendors on VirusTotal. Given a budget of 80 accurate reviews daily, we improve detection to 89% and are able to detect 42% of malicious binaries undetected upon initial submission to VirusTotal. Additionally, we identify a previously unnoticed temporal inconsistency in the labeling of training datasets. We compare the impact of training labels obtained at the same time training data is first seen with training labels obtained months later. We find that using training labels obtained well after samples appear, and thus unavailable in practice for current training data, inflates measured detection by almost 20 percentage points. We release our cluster-based implementation, as well as a list of all hashes in our evaluation and 3% of our entire dataset.
Submitted 26 May, 2016; v1 submitted 25 October, 2015;
originally announced October 2015.
-
Evasion and Hardening of Tree Ensemble Classifiers
Authors:
Alex Kantchelian,
J. D. Tygar,
Anthony D. Joseph
Abstract:
Classifier evasion consists in finding for a given instance $x$ the nearest instance $x'$ such that the classifier predictions of $x$ and $x'$ are different. We present two novel algorithms for systematically computing evasions for tree ensembles such as boosted trees and random forests. Our first algorithm uses a Mixed Integer Linear Program solver and finds the optimal evading instance under an expressive set of constraints. Our second algorithm trades off optimality for speed by using symbolic prediction, a novel algorithm for fast finite differences on tree ensembles. On a digit recognition task, we demonstrate that both gradient boosted trees and random forests are extremely susceptible to evasions. Finally, we harden a boosted tree model without loss of predictive accuracy by augmenting the training set of each boosting round with evading instances, a technique we call adversarial boosting.
Submitted 26 May, 2016; v1 submitted 25 September, 2015;
originally announced September 2015.
-
I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis
Authors:
Brad Miller,
Ling Huang,
A. D. Joseph,
J. D. Tygar
Abstract:
Revelations of large scale electronic surveillance and data mining by governments and corporations have fueled increased adoption of HTTPS. We present a traffic analysis attack against over 6000 webpages spanning the HTTPS deployments of 10 widely used, industry-leading websites in areas such as healthcare, finance, legal services and streaming video. Our attack identifies individual pages in the same website with 89% accuracy, exposing personal details including medical conditions, financial and legal affairs and sexual orientation. We examine evaluation methodology and reveal accuracy variations as large as 18% caused by assumptions affecting caching and cookies. We present a novel defense reducing attack accuracy to 27% with a 9% traffic increase, and demonstrate significantly increased effectiveness of prior defenses in our evaluation context, inclusive of enabled caching, user-specific cookies and pages within the same website.
Submitted 2 March, 2014;
originally announced March 2014.
-
Robust watermarking based on DWT SVD
Authors:
Anumol Joseph,
K. Anusudha
Abstract:
The digital information revolution has brought about many advantages and new issues. The protection of ownership and the prevention of unauthorized manipulation of digital audio, image, and video materials have become an important concern due to the ease of editing and perfect reproduction. Watermarking is identified as a major means to achieve copyright protection. It is a branch of information hiding which is used to hide proprietary information in digital media like photographs, digital music, digital video, etc. In this paper, a new image watermarking algorithm that is robust against various attacks is presented. DWT (Discrete Wavelet Transform) and SVD (Singular Value Decomposition) have been used to embed two watermarks in the HL and LH bands of the host image. Simulation evaluation demonstrates that the proposed technique withstands various attacks.
Submitted 26 September, 2013; v1 submitted 10 September, 2013;
originally announced September 2013.
-
Composite Centrality: A Natural Scale for Complex Evolving Networks
Authors:
Andreas Joseph,
Guanrong Chen
Abstract:
We derive a composite centrality measure for general weighted and directed complex networks, based on measure standardisation and invariant statistical inheritance schemes. Different schemes generate different intermediate abstract measures providing additional information, while the composite centrality measure tends to the standard normal distribution. This offers a unified scale to measure node and edge centralities for complex evolving networks under a uniform framework. Considering two real-world cases of the world trade web and the world migration web, both during a time span of 40 years, we propose a standard set-up to demonstrate its remarkable normative power and accuracy. We illustrate the applicability of the proposed framework for large and arbitrary complex systems, as well as its limitations, through extensive numerical simulations.
Submitted 19 January, 2014; v1 submitted 16 November, 2012;
originally announced November 2012.
-
Fast Sparse Superposition Codes have Exponentially Small Error Probability for R < C
Authors:
Antony Joseph,
Andrew Barron
Abstract:
For the additive white Gaussian noise channel with average codeword power constraint, sparse superposition codes are developed. These codes are based on the statistical high-dimensional regression framework. The paper [IEEE Trans. Inform. Theory 55 (2012), 2541 - 2557] investigated decoding using the optimal maximum-likelihood decoding scheme. Here a fast decoding algorithm, called the adaptive successive decoder, is developed. For any rate R less than the capacity C, communication is shown to be reliable with exponentially small error probability.
Submitted 10 July, 2012;
originally announced July 2012.
-
Lossy Compression via Sparse Linear Regression: Performance under Minimum-distance Encoding
Authors:
Ramji Venkataramanan,
Antony Joseph,
Sekhar Tatikonda
Abstract:
We study a new class of codes for lossy compression with the squared-error distortion criterion, designed using the statistical framework of high-dimensional linear regression. Codewords are linear combinations of subsets of columns of a design matrix. Called a Sparse Superposition or Sparse Regression codebook, this structure is motivated by an analogous construction proposed recently by Barron and Joseph for communication over an AWGN channel. For i.i.d. Gaussian sources and minimum-distance encoding, we show that such a code can attain the Shannon rate-distortion function with the optimal error exponent, for all distortions below a specified value. It is also shown that sparse regression codes are robust in the following sense: a codebook designed to compress an i.i.d. Gaussian source of variance $σ^2$ with (squared-error) distortion $D$ can compress any ergodic source of variance less than $σ^2$ to within distortion $D$. Thus the sparse regression ensemble retains many of the good covering properties of the i.i.d. random Gaussian ensemble, while having a compact representation in terms of a matrix whose size is a low-order polynomial in the block-length.
Submitted 18 December, 2015; v1 submitted 3 February, 2012;
originally announced February 2012.
-
Query Strategies for Evading Convex-Inducing Classifiers
Authors:
Blaine Nelson,
Benjamin I. P. Rubinstein,
Ling Huang,
Anthony D. Joseph,
Steven J. Lee,
Satish Rao,
J. D. Tygar
Abstract:
Classifiers are often used to detect miscreant activities. We study how an adversary can systematically query a classifier to elicit information that allows the adversary to evade detection while incurring a near-minimal cost of modifying their intended malfeasance. We generalize the theory of Lowd and Meek (2005) to the family of convex-inducing classifiers that partition input space into two sets, one of which is convex. We present query algorithms for this family that construct undetected instances of approximately minimal cost using only polynomially many queries in the dimension of the space and in the level of approximation. Our results demonstrate that near-optimal evasion can be accomplished without reverse-engineering the classifier's decision boundary. We also consider general $\ell_p$ costs and show that near-optimal evasion on the family of convex-inducing classifiers is generally efficient for both positive and negative convexity for all levels of approximation if $p=1$.
Submitted 3 July, 2010;
originally announced July 2010.
-
Toward Fast Reliable Communication at Rates Near Capacity with Gaussian Noise
Authors:
Andrew R Barron,
Antony Joseph
Abstract:
For the additive Gaussian noise channel with average codeword power constraint, sparse superposition codes and adaptive successive decoding are developed. Codewords are linear combinations of subsets of vectors, with the message indexed by the choice of subset. A feasible decoding algorithm is presented. Communication is reliable with error probability exponentially small for all rates below the Shannon capacity.
Submitted 19 June, 2010;
originally announced June 2010.
-
Least Squares Superposition Codes of Moderate Dictionary Size, Reliable at Rates up to Capacity
Authors:
Andrew R. Barron,
Antony Joseph
Abstract:
For the additive white Gaussian noise channel with average codeword power constraint, new coding methods are devised in which the codewords are sparse superpositions, that is, linear combinations of subsets of vectors from a given design, with the possible messages indexed by the choice of subset. Decoding is by least squares, tailored to the assumed form of linear combination. Communication is shown to be reliable with error probability exponentially small for all rates up to the Shannon capacity.
Submitted 18 June, 2010;
originally announced June 2010.
-
Near-Optimal Evasion of Convex-Inducing Classifiers
Authors:
Blaine Nelson,
Benjamin I. P. Rubinstein,
Ling Huang,
Anthony D. Joseph,
Shing-hon Lau,
Steven J. Lee,
Satish Rao,
Anthony Tran,
J. D. Tygar
Abstract:
Classifiers are often used to detect miscreant activities. We study how an adversary can efficiently query a classifier to elicit information that allows the adversary to evade detection at near-minimal cost. We generalize results of Lowd and Meek (2005) to convex-inducing classifiers. We present algorithms that construct undetected instances of near-minimal cost using only polynomially many queries in the dimension of the space and without reverse engineering the decision boundary.
Submitted 13 March, 2010;
originally announced March 2010.