Search | arXiv e-print repository

Automatic Recognition of Food Ingestion Environment from the AIM-2 Wearable Sensor

Authors: Yuning Huang, Mohamed Abul Hassan, Jiangpeng He, Janine Higgins, Megan McCrory, Heather Eicher-Miller, Graham Thomas, Edward O Sazonov, Fengqing Maggie Zhu

Abstract: Detecting an ingestion environment is an important aspect of monitoring dietary intake. It provides insightful information for dietary assessment. However, it is a challenging problem where human-based reviewing can be tedious, and algorithm-based review suffers from data imbalance and perceptual aliasing problems. To address these issues, we propose a neural network-based method with a two-stage training framework that tactfully combines fine-tuning and transfer learning techniques. Our method is evaluated on a newly collected dataset called "UA Free Living Study", which uses an egocentric wearable camera, the AIM-2 sensor, to simulate food consumption in free-living conditions. The proposed training framework is applied to common neural network backbones, combined with approaches in the general imbalanced classification field. Experimental results on the collected dataset show that our proposed method for automatic ingestion environment recognition successfully addresses the challenging data imbalance problem in the dataset and achieves a promising overall classification accuracy of 96.63%.

Submitted 13 May, 2024; originally announced May 2024.

Comments: Accepted at CVPRw 2024

arXiv:2403.19546 [pdf, other]

doi 10.1145/3650203.3663326

Croissant: A Metadata Format for ML-Ready Datasets

Authors: Mubashara Akhtar, Omar Benjelloun, Costanza Conforti, Pieter Gijsbers, Joan Giner-Miguelez, Nitisha Jain, Michael Kuchnik, Quentin Lhoest, Pierre Marcenac, Manil Maskey, Peter Mattson, Luis Oala, Pierre Ruyssen, Rajat Shinde, Elena Simperl, Goeffry Thomas, Slava Tykhonov, Joaquin Vanschoren, Jos van der Velde, Steffen Vogler, Carole-Jean Wu

Abstract: Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks.

Submitted 30 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: Published in Proceedings of ACM SIGMOD/PODS'24 Data Management for End-to-End Machine Learning (DEEM) Workshop https://dl.acm.org/doi/10.1145/3650203.3663326

arXiv:2311.16284 [pdf]

Simultaneous Energy Harvesting and Hand Gesture Recognition in Large Area Monolithic Dye-Sensitized Solar Cells

Authors: Gethin Thomas, Adam Pockett, Kris Seunarine, Matt Carnie

Abstract: Internet of Things (IoT) devices have become prevalent, embedding intelligence into our environment. It is projected that over 75 billion IoT devices will be connected by 2025 worldwide, with the majority being operated indoors. Dye-sensitized solar cells (DSSC) have recently been optimized for ambient light and can provide sufficient energy for self-powered IoT devices. Interaction with digital technologies, termed Human Computer Interaction (HCI), is often achieved via physical mechanisms (e.g. remote controls, cell phones) which can hinder the natural interface between users and IoT devices, a key consideration for HCI. What if the solar cell that is powering the IoT device can also recognize hand gestures which would allow the user to naturally interact with the system? Previous attempts to achieve this have necessarily employed an array of solar cells/photodiodes to detect directionality. In this work, we demonstrate that by monitoring the photocurrent output of an asymmetrically patterned monolithic (i.e., single cell) DSSC, and using machine learning, we can recognize simple hand gestures, achieving a prediction accuracy of 97.71%. This work shows that DSSCs are the perfect choice for self-powered interactive technologies, both in terms of powering IoT devices in ambient light conditions and having aesthetic qualities that are prioritized by users. As well as powering interactive technologies, they can also provide a means of interactive control.

Submitted 27 November, 2023; originally announced November 2023.

Comments: Main body: 10 pages, 6 figures, 3 tables. Document includes supplementary info: 30 pages, 47 supplementary figures

arXiv:2310.16754 [pdf, other]

CAD -- Contextual Multi-modal Alignment for Dynamic AVQA

Authors: Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa

Abstract: In the context of Audio Visual Question Answering (AVQA) tasks, the audio visual modalities could be learnt on three levels: 1) Spatial, 2) Temporal, and 3) Semantic. Existing AVQA methods suffer from two major shortcomings; the audio-visual (AV) information passing through the network isn't aligned on Spatial and Temporal levels; and, inter-modal (audio and visual) Semantic information is often not balanced within a context; this results in poor performance. In this paper, we propose a novel end-to-end Contextual Multi-modal Alignment (CAD) network that addresses the challenges in AVQA methods by i) introducing a parameter-free stochastic Contextual block that ensures robust audio and visual alignment on the Spatial level; ii) proposing a pre-training technique for dynamic audio and visual alignment on Temporal level in a self-supervised setting, and iii) introducing a cross-attention mechanism to balance audio and visual information on Semantic level. The proposed novel CAD network improves the overall performance over the state-of-the-art methods on average by 9.4% on the MUSIC-AVQA dataset. We also demonstrate that our proposed contributions to AVQA can be added to the existing methods to improve their performance without additional complexity requirements.

Submitted 27 October, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

arXiv:2310.15582 [pdf, other]

doi 10.1145/3590140.3629116

SecV: Secure Code Partitioning via Multi-Language Secure Values

Authors: Peterson Yuhala, Pascal Felber, Hugo Guiroux, Jean-Pierre Lozi, Alain Tchana, Valerio Schiavoni, Gaël Thomas

Abstract: Trusted execution environments like Intel SGX provide enclaves, which offer strong security guarantees for applications. Running entire applications inside enclaves is possible, but this approach leads to a large trusted computing base (TCB). As such, various tools have been developed to partition programs written in languages such as C or Java into trusted and untrusted parts, which are run in and out of enclaves respectively. However, those tools depend on language-specific taint-analysis and partitioning techniques. They cannot be reused for other languages and there is thus a need for tools that transcend this language barrier. We address this challenge by proposing a multi-language technique to specify sensitive code or data, as well as a multi-language tool to analyse and partition the resulting programs for trusted execution environments like Intel SGX. We leverage GraalVM's Truffle framework, which provides a language-agnostic abstract syntax tree (AST) representation for programs, to provide special AST nodes called secure nodes that encapsulate sensitive program information. Secure nodes can easily be embedded into the ASTs of a wide range of languages via Truffle's polyglot API. Our technique includes a multi-language dynamic taint tracking tool to analyse and partition applications based on our generic secure nodes. Our extensive evaluation with micro- and macro-benchmarks shows that we can use our technique for two languages (JavaScript and Python), and that partitioned programs can obtain up to 14.5% performance improvement as compared to unpartitioned versions.

Submitted 20 December, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: 12 pages

arXiv:2308.03901 [pdf, other]

FLIPS: Federated Learning using Intelligent Participant Selection

Authors: Rahul Atul Bhope, K. R. Jayaram, Nalini Venkatasubramanian, Ashish Verma, Gegi Thomas

Abstract: This paper presents the design and implementation of FLIPS, a middleware system to manage data and participant heterogeneity in federated learning (FL) training workloads. In particular, we examine the benefits of label distribution clustering on participant selection in federated learning. FLIPS clusters parties involved in an FL training job based on the label distribution of their data a priori, and during FL training, ensures that each cluster is equitably represented in the participants selected. FLIPS can support the most common FL algorithms, including FedAvg, FedProx, FedDyn, FedOpt and FedYogi. To manage platform heterogeneity and dynamic resource availability, FLIPS incorporates a straggler management mechanism to handle changing capacities in distributed, smart community applications. Privacy of label distributions, clustering and participant selection is ensured through a trusted execution environment (TEE). Our comprehensive empirical evaluation compares FLIPS with random participant selection, as well as three other "smart" selection mechanisms (Oort, TiFL and gradient clustering), using two real-world datasets, two benchmark datasets, two different non-IID distributions and three common FL algorithms (FedYogi, FedProx and FedAvg). We demonstrate that FLIPS significantly improves convergence, achieving higher accuracy by 17-20% with 20-60% lower communication costs, and these benefits endure in the presence of straggler participants.

Submitted 30 September, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2306.10709 [pdf, other]

Machine learning of hidden variables in multiscale fluid simulation

Authors: Archis S. Joglekar, Alexander G. R. Thomas

Abstract: Solving fluid dynamics equations often requires the use of closure relations that account for missing microphysics. For example, when solving equations related to fluid dynamics for systems with a large Reynolds number, sub-grid effects become important and a turbulence closure is required, and in systems with a large Knudsen number, kinetic effects become important and a kinetic closure is required. By adding an equation governing the growth and transport of the quantity requiring the closure relation, it becomes possible to capture microphysics through the introduction of "hidden variables" that are non-local in space and time. The behavior of the "hidden variables" in response to the fluid conditions can be learned from a higher fidelity or ab-initio model that contains all the microphysics. In our study, a partial differential equation simulator that is end-to-end differentiable is used to train judiciously placed neural networks against ground-truth simulations. We show that this method enables an Euler equation based approach to reproduce non-linear, large Knudsen number plasma physics that can otherwise only be modeled using Boltzmann-like equation simulators such as Vlasov or Particle-In-Cell modeling.

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2305.02244 [pdf, other]

NVMM cache design: Logging vs. Paging

Authors: Rémi Dulong, Quentin Acher, Baptiste Lepers, Valerio Schiavoni, Pascal Felber, Gaël Thomas

Abstract: Modern NVMM is closing the gap between DRAM and persistent storage, both in terms of performance and features. Having both byte addressability and persistence on the same device gives NVMM an unprecedented set of features, leading to the following question: How should we design an NVMM-based caching system to fully exploit its potential? We build two caching mechanisms, NVPages and NVLog, based on two radically different design approaches. NVPages stores memory pages in NVMM, similar to the Linux page cache (LPC). NVLog uses NVMM to store a log of pending write operations to be submitted to the LPC, while it ensures reads with a small DRAM cache. Our study shows and quantifies advantages and flaws for both designs.

Submitted 3 May, 2023; originally announced May 2023.

Comments: 3 pages, 4 figures, presented for NVMW'23: 14th Annual Non-Volatile Memories Workshop

MSC Class: 68M15

arXiv:2305.00766 [pdf, other]

doi 10.1145/3464298.3493406

Montsalvat: Intel SGX Shielding for GraalVM Native Images

Authors: Peterson Yuhala, Jämes Ménétrey, Pascal Felber, Valerio Schiavoni, Alain Tchana, Gaël Thomas, Hugo Guiroux, Jean-Pierre Lozi

Abstract: The popularity of the Java programming language has led to its wide adoption in cloud computing infrastructures. However, Java applications running in untrusted clouds are vulnerable to various forms of privileged attacks. The emergence of trusted execution environments (TEEs) such as Intel SGX mitigates this problem. TEEs protect code and data in secure enclaves inaccessible to untrusted software, including the kernel and hypervisors. To efficiently use TEEs, developers must manually partition their applications into trusted and untrusted parts, in order to reduce the size of the trusted computing base (TCB) and minimise the risks of security vulnerabilities. However, partitioning applications poses two important challenges: (i) ensuring efficient object communication between the partitioned components, and (ii) ensuring the consistency of garbage collection between the parts, especially with memory-managed languages such as Java. We present Montsalvat, a tool which provides a practical and intuitive annotation-based partitioning approach for Java applications destined for secure enclaves. Montsalvat provides an RMI-like mechanism to ensure inter-object communication, as well as consistent garbage collection across the partitioned components. We implement Montsalvat with GraalVM native-image, a tool for compiling Java applications ahead-of-time into standalone native executables that do not require a JVM at runtime. Our extensive evaluation with micro- and macro-benchmarks shows that our partitioning approach boosts performance in real-world applications.

Submitted 20 December, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: 13 pages, Proceedings of the 22nd International Middleware Conference

arXiv:2303.14829 [pdf, other]

SEM-POS: Grammatically and Semantically Correct Video Captioning

Authors: Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa

Abstract: Generating grammatically and semantically correct captions in video captioning is a challenging task. The captions generated from the existing methods are either word-by-word that do not align with grammatical structure or miss key information from the input videos. To address these issues, we introduce a novel global-local fusion network, with a Global-Local Fusion Block (GLFB) that encodes and fuses features from different parts of speech (POS) components with visual-spatial features. We use novel combinations of different POS components - 'determinant + subject', 'auxiliary verb', 'verb', and 'determinant + object' for supervision of the POS blocks - Det + Subject, Aux Verb, Verb, and Det + Object respectively. The novel global-local fusion network together with POS blocks helps align the visual features with language description to generate grammatically and semantically correct captions. Extensive qualitative and quantitative experiments on benchmark MSVD and MSRVTT datasets demonstrate that the proposed approach generates more grammatically and semantically correct captions compared to the existing methods, achieving the new state-of-the-art. Ablations on the POS blocks and the GLFB demonstrate the impact of the contributions on the proposed method.

Submitted 4 April, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

arXiv:2303.08789 [pdf, other]

PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining

Authors: Garrett Thomas, Ching-An Cheng, Ricky Loynd, Felipe Vieira Frujeri, Vibhav Vineet, Mihai Jalobeanu, Andrey Kolobov

Abstract: A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations. In this work we propose PLEX, a transformer-based architecture that learns from a small amount of task-agnostic visuomotor trajectories and a much larger amount of task-conditioned object manipulation videos -- a type of data available in quantity. PLEX uses visuomotor trajectories to induce a latent feature space and to learn task-agnostic manipulation routines, while diverse video-only demonstrations teach PLEX how to plan in the induced latent feature space for a wide variety of tasks. Experiments showcase PLEX's generalization on Meta-World and SOTA performance in challenging Robosuite environments. In particular, using relative positional encoding in PLEX's transformers greatly helps in low-data regimes of learning from human-collected demonstrations. The paper's accompanying code and data are available at https://microsoft.github.io/PLEX.

Submitted 8 November, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

arXiv:2303.00823 [pdf, other]

Automated control and optimisation of laser driven ion acceleration

Authors: B. Loughran, M. J. V. Streeter, H. Ahmed, S. Astbury, M. Balcazar, M. Borghesi, N. Bourgeois, C. B. Curry, S. J. D. Dann, S. DiIorio, N. P. Dover, T. Dzelzanis, O. C. Ettlinger, M. Gauthier, L. Giuffrida, G. D. Glenn, S. H. Glenzer, J. S. Green, R. J. Gray, G. S. Hicks, C. Hyland, V. Istokskaia, M. King, D. Margarone, O. McCusker, et al. (10 additional authors not shown)

Abstract: The interaction of relativistically intense lasers with opaque targets represents a highly non-linear, multi-dimensional parameter space. This limits the utility of sequential 1D scanning of experimental parameters for the optimisation of secondary radiation, although to date this has been the accepted methodology due to low data acquisition rates. High repetition-rate (HRR) lasers augmented by machine learning present a valuable opportunity for efficient source optimisation. Here, an automated, HRR-compatible system produced high fidelity parameter scans, revealing the influence of laser intensity on target pre-heating and proton generation. A closed-loop Bayesian optimisation of maximum proton energy, through control of the laser wavefront and target position, produced proton beams with equivalent maximum energy to manually optimised laser pulses but using only 60% of the laser energy. This demonstration of automated optimisation of laser-driven proton beams is a crucial step towards deeper physical insight and the construction of future radiation sources.

Submitted 1 March, 2023; originally announced March 2023.

Comments: 11 pages

arXiv:2208.09740 [pdf, other]

Just-in-Time Aggregation for Federated Learning

Authors: K. R. Jayaram, Ashish Verma, Gegi Thomas, Vinod Muthusamy

Abstract: The increasing number and scale of federated learning (FL) jobs necessitate resource efficient scheduling and management of aggregation to make the economics of cloud-hosted aggregation work. Existing FL research has focused on the design of FL algorithms and optimization, and less on the efficacy of aggregation. Existing FL platforms often employ aggregators that actively wait for model updates. This wastes computational resources on the cloud, especially in large scale FL settings where parties are intermittently available for training. In this paper, we propose a new FL aggregation paradigm -- "just-in-time" (JIT) aggregation that leverages unique properties of FL jobs, especially the periodicity of model updates, to defer aggregation as much as possible and free compute resources for other FL jobs or other datacenter workloads. We describe a novel way to prioritize FL jobs for aggregation, and demonstrate using multiple datasets, models and FL aggregation algorithms that our techniques can reduce resource usage by 60+% when compared to eager aggregation used in existing FL platforms. We also demonstrate that using JIT aggregation has negligible overhead and impact on the latency of the FL job.

Submitted 20 August, 2022; originally announced August 2022.

Comments: 10 pages. Extended version of the paper accepted to MASCOTS 2022. arXiv admin note: text overlap with arXiv:2203.12163

arXiv:2206.01637 [pdf, other]

doi 10.1017/S0022377822000939

Unsupervised Discovery of Inertial-Fusion Plasma Physics using Differentiable Kinetic Simulations and a Maximum Entropy Loss Function

Authors: Archis S. Joglekar, Alexander G. R. Thomas

Abstract: Plasma supports collective modes and particle-wave interactions that lead to complex behavior in inertial fusion energy applications. While plasma can sometimes be modeled as a charged fluid, a kinetic description is useful towards the study of nonlinear effects in the higher dimensional momentum-position phase-space that describes the full complexity of plasma dynamics. We create a differentiable solver for the plasma kinetics 3D partial-differential-equation and introduce a domain-specific objective function. Using this framework, we perform gradient-based optimization of neural networks that provide forcing function parameters to the differentiable solver given a set of initial conditions. We apply this to an inertial-fusion relevant configuration and find that the optimization process exploits a novel physical effect that has previously remained undiscovered.

Submitted 27 July, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: 2nd AI4Science Workshop at the 39th International Conference on Machine Learning (ICML), 2022

arXiv:2205.00155 [pdf, ps, other]

doi 10.48550/arXiv.2205.00155

Real-Time Gait Phase and Task Estimation for Controlling a Powered Ankle Exoskeleton on Extremely Uneven Terrain

Authors: Roberto Leo Medrano, Gray Cortright Thomas, Connor G. Keais, Elliott J. Rouse, Robert D. Gregg

Abstract: Positive biomechanical outcomes have been reported with lower-limb exoskeletons in laboratory settings, but these devices have difficulty delivering appropriate assistance in synchrony with human gait as the task or rate of phase progression change in real-world environments. This paper presents a controller for an ankle exoskeleton that uses a data-driven kinematic model to continuously estimate the phase, phase rate, stride length, and ground incline states during locomotion, which enables the real-time adaptation of torque assistance to match human torques observed in a multi-activity database of 10 able-bodied subjects. We demonstrate in live experiments with a new cohort of 10 able-bodied participants that the controller yields phase estimates comparable to the state of the art, while also estimating task variables with similar accuracy to recent machine learning approaches. The implemented controller successfully adapts its assistance in response to changing phase and task variables, both during controlled treadmill trials (N=10, phase RMSE: 4.8 ± 2.4%) and a real-world stress test with extremely uneven terrain (N=1, phase RMSE: 4.8 ± 2.7%).

Submitted 6 October, 2022; v1 submitted 30 April, 2022; originally announced May 2022.

arXiv:2203.12163 [pdf, other]

Adaptive Aggregation For Federated Learning

Authors: K. R. Jayaram, Vinod Muthusamy, Gegi Thomas, Ashish Verma, Mark Purcell

Abstract: Advances in federated learning (FL) algorithms, along with technologies like differential privacy and homomorphic encryption, have led to FL being increasingly adopted and used in many application domains. This increasing adoption has led to rapid growth in the number, size (number of participants/parties) and diversity (intermittent vs. active parties) of FL jobs. Many existing FL systems, based on centralized (often single) model aggregators, are unable to scale to handle large FL jobs and adapt to parties' behavior. In this paper, we present a new scalable and adaptive architecture for FL aggregation. First, we demonstrate how traditional tree overlay based aggregation techniques (from P2P, publish-subscribe and stream processing research) can help FL aggregation scale, but are ineffective from a resource utilization and cost standpoint. Next, we present the design and implementation of AdaFed, which uses serverless/cloud functions to adaptively scale aggregation in a resource efficient and fault tolerant manner. We describe how AdaFed enables FL aggregation to be dynamically deployed only when necessary, elastically scaled to handle participant joins/leaves, and made fault tolerant with minimal effort required on the (aggregation) programmer side. We also demonstrate that our prototype based on Ray scales to thousands of participants, and is able to achieve a >90% reduction in resource requirements and cost, with minimal impact on aggregation latency.

Submitted 6 November, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

ACM Class: C.2.4; C.4

arXiv:2202.12246 [pdf, other]

Neural reality of argument structure constructions

Authors: Bai Li, Zining Zhu, Guillaume Thomas, Frank Rudzicz, Yang Xu

Abstract: In lexicalist linguistic theories, argument structure is assumed to be predictable from the meaning of verbs. As a result, the verb is the primary determinant of the meaning of a clause. In contrast, construction grammarians propose that argument structure is encoded in constructions (or form-meaning pairs) that are distinct from verbs. Decades of psycholinguistic research have produced substantial empirical evidence in favor of the construction view. Here we adapt several psycholinguistic studies to probe for the existence of argument structure constructions (ASCs) in Transformer-based language models (LMs). First, using a sentence sorting experiment, we find that sentences sharing the same construction are closer in embedding space than sentences sharing the same verb. Furthermore, LMs increasingly prefer grouping by construction with more input data, mirroring the behaviour of non-native language learners. Second, in a "Jabberwocky" priming-based experiment, we find that LMs associate ASCs with meaning, even in semantically nonsensical sentences. Our work offers the first evidence for ASCs in LMs and highlights the potential to devise novel probing methods grounded in psycholinguistic research.

Submitted 24 February, 2022; originally announced February 2022.

Comments: ACL 2022 (Long Paper)

arXiv:2202.07789 [pdf, other]

Safe Reinforcement Learning by Imagining the Near Future

Authors: Garrett Thomas, Yuping Luo, Tengyu Ma

Abstract: Safe reinforcement learning is a promising path toward applying reinforcement learning algorithms to real-world problems, where suboptimal behaviors may lead to actual negative consequences. In this work, we focus on the setting where unsafe states can be avoided by planning ahead a short time into the future. In this setting, a model-based agent with a sufficiently accurate model can avoid unsafe states. We devise a model-based algorithm that heavily penalizes unsafe trajectories, and derive guarantees that our algorithm can avoid unsafe states under certain assumptions. Experiments demonstrate that our algorithm can achieve competitive rewards with fewer safety violations in several continuous control tasks.

Submitted 15 February, 2022; originally announced February 2022.

Comments: Accepted at NeurIPS 2021

arXiv:2110.01562 [pdf, other]

Enhancing Voluntary Motion with Modular, Backdrivable, Powered Hip and Knee Orthoses

Authors: Christopher Nesler, Gray Thomas, Nikhil Divekar, Elliott J. Rouse, Robert D. Gregg

Abstract: Mobility disabilities are prominent in society with wide-ranging detriments to affected individuals. Addressing the specific deficits of individuals within this heterogeneous population requires modular, partial-assist, lower-limb exoskeletons. This paper introduces the Modular Backdrivable Lower-limb Unloading Exoskeleton (M-BLUE), which implements high torque, low mechanical impedance actuators on commercial orthoses with sheet metal modifications to produce a variety of hip- and/or knee-assisting configurations. Benchtop system identification verifies the desirable backdrive properties of the actuator, and allows for torque prediction within 0.4 Nm. An able-bodied human subject experiment demonstrates that three unilateral configurations of M-BLUE (hip only, knee only, and hip-knee) with a simple gravity compensation controller can reduce muscle EMG readings in a lifting and lowering task relative to the bare condition. Reductions in mean muscular effort and peak muscle activation were seen across the primary squat musculature (excluding biceps femoris), demonstrating the potential to reduce fatigue leading to poor lifting posture. These promising results motivate applications of M-BLUE to additional subject populations such as hip/knee osteoarthritis and geriatric frailty, and the expansion of M-BLUE to bilateral and ankle configurations.

Submitted 4 October, 2021; originally announced October 2021.

Comments: 8 pages, 7 figures

arXiv:2109.02145 [pdf, other]

Temporal Shift Reinforcement Learning

Authors: Deepak George Thomas, Tichakorn Wongpiromsarn, Ali Jannesari

Abstract: The function approximators employed by traditional image-based Deep Reinforcement Learning (DRL) algorithms usually lack a temporal learning component and instead focus on learning the spatial component. We propose a technique, Temporal Shift Reinforcement Learning (TSRL), wherein both temporal and spatial components are jointly learned. Moreover, TSRL does not require additional parameters to perform temporal learning. We show that TSRL outperforms the commonly used frame stacking heuristic on both of the Atari environments we test on, while beating the SOTA for one of them. This investigation has implications in the robotics as well as sequential decision-making domains.

Submitted 26 October, 2021; v1 submitted 5 September, 2021; originally announced September 2021.

arXiv:2105.10397 [pdf, other]

doi 10.1109/DSN48987.2021.00033

NVCache: A Plug-and-Play NVMM-based I/O Booster for Legacy Systems

Authors: Rémi Dulong, Rafael Pires, Andreia Correia, Valerio Schiavoni, Pedro Ramalhete, Pascal Felber, Gaël Thomas

Abstract: This paper introduces NVCache, an approach that uses a non-volatile main memory (NVMM) as a write cache to improve the write performance of legacy applications. We compare NVCache against file systems tailored for NVMM (Ext4-DAX and NOVA) and with I/O-heavy applications (SQLite, RocksDB). Our evaluation shows that NVCache reaches the performance level of the existing state-of-the-art systems for NVMM, but without their limitations: NVCache does not limit the size of the stored data to the size of the NVMM, and works transparently with unmodified legacy applications, providing additional persistence guarantees even when their source code is not available.

Submitted 3 September, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

Comments: 13 pages, 7 figures, to be published in the 51st IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 21)

MSC Class: 68M20 ACM Class: D.4.2; D.4.3; D.4.8

arXiv:2105.07452 [pdf, other]

How is BERT surprised? Layerwise detection of linguistic anomalies

Authors: Bai Li, Zining Zhu, Guillaume Thomas, Yang Xu, Frank Rudzicz

Abstract: Transformer language models have shown remarkable ability in detecting when a word is anomalous in context, but likelihood scores offer no information about the cause of the anomaly. In this work, we use Gaussian models for density estimation at intermediate layers of three language models (BERT, RoBERTa, and XLNet), and evaluate our method on BLiMP, a grammaticality judgement benchmark. In lower layers, surprisal is highly correlated to low token frequency, but this correlation diminishes in upper layers. Next, we gather datasets of morphosyntactic, semantic, and commonsense anomalies from psycholinguistic studies; we find that the best performing model RoBERTa exhibits surprisal in earlier layers when the anomaly is morphosyntactic than when it is semantic, while commonsense anomalies do not exhibit surprisal at any intermediate layer. These results suggest that language models employ separate mechanisms to detect different types of linguistic anomalies.

Submitted 16 May, 2021; originally announced May 2021.

Comments: ACL 2021 (Long Paper)

arXiv:2104.04060 [pdf, other]

Network in Disaggregated Datacenters

Authors: Brice Ekane, Yohan Pipereau, Boris Teabe, Alain Tchana, Gael Thomas, Noel de palma, Daniel Hagimont

Abstract: Nowadays, datacenters lean on a computer-centric approach based on monolithic servers which include all necessary hardware resources (mainly CPU, RAM, network and disks) to run applications. Such an architecture comes with two main limitations: (1) difficulty to achieve full resource utilization and (2) coarse granularity for hardware maintenance. Recently, many works investigated a resource-centric approach called disaggregated architecture where the datacenter is composed of self-content resource boards interconnected using fast interconnection technologies, each resource board including instances of one resource type. The resource-centric architecture allows each resource to be managed (maintenance, allocation) independently. LegoOS is the first work which studied the implications of disaggregation on the operating system, proposing to disaggregate the operating system itself. They demonstrated the suitability of this approach, considering mainly CPU and RAM resources. However, they didn't study the implication of disaggregation on network resources. We reproduced a LegoOS infrastructure and extended it to support disaggregated networking. We show that networking can be disaggregated following the same principles, and that classical networking optimizations such as DMA, DDIO or loopback can be reproduced in such an environment. Our evaluations show the viability of the approach and the potential of future disaggregated infrastructures.

Submitted 15 March, 2021; originally announced April 2021.

Comments: 10 pages, 8 figures

arXiv:2012.00740 [pdf, other]

MYSTIKO : : Cloud-Mediated, Private, Federated Gradient Descent

Authors: K. R. Jayaram, Archit Verma, Ashish Verma, Gegi Thomas, Colin Sutcher-Shepard

Abstract: Federated learning enables multiple, distributed participants (potentially on different clouds) to collaborate and train machine/deep learning models by sharing parameters/gradients. However, sharing gradients, instead of centralizing data, may not be as private as one would expect. Reverse engineering attacks on plaintext gradients have been demonstrated to be practically feasible. Existing solutions for differentially private federated learning, while promising, lead to less accurate models and require nontrivial hyperparameter tuning. In this paper, we examine the use of additive homomorphic encryption (specifically the Paillier cipher) to design secure federated gradient descent techniques that (i) do not require addition of statistical noise or hyperparameter tuning, (ii) do not alter the final accuracy or utility of the final model, (iii) ensure that the plaintext model parameters/gradients of a participant are never revealed to any other participant or third party coordinator involved in the federated learning job, (iv) minimize the trust placed in any third party coordinator and (v) are efficient, with minimal overhead, and cost effective.

Submitted 1 December, 2020; originally announced December 2020.

Comments: IEEE CLOUD 2020
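
The additive property of the Paillier cipher mentioned in the abstract above can be illustrated with a toy sketch. This is not the MYSTIKO protocol: the function names (`keygen`, `encrypt`, `add`, `decrypt`), the tiny primes, and the use of the common generator choice g = n + 1 are all assumptions made for illustration, and the key size is far too small to be secure. Real gradients are floating-point values and would first need an integer (for example fixed-point) encoding, which is omitted here.

```python
import math
import secrets

def keygen(p, q):
    """Build a toy Paillier key pair from two primes (illustrative only, not secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid for the generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt a non-negative integer m < n under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 equals 1 + m * n.
    return ((1 + m * n) * pow(r, n, n2)) % n2

def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)

def decrypt(n, priv, c):
    """Recover the plaintext using the private values (lam, mu)."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

# Two participants encrypt integer-encoded updates; a coordinator sums
# them without seeing either plaintext; only the key holder decrypts.
n, priv = keygen(293, 433)
total = add(n, encrypt(n, 12), encrypt(n, 30))
print(decrypt(n, priv, total))  # 42
```

The coordinator in this sketch only ever multiplies ciphertexts, which is why no statistical noise is needed to hide individual contributions.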

arXiv:2009.12446 [pdf, other]

doi 10.1109/TNSRE.2020.3027501

A Complex Stiffness Human Impedance Model with Customizable Exoskeleton Control

Authors: Binghan He, Huang Huang, Gray C. Thomas, Luis Sentis

Abstract: The natural impedance, or dynamic relationship between force and motion, of a human operator can determine the stability of exoskeletons that use interaction-torque feedback to amplify human strength. While human impedance is typically modelled as a linear system, our experiments on a single-joint exoskeleton testbed involving 10 human subjects show evidence of nonlinear behavior: a low-frequency asymptotic phase for the dynamic stiffness of the human that is different than the expected zero, and an unexpectedly consistent damping ratio as the stiffness and inertia vary. To explain these observations, this paper considers a new frequency-domain model of the human joint dynamics featuring complex value stiffness comprising a real stiffness term and a hysteretic damping term. Using a statistical F-test we show that the hysteretic damping term is not only significant but is even more significant than the linear damping term. Further analysis reveals a linear trend linking hysteretic damping and the real part of the stiffness, which allows us to simplify the complex stiffness model down to a 1-parameter system. Then, we introduce and demonstrate a customizable fractional-order controller that exploits this hysteretic damping behavior to improve strength amplification bandwidth while maintaining stability, and explore a tuning approach which ensures that this stability property is robust to muscle co-contraction for each individual.

Submitted 25 September, 2020; originally announced September 2020.

Comments: 10 pages, 7 figures, 4 tables. arXiv admin note: text overlap with arXiv:1903.00704

arXiv:2009.09241 [pdf, other]

Word class flexibility: A deep contextualized approach

Authors: Bai Li, Guillaume Thomas, Yang Xu, Frank Rudzicz

Abstract: Word class flexibility refers to the phenomenon whereby a single word form is used across different grammatical categories. Extensive work in linguistic typology has sought to characterize word class flexibility across languages, but quantifying this phenomenon accurately and at scale has been fraught with difficulties. We propose a principled methodology to explore regularity in word class flexibility. Our method builds on recent work in contextualized word embeddings to quantify semantic shift between word classes (e.g., noun-to-verb, verb-to-noun), and we apply this method to 37 languages. We find that contextualized embeddings not only capture human judgment of class variation within words in English, but also uncover shared tendencies in class flexibility across languages. Specifically, we find greater semantic variation when flexible lemmas are used in their dominant word class, supporting the view that word class flexibility is a directional process. Our work highlights the utility of deep contextualized models in linguistic typology.

Submitted 19 September, 2020; originally announced September 2020.

Comments: To appear in EMNLP 2020 (Long Paper)

arXiv:2007.10987 [pdf, other]

IBM Federated Learning: an Enterprise Framework White Paper V0.1

Authors: Heiko Ludwig, Nathalie Baracaldo, Gegi Thomas, Yi Zhou, Ali Anwar, Shashank Rajamoni, Yuya Ong, Jayaram Radhakrishnan, Ashish Verma, Mathieu Sinn, Mark Purcell, Ambrish Rawat, Tran Minh, Naoise Holohan, Supriyo Chakraborty, Shalisha Whitherspoon, Dean Steuer, Laura Wynter, Hifaz Hassan, Sean Laguna, Mikhail Yurochkin, Mayank Agarwal, Ebube Chuba, Annie Abay

Abstract: Federated Learning (FL) is an approach to conduct machine learning without centralizing training data in a single place, for reasons of privacy, confidentiality or data volume. However, solving federated machine learning problems raises issues above and beyond those of centralized machine learning. These issues include setting up communication infrastructure between parties, coordinating the learning process, integrating party results, understanding the characteristics of the training data sets of different participating parties, handling data heterogeneity, and operating with the absence of a verification data set. IBM Federated Learning provides infrastructure and coordination for federated learning. Data scientists can design and run federated learning jobs based on existing, centralized machine learning models and can provide high-level instructions on how to run the federation. The framework applies to both Deep Neural Networks as well as "traditional" approaches for the most common machine learning libraries. IBM Federated Learning enables data scientists to expand their scope from centralized to federated machine learning, minimizing the learning curve at the outset while also providing the flexibility to deploy to different compute environments and design custom fusion algorithms.

Submitted 22 July, 2020; originally announced July 2020.

Comments: 17 pages

ACM Class: I.2.6; I.2.11

arXiv:2006.08875 [pdf, other]

Model-based Adversarial Meta-Reinforcement Learning

Authors: Zichuan Lin, Garrett Thomas, Guangwen Yang, Tengyu Ma

Abstract: Meta-reinforcement learning (meta-RL) aims to learn from multiple training tasks the ability to adapt efficiently to unseen test tasks. Despite the success, existing meta-RL algorithms are known to be sensitive to the task distribution shift. When the test task distribution is different from the training task distribution, the performance may degrade significantly. To address this issue, this paper proposes Model-based Adversarial Meta-Reinforcement Learning (AdMRL), where we aim to minimize the worst-case sub-optimality gap -- the difference between the optimal return and the return that the algorithm achieves after adaptation -- across all tasks in a family of tasks, with a model-based approach. We propose a minimax objective and optimize it by alternating between learning the dynamics model on a fixed task and finding the adversarial task for the current model -- the task for which the policy induced by the model is maximally suboptimal. Assuming the family of tasks is parameterized, we derive a formula for the gradient of the suboptimality with respect to the task parameters via the implicit function theorem, and show how the gradient estimator can be efficiently implemented by the conjugate gradient method and a novel use of the REINFORCE estimator. We evaluate our approach on several continuous control benchmarks and demonstrate its efficacy in the worst-case performance over all tasks, the generalization power to out-of-distribution tasks, and in training and test time sample efficiency, over existing state-of-the-art meta-RL algorithms.

Submitted 27 February, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: Accepted by NeurIPS 2020. Code at https://github.com/LinZichuan/AdMRL

arXiv:2005.13239 [pdf, other]

MOPO: Model-based Offline Policy Optimization

Authors: Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

Abstract: Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. This problem setting offers the promise of utilizing such datasets to acquire policies without any costly or dangerous active exploration. However, it is also challenging, due to the distributional shift between the offline training data and those states visited by the learned policy. Despite significant recent progress, the most successful prior methods are model-free and constrain the policy to the support of data, precluding generalization to unseen states. In this paper, we first observe that an existing model-based RL algorithm already produces significant gains in the offline setting compared to model-free approaches. However, standard model-based RL methods, designed for the online setting, do not provide an explicit mechanism to avoid the offline setting's distributional shift issue. Instead, we propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics. We theoretically show that the algorithm maximizes a lower bound of the policy's return under the true MDP. We also characterize the trade-off between the gain and risk of leaving the support of the batch data. Our algorithm, Model-based Offline Policy Optimization (MOPO), outperforms standard model-based RL algorithms and prior state-of-the-art model-free offline RL algorithms on existing offline RL benchmarks and two challenging continuous control tasks that require generalizing from data collected for a different task. The code is available at https://github.com/tianheyu927/mopo.

Submitted 22 November, 2020; v1 submitted 27 May, 2020; originally announced May 2020.

Comments: NeurIPS 2020. First two authors contributed equally. Last two authors advised equally
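
The idea in the abstract above of training on "rewards artificially penalized by the uncertainty of the dynamics" can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `penalty_coef` parameter, and the choice of ensemble disagreement as the uncertainty estimate are all assumptions introduced here, since the abstract does not specify how uncertainty is measured.

```python
from math import sqrt

def ensemble_disagreement(predictions):
    """Hypothetical uncertainty estimate for one (state, action) pair.

    predictions: list of next-state vectors, one per dynamics model.
    Returns the largest Euclidean distance between any model's
    prediction and the ensemble mean.
    """
    n_models = len(predictions)
    dim = len(predictions[0])
    mean = [sum(p[i] for p in predictions) / n_models for i in range(dim)]
    return max(
        sqrt(sum((p[i] - mean[i]) ** 2 for i in range(dim)))
        for p in predictions
    )

def penalized_reward(reward, uncertainty, penalty_coef=1.0):
    """Subtract a scaled uncertainty term from the model-predicted reward."""
    return reward - penalty_coef * uncertainty

# Two hypothetical dynamics models disagree about the next state, so the
# reward used for policy optimization is lowered for this transition.
u = ensemble_disagreement([[0.0, 0.0], [2.0, 0.0]])
r_tilde = penalized_reward(reward=1.0, uncertainty=u, penalty_coef=0.5)
```

A larger `penalty_coef` makes the policy more conservative about regions where the learned model is unreliable, which mirrors the gain-versus-risk trade-off the abstract mentions.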

arXiv:1907.04964 [pdf, other]

A Model-based Approach for Sample-efficient Multi-task Reinforcement Learning

Authors: Nicholas C. Landolfi, Garrett Thomas, Tengyu Ma

Abstract: The aim of multi-task reinforcement learning is two-fold: (1) efficiently learn by training against multiple tasks and (2) quickly adapt, using limited samples, to a variety of new tasks. In this work, the tasks correspond to reward functions for environments with the same (or similar) dynamical models. We propose to learn a dynamical model during the training process and use this model to perform sample-efficient adaptation to new tasks at test time. We use significantly fewer samples by performing policy optimization only in a "virtual" environment whose transitions are given by our learned dynamical model. Our algorithm sequentially trains against several tasks. Upon encountering a new task, we first warm-up a policy on our learned dynamical model, which requires no new samples from the environment. We then adapt the dynamical model with samples from this policy in the real environment. We evaluate our approach on several continuous control benchmarks and demonstrate its efficacy over MAML, a state-of-the-art meta-learning algorithm, on these tasks.

Submitted 3 November, 2019; v1 submitted 10 July, 2019; originally announced July 2019.

Comments: 13 pages, 3 figures

arXiv:1903.09673 [pdf, other]

Compliance Shaping for Control of Strength Amplification Exoskeletons with Elastic Cuffs

Authors: Gray Cortright Thomas, Jeremiah M. Coholich, Luis Sentis

Abstract: Exoskeletons which amplify the strength of their operators can enable heavy-duty manipulation of unknown objects. However, this type of behavior is difficult to accomplish; it requires the exoskeleton to sense and amplify the operator's interaction forces while remaining stable. But, the goals of amplification and robust stability when connected to the operator fundamentally conflict. As a solution, we introduce a design with a spring in series with the force sensitive cuff. This allows us to design an exoskeleton compliance behavior which is nominally passive, even with high amplification ratios. In practice, time delay and discrete time filters prevent our strategy from actually achieving passivity, but the designed compliance still makes the exoskeleton more robust to spring-like human behaviors. Our exoskeleton is actuated by a series elastic actuator (SEA), which introduces another spring into the system. We show that shaping the cuff compliance for the exoskeleton can be made into approximately the same problem as shaping the spring compliance of an SEA. We therefore introduce a feedback controller and gain tuning method which takes advantage of an existing compliance shaping technique for SEAs. We call our strategy the "double compliance shaping" method. With large amplification ratios, this controller tends to amplify nonlinear transmission friction effects, so we additionally propose a "transmission disturbance observer" to mitigate this drawback. Our methods are validated on a single-degree-of-freedom elbow exoskeleton.

Submitted 22 March, 2019; originally announced March 2019.

Comments: 8 pages, 9 figures, conference

MSC Class: 70Q05; 70E60; 93C80

arXiv:1903.00704 [pdf, other]

doi 10.1109/IROS40897.2019.8968005

Complex Stiffness Model of Physical Human-Robot Interaction: Implications for Control of Performance Augmentation Exoskeletons

Authors: Binghan He, Huang Huang, Gray C. Thomas, Luis Sentis

Abstract: Human joint dynamic stiffness plays an important role in the stability of performance augmentation exoskeletons. In this paper, we consider a new frequency domain model of the human joint dynamics which features a complex value stiffness. This complex stiffness consists of a real stiffness and a hysteretic damping. We use it to explain the dynamic behaviors of the human connected to the exoskeleton, in particular the observed non-zero low frequency phase shift and the near constant damping ratio of the resonant as stiffness and inertia vary. We validate this concept by experimenting with an elbow-joint exoskeleton testbed on a subject while modifying joint stiffness behavior, exoskeleton inertia, and strength augmentation gains. We compare three different models of elbow-joint dynamic stiffness: a model with real stiffness, viscous damping and inertia, a model with complex stiffness and inertia, and a model combining the previous two models. Our results show that the hysteretic damping term improves modeling accuracy, using a statistical F-test. Moreover this improvement is statistically more significant than using classical viscous damping term. In addition, we experimentally observe a linear relationship between the hysteretic damping and the real part of the stiffness which allows us to simplify the complex stiffness model as a 1-parameter system. Ultimately, we design a fractional order controller to demonstrate how human hysteretic damping behavior can be exploited to improve strength amplification performance while maintaining stability.

Submitted 30 April, 2020; v1 submitted 2 March, 2019; originally announced March 2019.

arXiv:1901.06261 [pdf, other]

NeuNetS: An Automated Synthesis Engine for Neural Network Design

Authors: Atin Sood, Benjamin Elder, Benjamin Herta, Chao Xue, Costas Bekas, A. Cristiano I. Malossi, Debashish Saha, Florian Scheidegger, Ganesh Venkataraman, Gegi Thomas, Giovanni Mariani, Hendrik Strobelt, Horst Samulowitz, Martin Wistuba, Matteo Manica, Mihir Choudhury, Rong Yan, Roxana Istrate, Ruchir Puri, Tejaswini Pedapati

Abstract: Application of neural networks to a vast variety of practical applications is transforming the way AI is applied in practice. Pre-trained neural network models available through APIs, and the capability to custom-train pre-built neural network architectures with customer data, have made the consumption of AI by developers much simpler and resulted in broad adoption of these complex AI models. While prebuilt network models exist for certain scenarios, to try to meet the constraints that are unique to each application, AI teams need to think about developing custom neural network architectures that can meet the tradeoff between accuracy and memory footprint to achieve the tight constraints of their unique use-cases. However, only a small proportion of data science teams have the skills and experience needed to create a neural network from scratch, and the demand far exceeds the supply. In this paper, we present NeuNetS: an automated Neural Network Synthesis engine for custom neural network design that is available as part of IBM's AI OpenScale product. NeuNetS is available for both Text and Image domains and can build neural networks for specific tasks in a fraction of the time it takes today with human effort, and with accuracy similar to that of human-designed AI models.

Submitted 16 January, 2019; originally announced January 2019.

Comments: 14 pages, 12 figures. arXiv admin note: text overlap with arXiv:1806.00250

arXiv:1812.01719 [pdf, other]

doi 10.3389/fninf.2019.00067

Knowing what you know in brain segmentation using Bayesian deep neural networks

Authors: Patrick McClure, Nao Rho, John A. Lee, Jakub R. Kaczmarzyk, Charles Zheng, Satrajit S. Ghosh, Dylan Nielson, Adam G. Thomas, Peter Bandettini, Francisco Pereira

Abstract: In this paper, we describe a Bayesian deep neural network (DNN) for predicting FreeSurfer segmentations of structural MRI volumes, in minutes rather than hours. The network was trained and evaluated on a large dataset (n = 11,480), obtained by combining data from more than a hundred different sites, and also evaluated on another completely held-out dataset (n = 418). The network was trained using a novel spike-and-slab dropout-based variational inference approach. We show that, on these datasets, the proposed Bayesian DNN outperforms previously proposed methods, in terms of the similarity between the segmentation predictions and the FreeSurfer labels, and the usefulness of the estimated uncertainty of these predictions. In particular, we demonstrate that the prediction uncertainty of this network at each voxel is a good indicator of whether the network has made an error and that the uncertainty across the whole brain can predict the manual quality control ratings of a scan. The proposed Bayesian DNN method should be applicable to any new network architecture for addressing the segmentation problem.

Submitted 18 September, 2019; v1 submitted 3 December, 2018; originally announced December 2018.

Comments: Submitted to Frontiers in Neuroinformatics

arXiv:1803.07635 [pdf, other]

Learning Robotic Assembly from CAD

Authors: Garrett Thomas, Melissa Chien, Aviv Tamar, Juan Aparicio Ojea, Pieter Abbeel

Abstract: In this work, motivated by recent manufacturing trends, we investigate autonomous robotic assembly. Industrial assembly tasks require contact-rich manipulation skills, which are challenging to acquire using classical control and motion planning approaches. Consequently, robot controllers for assembly domains are presently engineered to solve a particular task, and cannot easily handle variations in the product or environment. Reinforcement learning (RL) is a promising approach for autonomously acquiring robot skills that involve contact-rich dynamics. However, RL relies on random exploration for learning a control policy, which requires many robot executions, and often gets trapped in locally suboptimal solutions. Instead, we posit that prior knowledge, when available, can improve RL performance. We exploit the fact that in modern assembly domains, geometric information about the task is readily available via the CAD design files. We propose to leverage this prior knowledge by guiding RL along a geometric motion plan, calculated using the CAD data. We show that our approach effectively improves over traditional control approaches for tracking the motion plan, and can solve assembly tasks that require high precision, even without accurate state estimation. In addition, we propose a neural network architecture that can learn to track the motion plan, and generalize the assembly controller to changes in the object positions.

Submitted 24 July, 2018; v1 submitted 20 March, 2018; originally announced March 2018.

Comments: In the proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, May 2018

arXiv:1802.10190 [pdf, other]

Exploiting the Natural Dynamics of Series Elastic Robots by Actuator-Centered Sequential Linear Programming

Authors: Rachel Schlossman, Gray C. Thomas, Orion Campbell, Luis Sentis

Abstract: Series elastic robots are best able to follow trajectories which obey the limitations of their actuators, since they cannot instantly change their joint forces. In fact, the performance of series elastic actuators can surpass that of ideal force source actuators by storing and releasing energy. In this paper, we formulate the trajectory optimization problem for series elastic robots in a novel way based on sequential linear programming. Our framework is unique in the separation of the actuator dynamics from the rest of the dynamics, and in the use of a tunable pseudo-mass parameter that improves the discretization accuracy of our approach. The actuator dynamics are truly linear, which allows them to be excluded from trust-region mechanics. This causes our algorithm to have similar run times with and without the actuator dynamics. We demonstrate our optimization algorithm by tuning high performance behaviors for a single-leg robot in simulation and on hardware for a single degree-of-freedom actuator testbed. The results show that compliance allows for faster motions and takes a similar amount of computation time.

Submitted 16 July, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

arXiv:1712.04989 [pdf, other]

Persistent Memory Programming Abstractions in Context of Concurrent Applications

Authors: Ajay Singh, Marc Shapiro, Gael Thomas

Abstract: The advent of non-volatile memory (NVM) technologies like PCM, STT, memristors and Fe-RAM is believed to enhance system performance by eliminating the traditional memory hierarchy and reducing the gap between memory and storage. This memory technology is considered to have performance like that of DRAM and persistence like that of disks. Thus, it would also provide significant performance benefits for big data applications by allowing in-memory processing of large data with the lowest latency to persistence. Leveraging the performance benefits of this memory-centric computing technology through traditional memory programming is not trivial, and the challenges are aggravated for parallel/concurrent applications. To this end, several programming abstractions have been proposed, such as NVthreads, Mnemosyne and Intel's NVML. However, deciding upon a programming abstraction that is easy to program with and at the same time ensures consistency and balances various software and architectural trade-offs is open to debate and an active area of research for the NVM community. We study the NVthreads, Mnemosyne and NVML libraries by building a concurrent and persistent set and an open-addressed hash-table data structure application. In this process, we explore and report various tradeoffs and hidden costs involved in building concurrent applications for persistence in terms of achieving efficiency, consistency and ease of programming with these NVM programming abstractions. Finally, we evaluate the performance of the set and hash-table data structure applications. We observe that NVML is the easiest to program with but is the least efficient, and that Mnemosyne is the most performance friendly but involves significant programming effort to build concurrent and persistent applications.

Submitted 13 December, 2017; originally announced December 2017.

Comments: Accepted in HiPC SRS 2017

arXiv:1609.09001 [pdf, other]

Learning from the Hindsight Plan -- Episodic MPC Improvement

Authors: Aviv Tamar, Garrett Thomas, Tianhao Zhang, Sergey Levine, Pieter Abbeel

Abstract: Model predictive control (MPC) is a popular control method that has proved effective for robotics, among other fields. MPC performs re-planning at every time step. Re-planning is done with a limited horizon per computational and real-time constraints and often also for robustness to potential model errors. However, the limited horizon leads to suboptimal performance. In this work, we consider the iterative learning setting, where the same task can be repeated several times, and propose a policy improvement scheme for MPC. The main idea is that between executions we can, offline, run MPC with a longer horizon, resulting in a hindsight plan. To bring the next real-world execution closer to the hindsight plan, our approach learns to re-shape the original cost function with the goal of satisfying the following property: short horizon planning (as realistic during real executions) with respect to the shaped cost should result in mimicking the hindsight plan. This effectively consolidates long-term reasoning into the short-horizon planning. We empirically evaluate our approach in contact-rich manipulation tasks both in simulated and real environments, such as peg insertion by a real PR2 robot.

Submitted 20 March, 2017; v1 submitted 28 September, 2016; originally announced September 2016.

Comments: Additional experiments for neural network generalization and for varying the planning horizon. Paper accepted to ICRA 2017

arXiv:1602.02867 [pdf, other]

Value Iteration Networks

Authors: Aviv Tamar, Yi Wu, Garrett Thomas, Sergey Levine, Pieter Abbeel

Abstract: We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network, and trained end-to-end using standard backpropagation. We evaluate VIN-based policies on discrete and continuous path-planning domains, and on a natural-language-based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.

Submitted 20 March, 2017; v1 submitted 9 February, 2016; originally announced February 2016.

Comments: Fixed missing table values

Journal ref: Advances in Neural Information Processing Systems 29 pages 2154--2162, 2016

arXiv:1501.02855 [pdf, other]

Assessing Whole-Body Operational Space Control in a Point-Foot Series Elastic Biped: Balance on Split Terrain and Undirected Walking

Authors: Donghyun Kim, Ye Zhao, Gray Thomas, Luis Sentis

Abstract: In this paper we present advancements in control and trajectory generation for agile behavior in bipedal robots. We demonstrate that Whole-Body Operational Space Control (WBOSC), developed a few years ago, is well suited for achieving two types of agile behaviors, namely, balancing on a high pitch split terrain and achieving undirected walking on flat terrain. The work presented here is the first implementation of WBOSC on a biped robot, and more specifically a biped robot with series elastic actuators. We present and analyze a new algorithm that dynamically balances point foot robots by choosing footstep placements. Dealing with the naturally unstable dynamics of these types of systems is a difficult problem that requires both the controller and the trajectory generation algorithm to operate quickly and efficiently. We put forth a comprehensive development and integration effort: the design and construction of the biped system and experimental infrastructure, a customization of WBOSC for the agile behaviors, and new trajectory generation algorithms. Using this custom-built controller, we conduct, for the first time, an experiment in which a biped robot balances on a high pitch split terrain, demonstrating our ability to precisely regulate internal forces using force sensing feedback techniques. Finally, we demonstrate the stabilizing capabilities of our online trajectory generation algorithm in the physics-based simulator and through physical experiments with a planarized locomotion setup.

Submitted 12 January, 2015; originally announced January 2015.

Comments: 17 pages, 9 figures, 4 tables

arXiv:1407.4346 [pdf, ps, other]

doi 10.1145/2619090

Faults in Linux 2.6

Authors: Nicolas Palix, Gaël Thomas, Suman Saha, Christophe Calvès, Gilles Muller, Julia L. Lawall

Abstract: In August 2011, Linux entered its third decade. Ten years before, Chou et al. published a study of faults found by applying a static analyzer to Linux versions 1.0 through 2.4.1. A major result of their work was that the drivers directory contained up to 7 times more of certain kinds of faults than other directories. This result inspired numerous efforts on improving the reliability of driver code. Today, Linux is used in a wider range of environments, provides a wider range of services, and has adopted a new development and release model. What has been the impact of these changes on code quality? To answer this question, we have transported Chou et al.'s experiments to all versions of Linux 2.6, released between 2003 and 2011. We find that Linux has more than doubled in size during this period, but the number of faults per line of code has been decreasing. Moreover, the fault rate of drivers is now below that of other directories, such as arch. These results can guide further development and research efforts for the decade to come. To allow updating these results as Linux evolves, we define our experimental protocol and make our checkers available.

Submitted 16 July, 2014; originally announced July 2014.

Journal ref: ACM Transactions on Computer Systems 32, 2 (2014) 1--40

arXiv:1310.8392 [pdf]

Cloud computing security using encryption technique

Authors: Geethu Thomas, Prem Jose V, P. Afsar

Abstract: Cloud Computing has been envisioned as the next generation architecture of IT Enterprise. The Cloud computing concept offers dynamically scalable resources provisioned as a service over the Internet. Economic benefits are the main driver for the Cloud, since it promises the reduction of capital expenditure and operational expenditure. In order for this to become reality, however, there are still some challenges to be solved. Most important among these are security and trust issues, since the users' data has to be released to the Cloud and thus leaves the protection sphere of the data owner. In contrast to traditional solutions, where the IT services are under proper physical, logical and personnel controls, Cloud Computing moves the application software and databases to large data centers, where the management of the data and services may not be fully trustworthy. This unique attribute, however, poses many new security challenges which have not been well understood. Security means protecting data from danger and vulnerability, and there are many dangers and vulnerabilities to be handled. Various security issues and some of their solutions are explained, concentrating mainly on public cloud security issues and their solutions. Data should always be encrypted when stored (using separate symmetric encryption keys) and transmitted. If this is implemented appropriately, even if another tenant can access the data, all that will appear is gibberish. So a method is proposed in which the whole data is encrypted along with the cryptographic key.

Submitted 31 October, 2013; originally announced October 2013.

Comments: 7 Pages, 3 Figures. arXiv admin note: text overlap with arXiv:1303.4814 by other authors

arXiv:0904.4058 [pdf, other]

Security impact ratings considered harmful

Authors: Jeff Arnold, Tim Abbott, Waseem Daher, Gregory Price, Nelson Elhage, Geoffrey Thomas, Anders Kaseorg

Abstract: In this paper, we question the common practice of assigning security impact ratings to OS updates. Specifically, we present evidence that ranking updates by their perceived security importance, in order to defer applying some updates, exposes systems to significant risk. We argue that OS vendors and security groups should not focus on security updates to the detriment of other updates, but should instead seek update technologies that make it feasible to distribute updates for all disclosed OS bugs in a timely manner.

Submitted 26 April, 2009; originally announced April 2009.

Comments: HotOS 2009

arXiv:0802.3475 [pdf]

Spreadsheet Development Methodologies using Resolver: Moving spreadsheets into the 21st Century

Authors: Patrick Kemmis, Giles Thomas

Abstract: We intend to demonstrate the innate problems with existing spreadsheet products and to show how to tackle these issues using a new type of spreadsheet program called Resolver. It addresses the issues head-on and thereby moves the 1980's "VisiCalc paradigm" on to match the advances in computer languages and user requirements. Continuous display of the spreadsheet grid and the equivalent computer program, together with the ability to interact and add code through either interface, provides a number of new methodologies for spreadsheet development.

Submitted 23 February, 2008; originally announced February 2008.

Comments: 12 pages

ACM Class: J.1; H.4.1; K.6.4; D.2.9

Journal ref: Proc. European Spreadsheet Risks Int. Grp. (EuSpRIG) 2007 93-104 ISBN 978-905617-58-6

arXiv:cs/0411081 [pdf, ps, other]

Reconfigurations dynamiques de services dans un intergiciel a composants CORBA CCM

Authors: Assia Hachichi, Cyril Martin, Gael Thomas, Simon Patarin, Bertil Folliot

Abstract: Today, component-oriented middlewares are used to easily design, develop and deploy distributed applications, by ensuring the heterogeneity, interoperability, and reuse of the software modules, and the separation between the business code encapsulated in the components and the system code managed by the containers. Several standards answer this definition, such as CCM (CORBA Component Model), EJB (Enterprise Java Beans) and .Net. However, these standards offer a limited and fixed number of system services, removing any possibility of adding system services or dynamically reconfiguring the middleware. Our work proposes mechanisms to dynamically add and adapt the system services, based on a reconfiguration language which is dynamically adaptable to the needs of the reconfiguration, and on a dynamic reconfiguration tool. A prototype was built for the OpenCCM platform, which is an implementation of the CCM specification. This work was partially financed by the European project IST-COACH (2001-34445).

Submitted 24 November, 2004; originally announced November 2004.

Journal ref: DECOR04 (2004) 159-170

arXiv:cmp-lg/9801001 [pdf, ps, other]

Hierarchical Non-Emitting Markov Models

Authors: Eric Sven Ristad, Robert G. Thomas

Abstract: We describe a simple variant of the interpolated Markov model with non-emitting state transitions and prove that it is strictly more powerful than any Markov model. More importantly, the non-emitting model outperforms the classic interpolated model on natural language texts under a wide range of experimental conditions, with only a modest increase in computational requirements. The non-emitting model is also much less prone to overfitting. Keywords: Markov model, interpolated Markov model, hidden Markov model, mixture modeling, non-emitting state transitions, state-conditional interpolation, statistical language model, discrete time series, Brown corpus, Wall Street Journal.

Submitted 20 January, 1998; v1 submitted 14 January, 1998; originally announced January 1998.

Comments: http://www.cs.princeton.edu/~ristad/papers/pu-544-97.ps.gz

Report number: CS-TR-544-97

arXiv:cmp-lg/9611004 [pdf, ps, other]

Nonuniform Markov models

Authors: Eric Sven Ristad, Robert G. Thomas

Abstract: A statistical language model assigns probability to strings of arbitrary length. Unfortunately, it is not possible to gather reliable statistics on strings of arbitrary length from a finite corpus. Therefore, a statistical language model must decide that each symbol in a string depends on at most a small, finite number of other symbols in the string. In this report we propose a new way to model conditional independence in Markov models. The central feature of our nonuniform Markov model is that it makes predictions of varying lengths using contexts of varying lengths. Experiments on the Wall Street Journal reveal that the nonuniform model performs slightly better than the classic interpolated Markov model. This result is somewhat remarkable because both models contain identical numbers of parameters whose values are estimated in a similar manner. The only difference between the two models is how they combine the statistics of longer and shorter strings. Keywords: nonuniform Markov model, interpolated Markov model, conditional independence, statistical language model, discrete time series.

Submitted 16 November, 1996; originally announced November 1996.

Comments: 17 pages

Report number: CS-TR-536-96

arXiv:cmp-lg/9505002 [pdf, ps]

New Techniques for Context Modeling

Authors: Eric Sven Ristad, Robert G. Thomas

Abstract: We introduce three new techniques for statistical language models: extension modeling, nonmonotonic contexts, and the divergence heuristic. Together these techniques result in language models that have few states, even fewer parameters, and low message entropies. For example, our techniques achieve a message entropy of 1.97 bits/char on the Brown corpus using only 89,325 parameters. In contrast, the character 4-gram model requires more than 250 times as many parameters in order to achieve a message entropy of only 2.47 bits/char. The fact that our model performs significantly better while using vastly fewer parameters indicates that it is a better probability model of natural language text.

Submitted 1 May, 1995; originally announced May 1995.

Comments: 8 pages, to appear in Proc. ACL 1995

Showing 1–48 of 48 results for author: Thomas, G