Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs

Authors: Bilgehan Sel, Priya Shanmugasundaram, Mohammad Kachuee, Kun Zhou, Ruoxi Jia, Ming Jin

Abstract: Large Language Models (LLMs) have shown remarkable capabilities in tasks such as summarization, arithmetic reasoning, and question answering. However, they encounter significant challenges in the domain of moral reasoning and ethical decision-making, especially in complex scenarios with multiple stakeholders. This paper introduces the Skin-in-the-Game (SKIG) framework, aimed at enhancing moral reasoning in LLMs by exploring decisions' consequences from multiple stakeholder perspectives. Central to SKIG's mechanism is simulating accountability for actions, which, alongside empathy exercises and risk assessment, is pivotal to its effectiveness. We validate SKIG's performance across various moral reasoning benchmarks with proprietary and open-source LLMs, and investigate its crucial components through extensive ablation analyses.

Submitted 2 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

Comments: ACL 2024, long paper

arXiv:2402.08968

GrounDial: Human-norm Grounded Safe Dialog Response Generation

Authors: Siwon Kim, Shuyang Dai, Mohammad Kachuee, Shayan Ray, Tara Taghavi, Sungroh Yoon

Abstract: Current conversational AI systems based on large language models (LLMs) are known to generate unsafe responses, agreeing to offensive user input or including toxic content. Previous research aimed to alleviate the toxicity by fine-tuning LLMs with manually annotated safe dialogue histories. However, the dependency on additional tuning incurs substantial costs. To remove the dependency, we propose GrounDial, where response safety is achieved by grounding responses to commonsense social rules without requiring fine-tuning. A hybrid approach of in-context learning and human-norm-guided decoding of GrounDial enables the response to be quantitatively and qualitatively safer even without additional data or tuning.

Submitted 14 February, 2024; originally announced February 2024.

Comments: Accepted to findings of EACL 2024

arXiv:2306.04823

Data Augmentation for Improving Tail-traffic Robustness in Skill-routing for Dialogue Systems

Authors: Ting-Wei Wu, Fatemeh Sheikholeslami, Mohammad Kachuee, Jaeyoung Do, Sungjin Lee

Abstract: Large-scale conversational systems typically rely on a skill-routing component to route a user request to an appropriate skill and interpretation to serve the request. In such a system, the agent is responsible for serving thousands of skills and interpretations, which creates a long-tail distribution due to the natural frequency of requests. For example, the samples related to playing music might be a thousand times more frequent than those asking for theatre show times. Moreover, inputs used for ML-based skill routing are often a heterogeneous mix of strings, embedding vectors, categorical and scalar features, which makes employing augmentation-based long-tail learning approaches challenging. To improve the skill-routing robustness, we propose an augmentation of heterogeneous skill-routing data and training targeted for robust operation in long-tail data regimes. We explore a variety of conditional encoder-decoder generative frameworks to perturb original data fields and create synthetic training data. To demonstrate the effectiveness of the proposed method, we conduct extensive experiments using real-world data from a commercial conversational system. Based on the experiment results, the proposed approach improves more than 80% (51 out of 63) of intents with less than 10K of traffic instances in the skill-routing replication task.

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2305.10528

Scalable and Safe Remediation of Defective Actions in Self-Learning Conversational Systems

Authors: Sarthak Ahuja, Mohammad Kachuee, Fateme Sheikholeslami, Weiqing Liu, Jaeyoung Do

Abstract: Off-policy reinforcement learning has been a driving force for the state-of-the-art conversational AIs, leading to more natural human-agent interactions and improving the user satisfaction for goal-oriented agents. However, in large-scale commercial settings, it is often challenging to balance between policy improvements and experience continuity on the broad spectrum of applications handled by such a system. In the literature, off-policy evaluation and guard-railing on aggregate statistics have been commonly used to address this problem. In this paper, we propose a method for curating and leveraging high-precision samples sourced from historical regression incident reports to validate, safe-guard, and improve policies prior to the online deployment. We conducted extensive experiments using data from a real-world conversational system and actual regression incidents. The proposed method is currently deployed in our production system to protect customers against broken experiences and enable long-term policy improvements.

Submitted 17 May, 2023; originally announced May 2023.

Comments: Accepted at ACL 2023 Industry Track

arXiv:2209.08429

Constrained Policy Optimization for Controlled Self-Learning in Conversational AI Systems

Authors: Mohammad Kachuee, Sungjin Lee

Abstract: Recently, self-learning methods based on user satisfaction metrics and contextual bandits have shown promising results to enable consistent improvements in conversational AI systems. However, directly targeting such metrics by off-policy bandit learning objectives often increases the risk of making abrupt policy changes that break the current user experience. In this study, we introduce a scalable framework for supporting fine-grained exploration targets for individual domains via user-defined constraints. For example, we may want to ensure fewer policy deviations in business-critical domains such as shopping, while allocating more exploration budget to domains such as music. Furthermore, we present a novel meta-gradient learning approach that is scalable and practical to address this problem. The proposed method adjusts constraint violation penalty terms adaptively through a meta objective that encourages balanced constraint satisfaction across domains. We conduct extensive experiments using data from a real-world conversational AI on a set of realistic constraint benchmarks. Based on the experimental results, we demonstrate that the proposed approach is capable of achieving the best balance between the policy value and constraint satisfaction rate.

Submitted 17 September, 2022; originally announced September 2022.

Report number: ACL 2023

arXiv:2204.07135

Scalable and Robust Self-Learning for Skill Routing in Large-Scale Conversational AI Systems

Authors: Mohammad Kachuee, Jinseok Nam, Sarthak Ahuja, Jin-Myung Won, Sungjin Lee

Abstract: Skill routing is an important component in large-scale conversational systems. In contrast to traditional rule-based skill routing, state-of-the-art systems use a model-based approach to enable natural conversations. To provide supervision signal required to train such models, ideas such as human annotation, replication of a rule-based system, relabeling based on user paraphrases, and bandit-based learning were suggested. However, these approaches: (a) do not scale in terms of the number of skills and skill on-boarding, (b) require a very costly expert annotation/rule-design, (c) introduce risks in the user experience with each model update. In this paper, we present a scalable self-learning approach to explore routing alternatives without causing abrupt policy changes that break the user experience, learn from the user interaction, and incrementally improve the routing via frequent model refreshes. To enable such robust frequent model updates, we suggest a simple and effective approach that ensures controlled policy updates for individual domains, followed by an off-policy evaluation for making deployment decisions without any need for lengthy A/B experimentation. We conduct various offline and online A/B experiments on a commercial large-scale conversational system to demonstrate the effectiveness of the proposed method in real-world production settings.

Submitted 14 April, 2022; originally announced April 2022.

Comments: NAACL 2022

arXiv:2204.01916

Domain-Aware Contrastive Knowledge Transfer for Multi-domain Imbalanced Data

Authors: Zixuan Ke, Mohammad Kachuee, Sungjin Lee

Abstract: In many real-world machine learning applications, samples belong to a set of domains; e.g., for product reviews, each review belongs to a product category. In this paper, we study multi-domain imbalanced learning (MIL), the scenario in which there is imbalance not only in classes but also in domains. In the MIL setting, different domains exhibit different patterns and there is a varying degree of similarity and divergence among domains, posing opportunities and challenges for transfer learning, especially when faced with limited or insufficient training data. We propose a novel domain-aware contrastive knowledge transfer method called DCMI to (1) identify the shared domain knowledge to encourage positive transfer among similar domains (in particular from head domains to tail domains); (2) isolate the domain-specific knowledge to minimize the negative transfer from dissimilar domains. We evaluated the performance of DCMI on three different datasets showing significant improvements in different MIL scenarios.

Submitted 4 April, 2022; originally announced April 2022.

Comments: ACL WASSA 2022

arXiv:2011.05961

Real-Time Decentralized knowledge Transfer at the Edge

Authors: Orpaz Goldstein, Mohammad Kachuee, Derek Shiell, Majid Sarrafzadeh

Abstract: The proliferation of edge networks creates islands of learning agents working on local streams of data. Transferring knowledge between these agents in real-time without exposing private data allows for collaboration to decrease learning time and increase model confidence. Incorporating knowledge from data that a local model did not see creates an ability to debias a local model or add to classification abilities on data never before seen. Transferring knowledge in a selective decentralized approach enables models to retain their local insights, allowing for local flavors of a machine learning model. This approach suits the decentralized architecture of edge networks, as a local edge node will serve a community of learning agents that will likely encounter similar data. We propose a method based on knowledge distillation for pairwise knowledge transfer pipelines from models trained on non-i.i.d. data and compare it to other popular knowledge transfer methods. Additionally, we test different scenarios of knowledge transfer network construction and show the practicality of our approach. Our experiments show knowledge transfer using our model outperforms standard methods in a real-time transfer scenario.

Submitted 1 October, 2021; v1 submitted 11 November, 2020; originally announced November 2020.
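
Illustration: the abstract above builds on knowledge distillation for pairwise transfer between models. As a rough orientation only, the following sketch shows the generic temperature-softened distillation loss (KL divergence between teacher and student outputs). It is not the method of the paper; the function names, the temperature value, and the toy logits are assumptions made for this example.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature (numerically stable)."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) on softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1)
    return float(kl.mean() * temperature ** 2)

# Toy logits for two samples and three classes (hypothetical values).
teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.0, 0.1]])
student_far = np.zeros_like(teacher)  # uninformed student: uniform outputs
student_near = teacher * 0.9          # student that almost matches the teacher

loss_far = distillation_loss(student_far, teacher)
loss_near = distillation_loss(student_near, teacher)
loss_same = distillation_loss(teacher, teacher)
print(loss_far, loss_near, loss_same)
```

In this toy setup the loss shrinks as the student's outputs approach the teacher's, which is the signal a student model would minimize during transfer.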

arXiv:2010.11230

Self-Supervised Contrastive Learning for Efficient User Satisfaction Prediction in Conversational Agents

Authors: Mohammad Kachuee, Hao Yuan, Young-Bum Kim, Sungjin Lee

Abstract: Turn-level user satisfaction is one of the most important performance metrics for conversational agents. It can be used to monitor the agent's performance and provide insights about defective user experiences. Moreover, a powerful satisfaction model can be used as an objective function that a conversational agent continuously optimizes for. While end-to-end deep learning has shown promising results, having access to a large number of reliable annotated samples required by these methods remains challenging. In a large-scale conversational system, there is a growing number of newly developed skills, making the traditional data collection, annotation, and modeling process impractical due to the required annotation costs as well as the turnaround times. In this paper, we suggest a self-supervised contrastive learning approach that leverages the pool of unlabeled data to learn user-agent interactions. We show that the pre-trained models using the self-supervised objective are transferable to the user satisfaction prediction. In addition, we propose a novel few-shot transfer learning approach that ensures better transferability for very small sample sizes. The suggested few-shot method does not require any inner loop optimization process and is scalable to very large datasets and complex models. Based on our experiments using real-world data from a large-scale commercial system, the suggested approach is able to significantly reduce the required number of annotations, while improving the generalization on unseen out-of-domain skills.

Submitted 11 April, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

Comments: NAACL-HLT 2021

arXiv:1912.09600

Group-Connected Multilayer Perceptron Networks

Authors: Mohammad Kachuee, Sajad Darabi, Shayan Fazeli, Majid Sarrafzadeh

Abstract: Despite the success of deep learning in domains such as image, voice, and graphs, there has been little progress in deep representation learning for domains without a known structure between features. For instance, a tabular dataset of different demographic and clinical factors where the feature interactions are not given as a prior. In this paper, we propose Group-Connected Multilayer Perceptron (GMLP) networks to enable deep representation learning in these domains. GMLP is based on the idea of learning expressive feature combinations (groups) and exploiting them to reduce the network complexity by defining local group-wise operations. During the training phase, GMLP learns a sparse feature grouping matrix using temperature annealing softmax with an added entropy loss term to encourage the sparsity. Furthermore, an architecture is suggested which resembles binary trees, where group-wise operations are followed by pooling operations to combine information; reducing the number of groups as the network grows in depth. To evaluate the proposed method, we conducted experiments on different real-world datasets covering various application areas. Additionally, we provide visualizations on MNIST and synthesized data. According to the results, GMLP is able to successfully learn and exploit expressive feature combinations and achieve state-of-the-art classification performance on different datasets.

Submitted 25 November, 2020; v1 submitted 19 December, 2019; originally announced December 2019.
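
Illustration: the GMLP abstract above mentions a temperature-annealed softmax with an entropy term that encourages a sparse feature-grouping matrix. The sketch below only illustrates that general idea: as the temperature drops, each column of a softmax assignment matrix approaches a one-hot selection, and its entropy (the quantity such a penalty would shrink) falls. The matrix shape, its orientation, and the temperature schedule are assumptions for this example, not details taken from the paper.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Column-wise softmax of logits / temperature (numerically stable)."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=0, keepdims=True)
    exp = np.exp(scaled)
    return exp / exp.sum(axis=0, keepdims=True)

def mean_entropy(assignment):
    """Mean entropy of the columns; low values mean near one-hot columns."""
    eps = 1e-12
    return float(-(assignment * np.log(assignment + eps)).sum(axis=0).mean())

rng = np.random.default_rng(0)
n_features, n_slots = 6, 4  # hypothetical sizes
logits = rng.normal(size=(n_features, n_slots))  # stand-in for learned parameters

entropies = []
for temperature in (5.0, 1.0, 0.1):  # assumed annealing schedule
    assignment = softmax_with_temperature(logits, temperature)
    entropies.append(mean_entropy(assignment))

print(entropies)  # entropy decreases as the temperature is annealed
```

In a trainable version, the entropy value would be added to the task loss so that gradient descent pushes the assignment toward sparse, nearly discrete groupings.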

arXiv:1912.08281

Cost-Sensitive Feature-Value Acquisition Using Feature Relevance

Authors: Kimmo Kärkkäinen, Mohammad Kachuee, Orpaz Goldstein, Majid Sarrafzadeh

Abstract: In many real-world machine learning problems, feature values are not readily available. To make predictions, some of the missing features have to be acquired, which can incur a cost in money, computational time, or human time, depending on the problem domain. This leads us to the problem of choosing which features to use at the prediction time. The chosen features should increase the prediction accuracy for a low cost, but determining which features will do that is challenging. The choice should take into account the previously acquired feature values as well as the feature costs. This paper proposes a novel approach to address this problem. The proposed approach chooses the most useful features adaptively based on how relevant they are for the prediction task as well as what the corresponding feature costs are. Our approach uses a generic neural network architecture, which is suitable for a wide range of problems. We evaluate our approach on three cost-sensitive datasets, including Yahoo! Learning to Rank Competition dataset as well as two health datasets. We show that our approach achieves high accuracy with a lower cost than the current state-of-the-art approaches.

Submitted 18 December, 2019; v1 submitted 17 December, 2019; originally announced December 2019.

arXiv:1910.01803

Unsupervised Representation for EHR Signals and Codes as Patient Status Vector

Authors: Sajad Darabi, Mohammad Kachuee, Majid Sarrafzadeh

Abstract: Effective modeling of electronic health records presents many challenges as they contain large amounts of irregularity, most of which are due to the varying procedures and diagnoses a patient may have. Despite the recent progress in machine learning, unsupervised learning remains largely open, especially in the healthcare domain. In this work, we present a two-step unsupervised representation learning scheme to summarize the multi-modal clinical time series consisting of signals and medical codes into a patient status vector. First, an auto-encoder step is used to reduce sparse medical codes and clinical time series into a distributed representation. Subsequently, the concatenation of the distributed representations is further fine-tuned using a forecasting task. We evaluate the usefulness of the representation on two downstream tasks: mortality and readmission. Our proposed method shows improved generalization performance for both short duration ICU visits and long duration ICU visits.

Submitted 4 October, 2019; originally announced October 2019.

arXiv:1909.06772

Target-Focused Feature Selection Using a Bayesian Approach

Authors: Orpaz Goldstein, Mohammad Kachuee, Kimmo Karkkainen, Majid Sarrafzadeh

Abstract: In many real-world scenarios where data is high dimensional, test time acquisition of features is a non-trivial task due to costs associated with feature acquisition and evaluating feature value. The need for highly confident models with an extremely frugal acquisition of features can be addressed by allowing a feature selection method to become target aware. We introduce an approach to feature selection that is based on Bayesian learning, allowing us to report target-specific levels of uncertainty, false positive, and false negative rates. In addition, measuring uncertainty lifts the restriction on feature selection being target agnostic, allowing for feature acquisition based on a single target of focus out of many. We show that acquiring features for a specific target is at least as good as common linear feature selection approaches for small non-sparse datasets, and surpasses these when faced with real-world healthcare data that is larger in scale and in sparseness.

Submitted 15 September, 2019; originally announced September 2019.

arXiv:1908.03971

TAPER: Time-Aware Patient EHR Representation

Authors: Sajad Darabi, Mohammad Kachuee, Shayan Fazeli, Majid Sarrafzadeh

Abstract: Effective representation learning of electronic health records is a challenging task and is becoming more important as the availability of such data is becoming pervasive. The data contained in these records are irregular and contain multiple modalities such as notes and medical codes. They are preempted by medical conditions the patient may have, and are typically jotted down by medical staff. Accompanying codes are notes containing valuable information about patients beyond the structured information contained in electronic health records. We use transformer networks and the recently proposed BERT language model to embed these data streams into a unified vector representation. The presented approach effectively encodes a patient's visit data into a single distributed representation, which can be used for downstream tasks. Our model demonstrates superior performance and generalization on mortality, readmission and length of stay tasks using the publicly available MIMIC-III ICU dataset. Code available at https://github.com/sajaddarabi/TAPER-EHR

Submitted 3 May, 2020; v1 submitted 11 August, 2019; originally announced August 2019.

arXiv:1905.09340

Generative Imputation and Stochastic Prediction

Authors: Mohammad Kachuee, Kimmo Karkkainen, Orpaz Goldstein, Sajad Darabi, Majid Sarrafzadeh

Abstract: In many machine learning applications, we are faced with incomplete datasets. In the literature, missing data imputation techniques have been mostly concerned with filling missing values. However, the existence of missing values is synonymous with uncertainties not only over the distribution of missing values but also over target class assignments that require careful consideration. In this paper, we propose a simple and effective method for imputing missing features and estimating the distribution of target assignments given incomplete data. In order to make imputations, we train a simple and effective generator network to generate imputations that a discriminator network is tasked to distinguish. Following this, a predictor network is trained using the imputed samples from the generator network to capture the classification uncertainties and make predictions accordingly. The proposed method is evaluated on CIFAR-10 and MNIST image datasets as well as five real-world tabular classification datasets, under different missingness rates and structures. Our experimental results show the effectiveness of the proposed method in generating imputations as well as providing estimates for the class uncertainties in a classification task when faced with missing values.

Submitted 4 September, 2020; v1 submitted 22 May, 2019; originally announced May 2019.

arXiv:1905.02312

doi 10.1109/ISCAS.2017.8050240

Non-invasive Blood Pressure Estimation Using Phonocardiogram

Authors: Amirhossein Esmaili, Mohammad Kachuee, Mahdi Shabany

Abstract: This paper presents a novel approach based on pulse transit time (PTT) for the estimation of blood pressure (BP). In order to achieve this goal, data acquisition hardware is designed for high-resolution sampling of phonocardiogram (PCG) and photoplethysmogram (PPG). PTT values can be derived from these two signals. Meanwhile, a force-sensing resistor (FSR) is placed under the cuff of the BP reference device to mark the moments of measurements accurately via recording instantaneous cuff pressure. For deriving the PTT-BP models, a calibration procedure including a supervised physical exercise is conducted for each individual. The proposed method is evaluated on 24 subjects. The final results show that, using PCG for PTT measurement alongside the proposed models, the BP can be estimated reliably. Since the use of PCG requires only minimal low-cost hardware, the proposed method enables ubiquitous BP estimation in portable healthcare devices.

Submitted 6 May, 2019; originally announced May 2019.

Comments: The collected data set can be accessed using the following url link: http://www.kaggle.com/mkachuee/noninvasivebp

Journal ref: 2017 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-4. IEEE, 2017

arXiv:1902.07102

Cost-Sensitive Diagnosis and Learning Leveraging Public Health Data

Authors: Mohammad Kachuee, Kimmo Karkkainen, Orpaz Goldstein, Davina Zamanzadeh, Majid Sarrafzadeh

Abstract: Traditionally, machine learning algorithms rely on the assumption that all features of a given dataset are available for free. However, there are many concerns such as monetary data collection costs, patient discomfort in medical procedures, and privacy impacts of data collection that require careful consideration in any real-world health analytics system. An efficient solution would only acquire a subset of features based on the value it provides while considering acquisition costs. Moreover, datasets that provide feature costs are very limited, especially in healthcare. In this paper, we provide a health dataset as well as a method for assigning feature costs based on the total level of inconvenience asking for each feature entails. Furthermore, based on the suggested dataset, we provide a comparison of recent and state-of-the-art approaches to cost-sensitive feature acquisition and learning. Specifically, we analyze the performance of major sensitivity-based and reinforcement learning based methods in the literature on three different problems in the health domain, including diabetes, heart disease, and hypertension classification.

Submitted 30 June, 2019; v1 submitted 19 February, 2019; originally announced February 2019.

arXiv:1901.00243

Opportunistic Learning: Budgeted Cost-Sensitive Learning from Data Streams

Authors: Mohammad Kachuee, Orpaz Goldstein, Kimmo Karkkainen, Sajad Darabi, Majid Sarrafzadeh

Abstract: In many real-world learning scenarios, features are only acquirable at a cost constrained under a budget. In this paper, we propose a novel approach for cost-sensitive feature acquisition at the prediction-time. The suggested method acquires features incrementally based on a context-aware feature-value function. We formulate the problem in the reinforcement learning paradigm, and introduce a reward function based on the utility of each feature. Specifically, MC dropout sampling is used to measure expected variations of the model uncertainty which is used as a feature-value function. Furthermore, we suggest sharing representations between the class predictor and value function estimator networks. The suggested approach is completely online and is readily applicable to stream learning setups. The solution is evaluated on three different datasets including the well-known MNIST dataset as a benchmark as well as two cost-sensitive datasets: Yahoo Learning to Rank and a dataset in the medical domain for diabetes classification. According to the results, the proposed method is able to efficiently acquire features and make accurate predictions.

Submitted 17 February, 2019; v1 submitted 1 January, 2019; originally announced January 2019.

Comments: https://openreview.net/forum?id=S1eOHo09KX

Journal ref: International Conference on Learning Representations (ICLR), 2019

arXiv:1812.00544 [pdf, other]

doi 10.1109/TIM.2017.2745081

Nonlinear Cuff-less Blood Pressure Estimation of Healthy Subjects Using Pulse Transit Time and Arrival Time

Authors: Amirhossein Esmaili, Mohammad Kachuee, Mahdi Shabany

Abstract: This paper presents a novel blood pressure (BP) estimation method based on pulse transit time (PTT) and pulse arrival time (PAT) to estimate the systolic BP (SBP) and the diastolic BP (DBP). Data acquisition hardware is designed for high-resolution sampling of phonocardiogram (PCG), photoplethysmogram, and electrocardiogram (ECG). PCG and ECG serve as the proximal timing reference to obtain PTT and PAT indices, respectively. In order to derive a BP estimator model, a calibration procedure, including a supervised physical exercise, is conducted for each individual, which causes changes in their BP, and then a number of reference BPs are measured alongside the acquisition of the signals per subject. It is suggested to use a force-sensing resistor placed under the cuff of the BP reference device to mark the exact moments of reference BP measurements, which correspond to the inflation of the cuff. Additionally, a novel nonlinear BP estimator model, based on the theory of elastic tubes, is introduced to estimate the BP precisely using PTT/PAT values. The proposed method is evaluated on 32 subjects. Using the PTT index, the correlation coefficients for SBP and DBP estimation are 0.89 and 0.84, respectively. Using the PAT index, the correlation coefficients for SBP and DBP estimation are 0.95 and 0.84, respectively. The results show that the proposed method, exploiting the introduced nonlinear model with the use of PAT index or PTT index, provides a reliable estimation of SBP and DBP.

Submitted 6 May, 2019; v1 submitted 2 December, 2018; originally announced December 2018.

Comments: The collected data set can be accessed using the following url link: http://www.kaggle.com/mkachuee/noninvasivebp

Journal ref: IEEE Transactions on Instrumentation and Measurement, 66(12), pp.3299-3308, December 2017

arXiv:1811.01249 [pdf, other]

doi 10.1109/TNNLS.2018.2880403

Dynamic Feature Acquisition Using Denoising Autoencoders

Authors: Mohammad Kachuee, Sajad Darabi, Babak Moatamed, Majid Sarrafzadeh

Abstract: In real-world scenarios, different features have different acquisition costs at test-time, which necessitates cost-aware methods to optimize the cost and performance trade-off. This paper introduces a novel and scalable approach for cost-aware feature acquisition at test-time. The method incrementally asks for features based on the available context, that is, the known feature values. The proposed method is based on sensitivity analysis in neural networks and density estimation using denoising autoencoders with binary representation layers. In the proposed architecture, a denoising autoencoder is used to handle unknown features (i.e., features that are yet to be acquired), and the sensitivity of predictions with respect to each unknown feature is used as a context-dependent measure of informativeness. We evaluated the proposed method on eight different real-world datasets as well as one synthesized dataset and compared its performance with several other approaches in the literature. According to the results, the suggested method is capable of efficiently acquiring features at test-time in a cost- and context-aware fashion.

Submitted 3 November, 2018; originally announced November 2018.

Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 2018

arXiv:1805.00794 [pdf, other]

doi 10.1109/ICHI.2018.00092

ECG Heartbeat Classification: A Deep Transferable Representation

Authors: Mohammad Kachuee, Shayan Fazeli, Majid Sarrafzadeh

Abstract: Electrocardiogram (ECG) can be reliably used as a measure to monitor the functionality of the cardiovascular system. Recently, there has been great attention towards accurate categorization of heartbeats. While there are many commonalities between different ECG conditions, the focus of most studies has been classifying a set of conditions on a dataset annotated for that task rather than learning and employing transferable knowledge between different tasks. In this paper, we propose a method based on deep convolutional neural networks for the classification of heartbeats which is able to accurately classify five different arrhythmias in accordance with the AAMI EC57 standard. Furthermore, we suggest a method for transferring the knowledge acquired on this task to the myocardial infarction (MI) classification task. We evaluated the proposed method on PhysioNet's MIT-BIH and PTB Diagnostics datasets. According to the results, the suggested method is able to make predictions with the average accuracies of 93.4% and 95.9% on arrhythmia classification and MI classification, respectively.

Submitted 12 July, 2018; v1 submitted 19 April, 2018; originally announced May 2018.

arXiv:1707.04364 [pdf, other]

Complex Event Processing of Health Data in Real-time to Predict Heart Failure Risk and Stress

Authors: Sandeep Singh Sandha, Mohammad Kachuee, Sajad Darabi

Abstract: In this paper, we develop a scalable system which can do real-time analytics for different health applications. The occurrence of different health conditions can be regarded as complex events, and thus this concept can be extended to other use cases easily. A large number of users should be able to send data in real-time and receive feedback and results. Keeping the requirements in mind, we used Kafka and Spark to develop our system. In this setting, multiple users are running Kafka producer clients, which are sending data in real-time. Spark streaming is used to process data from Kafka over different window sizes to analyze the health conditions. We have developed and tested heart attack risk and stress prediction as our sample complex event detection use cases. We have simulated and tested our system with multiple health datasets.

Submitted 13 July, 2017; originally announced July 2017.

Showing 1–22 of 22 results for author: Kachuee, M