Search | arXiv e-print repository

Multi-Agent Reachability Calibration with Conformal Prediction

Authors: Anish Muthali, Haotian Shen, Sampada Deglurkar, Michael H. Lim, Rebecca Roelofs, Aleksandra Faust, Claire Tomlin

Abstract: We investigate methods to provide safety assurances for autonomous agents that incorporate predictions of other, uncontrolled agents' behavior into their own trajectory planning. Given a learning-based forecasting model that predicts agents' trajectories, we introduce a method for providing probabilistic assurances on the model's prediction error with calibrated confidence intervals. Through quant… ▽ More We investigate methods to provide safety assurances for autonomous agents that incorporate predictions of other, uncontrolled agents' behavior into their own trajectory planning. Given a learning-based forecasting model that predicts agents' trajectories, we introduce a method for providing probabilistic assurances on the model's prediction error with calibrated confidence intervals. Through quantile regression, conformal prediction, and reachability analysis, our method generates probabilistically safe and dynamically feasible prediction sets. We showcase their utility in certifying the safety of planning algorithms, both in simulations using actual autonomous driving data and in an experiment with Boeing vehicles. △ Less

Submitted 13 December, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

arXiv:2303.09818 [pdf, other]

A real-time blind quality-of-experience assessment metric for HTTP adaptive streaming

Authors: Chunyi Li, May Lim, Abdelhak Bentaleb, Roger Zimmermann

Abstract: In today's Internet, HTTP Adaptive Streaming (HAS) is the mainstream standard for video streaming, which switches the bitrate of the video content based on an Adaptive BitRate (ABR) algorithm. An effective Quality of Experience (QoE) assessment metric can provide crucial feedback to an ABR algorithm. However, predicting such real-time QoE on the client side is challenging. The QoE prediction requi… ▽ More In today's Internet, HTTP Adaptive Streaming (HAS) is the mainstream standard for video streaming, which switches the bitrate of the video content based on an Adaptive BitRate (ABR) algorithm. An effective Quality of Experience (QoE) assessment metric can provide crucial feedback to an ABR algorithm. However, predicting such real-time QoE on the client side is challenging. The QoE prediction requires high consistency with the Human Visual System (HVS), low latency, and blind assessment, which are difficult to realize together. To address this challenge, we analyzed various characteristics of HAS systems and propose a non-uniform sampling metric to reduce time complexity. Furthermore, we design an effective QoE metric that integrates resolution and rebuffering time as the Quality of Service (QoS), as well as spatiotemporal output from a deep neural network and specific switching events as content information. These reward and penalty features are regressed into quality scores with a Support Vector Regression (SVR) model. Experimental results show that the accuracy of our metric outperforms the mainstream blind QoE metrics by 0.3, and its computing time is only 60\% of the video playback, indicating that the proposed metric is capable of providing real-time guidance to ABR algorithms and improving the overall performance of HAS. △ Less

Submitted 17 March, 2023; originally announced March 2023.

Comments: 6 pages,4 figures

arXiv:2302.05582 [pdf, other]

ASDF: A Differential Testing Framework for Automatic Speech Recognition Systems

Authors: Daniel Hao Xian Yuen, Andrew Yong Chen Pang, Zhou Yang, Chun Yong Chong, Mei Kuan Lim, David Lo

Abstract: Recent years have witnessed wider adoption of Automated Speech Recognition (ASR) techniques in various domains. Consequently, evaluating and enhancing the quality of ASR systems is of great importance. This paper proposes ASDF, an Automated Speech Recognition Differential Testing Framework for testing ASR systems. ASDF extends an existing ASR testing tool, the CrossASR++, which synthesizes test ca… ▽ More Recent years have witnessed wider adoption of Automated Speech Recognition (ASR) techniques in various domains. Consequently, evaluating and enhancing the quality of ASR systems is of great importance. This paper proposes ASDF, an Automated Speech Recognition Differential Testing Framework for testing ASR systems. ASDF extends an existing ASR testing tool, the CrossASR++, which synthesizes test cases from a text corpus. However, CrossASR++ fails to make use of the text corpus efficiently and provides limited information on how the failed test cases can improve ASR systems. To address these limitations, our tool incorporates two novel features: (1) a text transformation module to boost the number of generated test cases and uncover more errors in ASR systems and (2) a phonetic analysis module to identify on which phonemes the ASR system tend to produce errors. ASDF generates more high-quality test cases by applying various text transformation methods (e.g., change tense) to the texts in failed test cases. By doing so, ASDF can utilize a small text corpus to generate a large number of audio test cases, something which CrossASR++ is not capable of. In addition, ASDF implements more metrics to evaluate the performance of ASR systems from multiple perspectives. ASDF performs phonetic analysis on the identified failed test cases to identify the phonemes that ASR systems tend to transcribe incorrectly, providing useful information for developers to improve ASR systems. The demonstration video of our tool is made online at https://www.youtube.com/watch?v=DzVwfc3h9As. The implementation is available at https://github.com/danielyuenhx/asdf-differential-testing. △ Less

Submitted 10 February, 2023; originally announced February 2023.

Comments: Accpeted by ICST 2023 Tool Demo Track

arXiv:2212.13032 [pdf]

Diagnosis of COVID-19 based on Chest Radiography

Authors: Mei Gah Lim, Hoi Leong Lee

Abstract: The Coronavirus disease 2019 (COVID-19) was first identified in Wuhan, China, in early December 2019 and now becoming a pandemic. When COVID-19 patients undergo radiography examination, radiologists can observe the present of radiographic abnormalities from their chest X-ray (CXR) images. In this study, a deep convolutional neural network (CNN) model was proposed to aid radiologists in diagnosing… ▽ More The Coronavirus disease 2019 (COVID-19) was first identified in Wuhan, China, in early December 2019 and now becoming a pandemic. When COVID-19 patients undergo radiography examination, radiologists can observe the present of radiographic abnormalities from their chest X-ray (CXR) images. In this study, a deep convolutional neural network (CNN) model was proposed to aid radiologists in diagnosing COVID-19 patients. First, this work conducted a comparative study on the performance of modified VGG-16, ResNet-50 and DenseNet-121 to classify CXR images into normal, COVID-19 and viral pneumonia. Then, the impact of image augmentation on the classification results was evaluated. The publicly available COVID-19 Radiography Database was used throughout this study. After comparison, ResNet-50 achieved the highest accuracy with 95.88%. Next, after training ResNet-50 with rotation, translation, horizontal flip, intensity shift and zoom augmented dataset, the accuracy dropped to 80.95%. Furthermore, an ablation study on the effect of image augmentation on the classification results found that the combinations of rotation and intensity shift augmentation methods obtained an accuracy higher than baseline, which is 96.14%. Finally, ResNet-50 with rotation and intensity shift augmentations performed the best and was proposed as the final classification model in this work. These findings demonstrated that the proposed classification model can provide a promising result for COVID-19 diagnosis. △ Less

Submitted 26 December, 2022; originally announced December 2022.

arXiv:2211.15327 [pdf, other]

Task-Aware Asynchronous Multi-Task Model with Class Incremental Contrastive Learning for Surgical Scene Understanding

Authors: Lalithkumar Seenivasan, Mobarakol Islam, Mengya Xu, Chwee Ming Lim, Hongliang Ren

Abstract: Purpose: Surgery scene understanding with tool-tissue interaction recognition and automatic report generation can play an important role in intra-operative guidance, decision-making and postoperative analysis in robotic surgery. However, domain shifts between different surgeries with inter and intra-patient variation and novel instruments' appearance degrade the performance of model prediction. Mo… ▽ More Purpose: Surgery scene understanding with tool-tissue interaction recognition and automatic report generation can play an important role in intra-operative guidance, decision-making and postoperative analysis in robotic surgery. However, domain shifts between different surgeries with inter and intra-patient variation and novel instruments' appearance degrade the performance of model prediction. Moreover, it requires output from multiple models, which can be computationally expensive and affect real-time performance. Methodology: A multi-task learning (MTL) model is proposed for surgical report generation and tool-tissue interaction prediction that deals with domain shift problems. The model forms of shared feature extractor, mesh-transformer branch for captioning and graph attention branch for tool-tissue interaction prediction. The shared feature extractor employs class incremental contrastive learning (CICL) to tackle intensity shift and novel class appearance in the target domain. We design Laplacian of Gaussian (LoG) based curriculum learning into both shared and task-specific branches to enhance model learning. We incorporate a task-aware asynchronous MTL optimization technique to fine-tune the shared weights and converge both tasks optimally. Results: The proposed MTL model trained using task-aware optimization and fine-tuning techniques reported a balanced performance (BLEU score of 0.4049 for scene captioning and accuracy of 0.3508 for interaction detection) for both tasks on the target domain and performed on-par with single-task models in domain adaptation. Conclusion: The proposed multi-task model was able to adapt to domain shifts, incorporate novel instruments in the target domain, and perform tool-tissue interaction detection and report generation on par with single-task models. △ Less

Submitted 28 November, 2022; originally announced November 2022.

Comments: Manuscript accepted in the International Journal of Computer Assisted Radiology and Surgery. codes available: https://github.com/lalithjets/Domain-adaptation-in-MTL

arXiv:2210.05015 [pdf, other]

doi 10.1613/jair.1.14525

Optimality Guarantees for Particle Belief Approximation of POMDPs

Authors: Michael H. Lim, Tyler J. Becker, Mykel J. Kochenderfer, Claire J. Tomlin, Zachary N. Sunberg

Abstract: Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood w… ▽ More Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of $\mathcal{O}(C)$, where $C$ is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers. △ Less

Submitted 19 October, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

Journal ref: Journal of Artificial Intelligence Research, 77, 1591-1636 (2023)

arXiv:2204.07629 [pdf]

Navigation between initial and desired community states using shortcuts

Authors: Benjamin W. Blonder, Michael H. Lim, Zachary Sunberg, Claire Tomlin

Abstract: Ecological management problems often involve navigating from an initial to a desired community state. We ask whether navigation is possible without brute-force additions and deletions of species, using actions of varying costs: adding/deleting a small number of individuals of a species, changing the environment, and waiting. Navigation can yield direct paths (single sequence of actions) or shortcu… ▽ More Ecological management problems often involve navigating from an initial to a desired community state. We ask whether navigation is possible without brute-force additions and deletions of species, using actions of varying costs: adding/deleting a small number of individuals of a species, changing the environment, and waiting. Navigation can yield direct paths (single sequence of actions) or shortcut paths (multiple sequences of actions with lower cost than a direct path). We ask (1) when is non-brute-force navigation possible?; (2) do shortcuts exist and what are their properties?; and (3) what heuristics predict shortcut existence? Using several empirical datasets, we show that (1) non-brute-force navigation is only possible between some state pairs, (2) shortcuts exist between many state pairs; and (3) changes in abundance and richness are the strongest predictors of shortcut existence, independent of dataset and algorithm choices. State diagrams thus unveil hidden strategies for efficiently shifting between states. △ Less

Submitted 2 December, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

arXiv:2112.09456 [pdf, other]

Compositional Learning-based Planning for Vision POMDPs

Authors: Sampada Deglurkar, Michael H. Lim, Johnathan Tucker, Zachary N. Sunberg, Aleksandra Faust, Claire J. Tomlin

Abstract: The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle high-dimensional image observations prevalent in real world applications, and often require lengthy online training that requires interaction with the environment. In thi… ▽ More The Partially Observable Markov Decision Process (POMDP) is a powerful framework for capturing decision-making problems that involve state and transition uncertainty. However, most current POMDP planners cannot effectively handle high-dimensional image observations prevalent in real world applications, and often require lengthy online training that requires interaction with the environment. In this work, we propose Visual Tree Search (VTS), a compositional learning and planning procedure that combines generative models learned offline with online model-based POMDP planning. The deep generative observation models evaluate the likelihood of and predict future image observations in a Monte Carlo tree search planner. We show that VTS is robust to different types of image noises that were not present during training and can adapt to different reward structures without the need to re-train. This new approach significantly and stably outperforms several baseline state-of-the-art vision POMDP algorithms while using a fraction of the training time. △ Less

Submitted 2 December, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

arXiv:2112.08189 [pdf, other]

ST-MTL: Spatio-Temporal Multitask Learning Model to Predict Scanpath While Tracking Instruments in Robotic Surgery

Authors: Mobarakol Islam, Vibashan VS, Chwee Ming Lim, Hongliang Ren

Abstract: Representation learning of the task-oriented attention while tracking instrument holds vast potential in image-guided robotic surgery. Incorporating cognitive ability to automate the camera control enables the surgeon to concentrate more on dealing with surgical instruments. The objective is to reduce the operation time and facilitate the surgery for both surgeons and patients. We propose an end-t… ▽ More Representation learning of the task-oriented attention while tracking instrument holds vast potential in image-guided robotic surgery. Incorporating cognitive ability to automate the camera control enables the surgeon to concentrate more on dealing with surgical instruments. The objective is to reduce the operation time and facilitate the surgery for both surgeons and patients. We propose an end-to-end trainable Spatio-Temporal Multi-Task Learning (ST-MTL) model with a shared encoder and spatio-temporal decoders for the real-time surgical instrument segmentation and task-oriented saliency detection. In the MTL model of shared parameters, optimizing multiple loss functions into a convergence point is still an open challenge. We tackle the problem with a novel asynchronous spatio-temporal optimization (ASTO) technique by calculating independent gradients for each decoder. We also design a competitive squeeze and excitation unit by casting a skip connection that retains weak features, excites strong features, and performs dynamic spatial and channel-wise feature recalibration. To capture better long term spatio-temporal dependencies, we enhance the long-short term memory (LSTM) module by concatenating high-level encoder features of consecutive frames. We also introduce Sinkhorn regularized loss to enhance task-oriented saliency detection by preserving computational efficiency. We generate the task-aware saliency maps and scanpath of the instruments on the dataset of the MICCAI 2017 robotic instrument segmentation challenge. Compared to the state-of-the-art segmentation and saliency methods, our model outperforms most of the evaluation metrics and produces an outstanding performance in the challenge. △ Less

Submitted 10 December, 2021; originally announced December 2021.

Comments: 12 pages

arXiv:2109.07578 [pdf, other]

Multi-Task Learning with Sequence-Conditioned Transporter Networks

Authors: Michael H. Lim, Andy Zeng, Brian Ichter, Maryam Bandari, Erwin Coumans, Claire Tomlin, Stefan Schaal, Aleksandra Faust

Abstract: Enabling robots to solve multiple manipulation tasks has a wide range of industrial applications. While learning-based approaches enjoy flexibility and generalizability, scaling these approaches to solve such compositional tasks remains a challenge. In this work, we aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling. First, we propose a new suite of be… ▽ More Enabling robots to solve multiple manipulation tasks has a wide range of industrial applications. While learning-based approaches enjoy flexibility and generalizability, scaling these approaches to solve such compositional tasks remains a challenge. In this work, we aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling. First, we propose a new suite of benchmark specifically aimed at compositional tasks, MultiRavens, which allows defining custom task combinations through task modules that are inspired by industrial tasks and exemplify the difficulties in vision-based learning and planning methods. Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling and can efficiently learn to solve multi-task long horizon problems. Our analysis suggests that not only the new framework significantly improves pick-and-place performance on novel 10 multi-task benchmark problems, but also the multi-task learning with weighted sampling can vastly improve learning and agent performances on individual tasks. △ Less

Submitted 15 September, 2021; originally announced September 2021.

arXiv:2107.11091 [pdf, other]

Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation

Authors: Mengya Xu, Mobarakol Islam, Chwee Ming Lim, Hongliang Ren

Abstract: Generating surgical reports aimed at surgical scene understanding in robot-assisted surgery can contribute to documenting entry tasks and post-operative analysis. Despite the impressive outcome, the deep learning model degrades the performance when applied to different domains encountering domain shifts. In addition, there are new instruments and variations in surgical tissues appeared in robotic… ▽ More Generating surgical reports aimed at surgical scene understanding in robot-assisted surgery can contribute to documenting entry tasks and post-operative analysis. Despite the impressive outcome, the deep learning model degrades the performance when applied to different domains encountering domain shifts. In addition, there are new instruments and variations in surgical tissues appeared in robotic surgery. In this work, we propose class-incremental domain adaptation (CIDA) with a multi-layer transformer-based model to tackle the new classes and domain shift in the target domain to generate surgical reports during robotic surgery. To adapt incremental classes and extract domain invariant features, a class-incremental (CI) learning method with supervised contrastive (SupCon) loss is incorporated with a feature extractor. To generate caption from the extracted feature, curriculum by one-dimensional gaussian smoothing (CBS) is integrated with a multi-layer transformer-based caption prediction model. CBS smoothes the features embedding using anti-aliasing and helps the model to learn domain invariant features. We also adopt label smoothing (LS) to calibrate prediction probability and obtain better feature representation with both feature extractor and captioning model. The proposed techniques are empirically evaluated by using the datasets of two surgical domains, such as nephrectomy operations and transoral robotic surgery. We observe that domain invariant feature learning and the well-calibrated network improves the surgical report generation performance in both source and target domain under domain shift and unseen classes in the manners of one-shot and few-shot learning. The code is publicly available at https://github.com/XuMengyaAmy/CIDACaptioning. △ Less

Submitted 23 July, 2021; originally announced July 2021.

Comments: Accepted in MICCAI 2021

arXiv:2101.10869 [pdf]

A Raspberry Pi-based Traumatic Brain Injury Detection System for Single-Channel Electroencephalogram

Authors: Navjodh Singh Dhillon, Agustinus Sutandi, Manoj Vishwanath, Miranda M. Lim, Hung Cao, Dong Si

Abstract: Traumatic Brain Injury (TBI) is a common cause of death and disability. However, existing tools for TBI diagnosis are either subjective or require extensive clinical setup and expertise. The increasing affordability and reduction in size of relatively high-performance computing systems combined with promising results from TBI related machine learning research make it possible to create compact and… ▽ More Traumatic Brain Injury (TBI) is a common cause of death and disability. However, existing tools for TBI diagnosis are either subjective or require extensive clinical setup and expertise. The increasing affordability and reduction in size of relatively high-performance computing systems combined with promising results from TBI related machine learning research make it possible to create compact and portable systems for early detection of TBI. This work describes a Raspberry Pi based portable, real-time data acquisition, and automated processing system that uses machine learning to efficiently identify TBI and automatically score sleep stages from a single-channel Electroen-cephalogram (EEG) signal. We discuss the design, implementation, and verification of the system that can digitize EEG signal using an Analog to Digital Converter (ADC) and perform real-time signal classification to detect the presence of mild TBI (mTBI). We utilize Convolutional Neural Networks (CNN) and XGBoost based predictive models to evaluate the performance and demonstrate the versatility of the system to operate with multiple types of predictive models. We achieve a peak classification accuracy of more than 90% with a classification time of less than 1 s across 16 s - 64 s epochs for TBI vs control conditions. This work can enable development of systems suitable for field use without requiring specialized medical equipment for early TBI detection applications and TBI research. Further, this work opens avenues to implement connected, real-time TBI related health and wellness monitoring systems. △ Less

Submitted 29 January, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

Comments: 12 pages, 6 figures

arXiv:2012.10140 [pdf, other]

Voronoi Progressive Widening: Efficient Online Solvers for Continuous State, Action, and Observation POMDPs

Authors: Michael H. Lim, Claire J. Tomlin, Zachary N. Sunberg

Abstract: This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algori… ▽ More This paper introduces Voronoi Progressive Widening (VPW), a generalization of Voronoi optimistic optimization (VOO) and action progressive widening to partially observable Markov decision processes (POMDPs). Tree search algorithms can use VPW to effectively handle continuous or hybrid action spaces by efficiently balancing local and global action searching. This paper proposes two VPW-based algorithms and analyzes them from theoretical and simulation perspectives. Voronoi Optimistic Weighted Sparse Sampling (VOWSS) is a theoretical tool that justifies VPW-based online solvers, and it is the first algorithm with global convergence guarantees for continuous state, action, and observation POMDPs. Voronoi Optimistic Monte Carlo Planning with Observation Weighting (VOMCPOW) is a versatile and efficient algorithm that consistently outperforms state-of-the-art POMDP algorithms in several simulation experiments. △ Less

Submitted 1 April, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

arXiv:2008.02691 [pdf, other]

Adaptive Coordination Offsets for Signalized Arterial Intersections using Deep Reinforcement Learning

Authors: Keith Anshilo Diaz, Damian Dailisan, Umang Sharaf, Carissa Santos, Qijian Gan, Francis Aldrine Uy, May T. Lim, Alexandre M. Bayen

Abstract: Coordinating intersections in arterial networks is critical to the performance of urban transportation systems. Deep reinforcement learning (RL) has gained traction in traffic control research along with data-driven approaches for traffic control systems. To date, proposed deep RL-based traffic schemes control phase activation or duration. Yet, such approaches may bypass low volume links for sever… ▽ More Coordinating intersections in arterial networks is critical to the performance of urban transportation systems. Deep reinforcement learning (RL) has gained traction in traffic control research along with data-driven approaches for traffic control systems. To date, proposed deep RL-based traffic schemes control phase activation or duration. Yet, such approaches may bypass low volume links for several cycles in order to optimize the network-level traffic flow. Here, we propose a deep RL framework that dynamically adjusts offsets based on traffic states and preserves the planned phase timings and order derived from model-based methods. This framework allows us to improve arterial coordination while maintaining phase order and timing predictability. Using a validated and calibrated traffic model, we trained the policy of a deep RL agent that aims to reduce travel delays in the network. We evaluated the resulting policy by comparing its performance against the phase offsets deployed along a segment of Huntington Drive in the city of Arcadia. The resulting policy dynamically readjusts phase offsets in response to changes in traffic demand. Simulation results show that the proposed deep RL agent outperformed the baseline on average, effectively reducing delay time by 13.21% in the AM Scenario, 2.42% in the Noon scenario, and 6.2% in the PM scenario when offsets are adjusted in 15-minute intervals. Finally, we also show the robustness of our agent to extreme traffic conditions, such as demand surges in off-peak hours and localized traffic incidents △ Less

Submitted 29 August, 2022; v1 submitted 6 August, 2020; originally announced August 2020.

arXiv:1910.13122 [pdf]

doi 10.3390/su11205791

Algorithmic decision-making in AVs: Understanding ethical and technical concerns for smart cities

Authors: Hazel Si Min Lim, Araz Taeihagh

Abstract: Autonomous Vehicles (AVs) are increasingly embraced around the world to advance smart mobility and more broadly, smart, and sustainable cities. Algorithms form the basis of decision-making in AVs, allowing them to perform driving tasks autonomously, efficiently, and more safely than human drivers and offering various economic, social, and environmental benefits. However, algorithmic decision-makin… ▽ More Autonomous Vehicles (AVs) are increasingly embraced around the world to advance smart mobility and more broadly, smart, and sustainable cities. Algorithms form the basis of decision-making in AVs, allowing them to perform driving tasks autonomously, efficiently, and more safely than human drivers and offering various economic, social, and environmental benefits. However, algorithmic decision-making in AVs can also introduce new issues that create new safety risks and perpetuate discrimination. We identify bias, ethics, and perverse incentives as key ethical issues in the AV algorithms' decision-making that can create new safety risks and discriminatory outcomes. Technical issues in the AVs' perception, decision-making and control algorithms, limitations of existing AV testing and verification methods, and cybersecurity vulnerabilities can also undermine the performance of the AV system. This article investigates the ethical and technical concerns surrounding algorithmic decision-making in AVs by exploring how driving decisions can perpetuate discrimination and create new safety risks for the public. We discuss steps taken to address these issues, highlight the existing research gaps and the need to mitigate these issues through the design of AV's algorithms and of policies and regulations to fully realise AVs' benefits for smart and sustainable cities. △ Less

Submitted 29 October, 2019; originally announced October 2019.

Journal ref: Sustainability, 2019, 11(20), 5791

arXiv:1910.04332 [pdf, other]

doi 10.24963/ijcai.2020/572

Sparse tree search optimality guarantees in POMDPs with continuous observation spaces

Authors: Michael H. Lim, Claire J. Tomlin, Zachary N. Sunberg

Abstract: Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However… ▽ More Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power. △ Less

Submitted 5 June, 2023; v1 submitted 9 October, 2019; originally announced October 2019.

arXiv:1906.11018 [pdf]

Integration of TensorFlow based Acoustic Model with Kaldi WFST Decoder

Authors: Minkyu Lim, Ji-Hwan Kim

Abstract: While the Kaldi framework provides state-of-the-art components for speech recognition like feature extraction, deep neural network (DNN)-based acoustic models, and a weighted finite state transducer (WFST)-based decoder, it is difficult to implement a new flexible DNN model. By contrast, a general-purpose deep learning framework, such as TensorFlow, can easily build various types of neural network… ▽ More While the Kaldi framework provides state-of-the-art components for speech recognition like feature extraction, deep neural network (DNN)-based acoustic models, and a weighted finite state transducer (WFST)-based decoder, it is difficult to implement a new flexible DNN model. By contrast, a general-purpose deep learning framework, such as TensorFlow, can easily build various types of neural network architectures using a tensor-based computation method, but it is difficult to apply them to WFST-based speech recognition. In this study, a TensorFlow-based acoustic model is integrated with a WFST-based Kaldi decoder to combine the two frameworks. The features and alignments used in Kaldi are converted so they can be trained by the TensorFlow model, and the DNN-based acoustic model is then trained. In the integrated Kaldi decoder, the posterior probabilities are calculated by querying the trained TensorFlow model, and a beam search is performed to generate the lattice. The advantages of the proposed one-pass decoder include the application of various types of neural networks to WFST-based speech recognition and WFST-based online decoding using a TensorFlow-based acoustic model. The TensorFlow based acoustic models trained using the RM, WSJ, and LibriSpeech datasets show the same level of performance as the model trained using the Kaldi framework. △ Less

Submitted 21 June, 2019; originally announced June 2019.

arXiv:1807.05855 [pdf]

A Fast-Converged Acoustic Modeling for Korean Speech Recognition: A Preliminary Study on Time Delay Neural Network

Authors: Hosung Park, Donghyun Lee, Minkyu Lim, Yoseb Kang, Juneseok Oh, Ji-Hwan Kim

Abstract: In this paper, a time delay neural network (TDNN) based acoustic model is proposed to implement a fast-converged acoustic modeling for Korean speech recognition. The TDNN has an advantage in fast-convergence where the amount of training data is limited, due to subsampling which excludes duplicated weights. The TDNN showed an absolute improvement of 2.12% in terms of character error rate compared t… ▽ More In this paper, a time delay neural network (TDNN) based acoustic model is proposed to implement a fast-converged acoustic modeling for Korean speech recognition. The TDNN has an advantage in fast-convergence where the amount of training data is limited, due to subsampling which excludes duplicated weights. The TDNN showed an absolute improvement of 2.12% in terms of character error rate compared to feed forward neural network (FFNN) based modelling for Korean speech corpora. The proposed model converged 1.67 times faster than a FFNN-based model did. △ Less

Submitted 11 July, 2018; originally announced July 2018.

Comments: 6 pages, 2 figures

Showing 1–18 of 18 results for author: Lim, M