Showing 1–26 of 26 results for author: Shi, B E

  1. arXiv:2401.01572  [pdf, other]

    cs.CL cs.SD eess.AS

    Hallucinations in Neural Automatic Speech Recognition: Identifying Errors and Hallucinatory Models

    Authors: Rita Frieske, Bertram E. Shi

    Abstract: Hallucinations are a type of output error produced by deep neural networks. While this has been studied in natural language processing, they have not been researched previously in automatic speech recognition. Here, we define hallucinations in ASR as transcriptions generated by a model that are semantically unrelated to the source utterance, yet still fluent and coherent. The similarity of halluci…

    Submitted 3 January, 2024; originally announced January 2024.

  2. arXiv:2209.14645  [pdf, other]

    cs.HC

    Reducing Stress and Anxiety in the Metaverse: A Systematic Review of Meditation, Mindfulness and Virtual Reality

    Authors: Xian Wang, Xiaoyu Mo, Mingming Fan, Lik-Hang Lee, Bertram E. Shi, Pan Hui

    Abstract: Meditation, or mindfulness, is widely used to improve mental health. With the emergence of Virtual Reality technology, many studies have provided evidence that meditation with VR can bring health benefits. However, to our knowledge, there are no guidelines and comprehensive reviews in the literature on how to conduct such research in virtual reality. In order to understand the role of VR technolog…

    Submitted 29 September, 2022; originally announced September 2022.

  3. arXiv:2201.03804  [pdf, other]

    cs.CL cs.AI

    CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

    Authors: Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Shadow Yiu, Rita Frieske, Holy Lovenia, Genta Indra Winata, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

    Abstract: With the rise of deep learning and intelligent vehicle, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities. In-car smart assistants should be able to process general as well as car-related commands and perform corresponding actions, which eases driving and improves safety. However, there is a data scarcity issue for low resource lan…

    Submitted 14 March, 2022; v1 submitted 11 January, 2022; originally announced January 2022.

    Comments: 6 pages

  4. arXiv:2201.02419  [pdf, other]

    cs.CL cs.SD eess.AS

    Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

    Authors: Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Shadow Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

    Abstract: Automatic speech recognition (ASR) on low resource languages improves the access of linguistic minorities to technological advantages provided by artificial intelligence (AI). In this paper, we address the problem of data scarcity for the Hong Kong Cantonese language by creating a new Cantonese dataset. Our dataset, Multi-Domain Cantonese Corpus (MDCC), consists of 73.6 hours of clean read speech…

    Submitted 17 January, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

  5. arXiv:2112.06223  [pdf, other]

    cs.CL

    ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

    Authors: Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Peng Xu, Xu Yan, Zihan Liu, Rita Frieske, Tiezheng Yu, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

    Abstract: Code-switching is a speech phenomenon occurring when a speaker switches language during a conversation. Despite the spontaneous nature of code-switching in conversational spoken language, most existing works collect code-switching data from read speech instead of spontaneous speech. ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus buil…

    Submitted 3 May, 2022; v1 submitted 12 December, 2021; originally announced December 2021.

    Journal ref: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022)

  6. arXiv:2108.04228  [pdf, other]

    cs.CV cs.LG

    Iterative Distillation for Better Uncertainty Estimates in Multitask Emotion Recognition

    Authors: Didan Deng, Liang Wu, Bertram E. Shi

    Abstract: When recognizing emotions, subtle nuances in displays of emotion generate ambiguity or uncertainty in emotion perception. Emotion uncertainty has been previously interpreted as inter-rater disagreement among multiple annotators. In this paper, we consider a more common and challenging scenario: modeling emotion uncertainty when only single emotion labels are available. From a Bayesian perspective,…

    Submitted 17 October, 2021; v1 submitted 21 July, 2021; originally announced August 2021.

    Comments: Accepted as a Workshop paper in ICCV2021 proceeding

  7. Learning Hierarchical Integration of Foveal and Peripheral Vision for Vergence Control by Active Efficient Coding

    Authors: Zhetuo Zhao, Jochen Triesch, Bertram E. Shi

    Abstract: The active efficient coding (AEC) framework parsimoniously explains the joint development of visual processing and eye movements, e.g., the emergence of binocular disparity selective neurons and fusional vergence, the disjunctive eye movements that align left and right eye images. Vergence can be driven by information in both the fovea and periphery, which play complementary roles. The high resolu…

    Submitted 29 January, 2021; originally announced March 2021.

  8. arXiv:2101.11391  [pdf, ps, other]

    cs.CV cs.AI

    Self-Calibrating Active Binocular Vision via Active Efficient Coding with Deep Autoencoders

    Authors: Charles Wilmot, Bertram E. Shi, Jochen Triesch

    Abstract: We present a model of the self-calibration of active binocular vision comprising the simultaneous learning of visual representations, vergence, and pursuit eye movements. The model follows the principle of Active Efficient Coding (AEC), a recent extension of the classic Efficient Coding Hypothesis to active perception. In contrast to previous AEC models, the present model uses deep autoencoders to…

    Submitted 27 January, 2021; originally announced January 2021.

  9. arXiv:2101.05682  [pdf, other]

    cs.CV cs.RO

    AVGCN: Trajectory Prediction using Graph Convolutional Networks Guided by Human Attention

    Authors: Congcong Liu, Yuying Chen, Ming Liu, Bertram E. Shi

    Abstract: Pedestrian trajectory prediction is a critical yet challenging task, especially for crowded scenes. We suggest that introducing an attention mechanism to infer the importance of different neighbors is critical for accurate trajectory prediction in scenes with varying crowd size. In this work, we propose a novel method, AVGCN, for trajectory prediction utilizing graph convolutional networks (GCN) b…

    Submitted 14 January, 2021; originally announced January 2021.

    Comments: 7 pages, 4 figures

  10. arXiv:2009.07140  [pdf, other]

    cs.CV

    HGCN-GJS: Hierarchical Graph Convolutional Network with Groupwise Joint Sampling for Trajectory Prediction

    Authors: Yuying Chen, Congcong Liu, Xiaodong Mei, Bertram E. Shi, Ming Liu

    Abstract: Accurate pedestrian trajectory prediction is of great importance for downstream tasks such as autonomous driving and mobile robot navigation. Fully investigating the social interactions within the crowd is crucial for accurate pedestrian trajectory prediction. However, most existing methods do not capture group level interactions well, focusing only on pairwise interactions and neglecting group-wi…

    Submitted 15 September, 2023; v1 submitted 15 September, 2020; originally announced September 2020.

    Comments: 6 pages, 8 figures, accepted by IROS 2022

  11. arXiv:2002.03557  [pdf, other]

    cs.CV cs.MM eess.AS

    Multitask Emotion Recognition with Incomplete Labels

    Authors: Didan Deng, Zhaokang Chen, Bertram E. Shi

    Abstract: We train a unified model to perform three tasks: facial action unit detection, expression classification, and valence-arousal estimation. We address two main challenges of learning the three tasks. First, most existing datasets are highly imbalanced. Second, most existing datasets do not contain labels for all three tasks. To tackle the first challenge, we apply data balancing techniques to experi…

    Submitted 10 March, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

    Comments: Accepted by FG2020

  12. Towards High Performance Low Complexity Calibration in Appearance Based Gaze Estimation

    Authors: Zhaokang Chen, Bertram E. Shi

    Abstract: Appearance-based gaze estimation from RGB images provides relatively unconstrained gaze tracking. We have previously proposed a gaze decomposition method that decomposes the gaze angle into the sum of a subject-independent gaze estimate from the image and a subject-dependent bias. This paper extends that work with a more complete characterization of the interplay between the complexity of the cali…

    Submitted 13 February, 2022; v1 submitted 25 January, 2020; originally announced January 2020.

    Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  13. arXiv:1909.10400  [pdf, other]

    cs.RO cs.AI cs.CV

    Robot Navigation in Crowds by Graph Convolutional Networks with Attention Learned from Human Gaze

    Authors: Yuying Chen, Congcong Liu, Ming Liu, Bertram E. Shi

    Abstract: Safe and efficient crowd navigation for mobile robot is a crucial yet challenging task. Previous work has shown the power of deep reinforcement learning frameworks to train efficient policies. However, their performance deteriorates when the crowd size grows. We suggest that this can be addressed by enabling the network to identify and pay attention to the humans in the crowd that are most critica…

    Submitted 23 September, 2019; originally announced September 2019.

    Comments: 8 pages, 7 figures

  14. arXiv:1905.04451  [pdf, other]

    cs.CV

    Offset Calibration for Appearance-Based Gaze Estimation via Gaze Decomposition

    Authors: Zhaokang Chen, Bertram E. Shi

    Abstract: Appearance-based gaze estimation provides relatively unconstrained gaze tracking. However, subject-independent models achieve limited accuracy partly due to individual variations. To improve estimation, we propose a novel gaze decomposition method and a single gaze point calibration method, motivated by our finding that the inter-subject squared bias exceeds the intra-subject variance for a subjec…

    Submitted 9 January, 2020; v1 submitted 11 May, 2019; originally announced May 2019.

    Comments: Accepted by WACV2020. This is not the camera-ready version

  15. arXiv:1904.08377  [pdf, other]

    cs.CV cs.AI cs.RO

    Gaze Training by Modulated Dropout Improves Imitation Learning

    Authors: Yuying Chen, Congcong Liu, Lei Tai, Ming Liu, Bertram E. Shi

    Abstract: Imitation learning by behavioral cloning is a prevalent method that has achieved some success in vision-based autonomous driving. The basic idea behind behavioral cloning is to have the neural network learn from observing a human expert's behavior. Typically, a convolutional neural network learns to predict the steering commands from raw driver-view images by mimicking the behaviors of human drive…

    Submitted 16 August, 2019; v1 submitted 17 April, 2019; originally announced April 2019.

    Comments: 6 pages, 4 figures

  16. arXiv:1903.07296  [pdf, other]

    cs.CV

    Appearance-Based Gaze Estimation Using Dilated-Convolutions

    Authors: Zhaokang Chen, Bertram E. Shi

    Abstract: Appearance-based gaze estimation has attracted more and more attention because of its wide range of applications. The use of deep convolutional neural networks has improved the accuracy significantly. In order to improve the estimation accuracy further, we focus on extracting better features from eye images. Relatively large changes in gaze angles may result in relatively small changes in eye appe…

    Submitted 18 March, 2019; originally announced March 2019.

    Comments: 16 pages, 7 figures. To appear in ACCV2018

  17. arXiv:1812.10071  [pdf, other]

    cs.CV cs.LG

    Coupled Recurrent Network (CRN)

    Authors: Lin Sun, Kui Jia, Yuejia Shen, Silvio Savarese, Dit Yan Yeung, Bertram E. Shi

    Abstract: Many semantic video analysis tasks can benefit from multiple, heterogenous signals. For example, in addition to the original RGB input sequences, sequences of optical flow are usually used to boost the performance of human action recognition in videos. To learn from these heterogenous input sources, existing methods rely on two-stream architectural designs that contain independent, parallel strea…

    Submitted 25 March, 2019; v1 submitted 25 December, 2018; originally announced December 2018.

  18. arXiv:1805.00625  [pdf, other]

    eess.IV cs.CL cs.CV

    Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features

    Authors: Didan Deng, Yuqian Zhou, Jimin Pi, Bertram E. Shi

    Abstract: The integration of information across multiple modalities and across time is a promising way to enhance the emotion recognition performance of affective systems. Much previous work has focused on instantaneous emotion recognition. The 2018 One-Minute Gradual-Emotion Recognition (OMG-Emotion) challenge, which was held in conjunction with the IEEE World Congress on Computational Intelligence, encour…

    Submitted 4 May, 2018; v1 submitted 2 May, 2018; originally announced May 2018.

    Comments: 5 pages, 1 figure, subject to the 2018 IJCNN challenge on One-Minute Gradual-Emotion Recognition

  19. arXiv:1708.09126  [pdf]

    cs.CV

    Photorealistic Facial Expression Synthesis by the Conditional Difference Adversarial Autoencoder

    Authors: Yuqian Zhou, Bertram Emil Shi

    Abstract: Photorealistic facial expression synthesis from single face image can be widely applied to face recognition, data augmentation for emotion recognition or entertainment. This problem is challenging, in part due to a paucity of labeled facial expression data, making it difficult for algorithms to disambiguate changes due to identity and changes due to expression. In this paper, we propose the condit…

    Submitted 30 August, 2017; originally announced August 2017.

    Comments: Accepted by ACII2017

  20. arXiv:1708.03958  [pdf, other]

    cs.CV

    Lattice Long Short-Term Memory for Human Action Recognition

    Authors: Lin Sun, Kui Jia, Kevin Chen, Dit Yan Yeung, Bertram E. Shi, Silvio Savarese

    Abstract: Human actions captured in video sequences are three-dimensional signals characterizing visual appearance and motion dynamics. To learn action patterns, existing methods adopt Convolutional and/or Recurrent Neural Networks (CNNs and RNNs). CNN based methods are effective in learning spatial appearances, but are limited in modeling long-term motion dynamics. RNNs, especially Long Short-Term Memory (…

    Submitted 13 August, 2017; originally announced August 2017.

    Comments: ICCV2017

  21. Using Variable Dwell Time to Accelerate Gaze-Based Web Browsing with Two-Step Selection

    Authors: Zhaokang Chen, Bertram E. Shi

    Abstract: In order to avoid the "Midas Touch" problem, gaze-based interfaces for selection often introduce a dwell time: a fixed amount of time the user must fixate upon an object before it is selected. Past interfaces have used a uniform dwell time across all objects. Here, we propose a gaze-based browser using a two-step selection policy with variable dwell time. In the first step, a command, e.g. "back"…

    Submitted 3 September, 2022; v1 submitted 21 April, 2017; originally announced April 2017.

    Comments: This is an Accepted Manuscript of an article published by Taylor & Francis in the International Journal of Human-Computer Interaction on 30 March, 2018, available online: http://www.tandfonline.com/10.1080/10447318.2018.1452351 . For an eprint of the final published article, please access: https://www.tandfonline.com/eprint/T9d4cNwwRUqXPPiZYm8Z/full . Correct Figure 14

  22. arXiv:1610.07129  [pdf, ps, other]

    cs.CY

    Developing and Assessing MATLAB Exercises for Active Concept Learning

    Authors: S. H. Song, Marco Antonelli, Tony Fung, Brandon D. Armstrong, Amy Chong, Albert Lo, Bertram E. Shi

    Abstract: New technologies, such as MOOCs, provide innovative methods to tackle new challenges in teaching and learning, such as globalization and changing contemporary culture and to remove the limits of conventional classrooms. However, they also bring challenges in course delivery and assessment, due to factors such as less direct student-instructor interaction. These challenges are especially severe in…

    Submitted 23 October, 2016; originally announced October 2016.

    Comments: Submitted to IEEE Transactions on Education

  23. arXiv:1606.06443  [pdf]

    q-bio.NC cs.NE

    An active efficient coding model of the optokinetic nystagmus

    Authors: Chong Zhang, Jochen Triesch, Bertram E. Shi

    Abstract: Optokinetic nystagmus (OKN) is an involuntary eye movement responsible for stabilizing retinal images in the presence of relative motion between an observer and the environment. Fully understanding the development of optokinetic nystagmus requires a neurally plausible computational model that accounts for the neural development and the behavior. To date, work in this area has been limited. We prop…

    Submitted 11 October, 2016; v1 submitted 21 June, 2016; originally announced June 2016.

  24. arXiv:1604.04327  [pdf]

    cs.CV

    Invariant feature extraction from event based stimuli

    Authors: Thusitha N. Chandrapala, Bertram E. Shi

    Abstract: We propose a novel architecture, the event-based GASSOM for learning and extracting invariant representations from event streams originating from neuromorphic vision sensors. The framework is inspired by feed-forward cortical models for visual processing. The model, which is based on the concepts of sparsity and temporal slowness, is able to learn feature extractors that resemble neurons in the pr…

    Submitted 21 June, 2016; v1 submitted 14 April, 2016; originally announced April 2016.

    Comments: 6 pages

  25. arXiv:1510.00562  [pdf, other]

    cs.CV

    Human Action Recognition using Factorized Spatio-Temporal Convolutional Networks

    Authors: Lin Sun, Kui Jia, Dit-Yan Yeung, Bertram E. Shi

    Abstract: Human actions in video sequences are three-dimensional (3D) spatio-temporal signals characterizing both the visual appearance and motion dynamics of the involved humans and objects. Inspired by the success of convolutional neural networks (CNN) for image classification, recent attempts have been made to learn 3D CNNs for recognizing human actions in videos. However, partly due to the high complexi…

    Submitted 2 October, 2015; originally announced October 2015.

  26. arXiv:1402.3344  [pdf]

    cs.CV q-bio.NC

    Intrinsically Motivated Learning of Visual Motion Perception and Smooth Pursuit

    Authors: Chong Zhang, Yu Zhao, Jochen Triesch, Bertram E. Shi

    Abstract: We extend the framework of efficient coding, which has been used to model the development of sensory processing in isolation, to model the development of the perception/action cycle. Our extension combines sparse coding and reinforcement learning so that sensory processing and behavior co-develop to optimize a shared intrinsic motivational signal: the fidelity of the neural encoding of the sensory…

    Submitted 24 February, 2014; v1 submitted 13 February, 2014; originally announced February 2014.

    Comments: 6 pages, 5 figures