Search | arXiv e-print repository

Privacy Preserving Personal Assistant with On-Device Diarization and Spoken Dialogue System for Home and Beyond

Authors: Gérard Chollet, Hugues Sansen, Yannis Tevissen, Jérôme Boudy, Mossaab Hariz, Christophe Lohr, Fathy Yassa

Abstract: In the age of personal voice assistants, the question of privacy arises. These digital companions often lack memory of past interactions, while relying heavily on the internet for speech processing, raising privacy concerns. Modern smartphones now enable on-device speech processing, making cloud-based solutions unnecessary. Personal assistants for the elderly should excel at memory recall, especia… ▽ More In the age of personal voice assistants, the question of privacy arises. These digital companions often lack memory of past interactions, while relying heavily on the internet for speech processing, raising privacy concerns. Modern smartphones now enable on-device speech processing, making cloud-based solutions unnecessary. Personal assistants for the elderly should excel at memory recall, especially in medical examinations. The e-ViTA project developed a versatile conversational application with local processing and speaker recognition. This paper highlights the importance of speaker diarization enriched with sensor data fusion for contextualized conversation preservation. The use cases applied to the e-VITA project have shown that truly personalized dialogue is pivotal for individual voice assistants. Secure local processing and sensor data fusion ensure virtual companions meet individual user needs without compromising privacy or data security. △ Less

Submitted 2 January, 2024; originally announced January 2024.

Comments: 10 pages, 1 figure, to be presented at https://ihiet-ai.org/, Lausanne in April 2024

arXiv:2308.08985 [pdf]

Home monitoring for frailty detection through sound and speaker diarization analysis

Authors: Yannis Tevissen, Dan Istrate, Vincent Zalc, Jérôme Boudy, Gérard Chollet, Frédéric Petitpont, Sami Boutamine

Abstract: As the French, European and worldwide populations are aging, there is a strong interest for new systems that guarantee a reliable and privacy preserving home monitoring for frailty prevention. This work is a part of a global environmental audio analysis system which aims to help identification of Activities of Daily Life (ADL) through human and everyday life sounds recognition, speech presence and… ▽ More As the French, European and worldwide populations are aging, there is a strong interest for new systems that guarantee a reliable and privacy preserving home monitoring for frailty prevention. This work is a part of a global environmental audio analysis system which aims to help identification of Activities of Daily Life (ADL) through human and everyday life sounds recognition, speech presence and number of speakers detection. The focus is made on the number of speakers detection. In this article, we present how recent advances in sound processing and speaker diarization can improve the existing embedded systems. We study the performances of two new methods and discuss the benefits of DNN based approaches which improve performances by about 100%. △ Less

Submitted 17 August, 2023; originally announced August 2023.

Comments: JETSAN, Jun 2023, Aubervilliers & Paris, France

arXiv:2302.09991 [pdf]

Towards Measuring and Scoring Speaker Diarization Fairness

Authors: Yannis Tevissen, Jérôme Boudy, Gérard Chollet, Frédéric Petitpont

Abstract: Speaker diarization, or the task of finding "who spoke and when", is now used in almost every speech processing application. Nevertheless, its fairness has not yet been evaluated because there was no protocol to study its biases one by one. In this paper we propose a protocol and a scoring method designed to evaluate speaker diarization fairness. This protocol is applied on a large dataset of spok… ▽ More Speaker diarization, or the task of finding "who spoke and when", is now used in almost every speech processing application. Nevertheless, its fairness has not yet been evaluated because there was no protocol to study its biases one by one. In this paper we propose a protocol and a scoring method designed to evaluate speaker diarization fairness. This protocol is applied on a large dataset of spoken utterances and report the performances of speaker diarization depending on the gender, the age, the accent of the speaker and the length of the spoken sentence. Some biases induced by the gender, or the accent of the speaker were identified when we applied a state-of-the-art speaker diarization method. △ Less

Submitted 20 February, 2023; originally announced February 2023.

arXiv:2105.01878 [pdf, other]

doi 10.1145/3316782.3322764

The EMPATHIC Project: Mid-term Achievements

Authors: M. I. Torres, J. M. Olaso, C. Montenegro, R. Santana, A. Vázquez, R. Justo, J. A. Lozano, S. Schlögl, G. Chollet, N. Dugan, M. Irvine, N. Glackin, C. Pickard, A. Esposito, G. Cordasco, A. Troncone, D. Petrovska-Delacretaz, A. Mtibaa, M. A. Hmani, M. S. Korsnes, L. J. Martinussen, S. Escalera, C. Palmero Cantariño, O. Deroo, O. Gordeeva , et al. (4 additional authors not shown)

Abstract: The goal of active aging is to promote changes in the elderly community so as to maintain an active, independent and socially-engaged lifestyle. Technological advancements currently provide the necessary tools to foster and monitor such processes. This paper reports on mid-term achievements of the European H2020 EMPATHIC project, which aims to research, innovate, explore and validate new interacti… ▽ More The goal of active aging is to promote changes in the elderly community so as to maintain an active, independent and socially-engaged lifestyle. Technological advancements currently provide the necessary tools to foster and monitor such processes. This paper reports on mid-term achievements of the European H2020 EMPATHIC project, which aims to research, innovate, explore and validate new interaction paradigms and platforms for future generations of personalized virtual coaches to assist the elderly and their carers to reach the active aging goal, in the vicinity of their home. The project focuses on evidence-based, user-validated research and integration of intelligent technology, and context sensing methods through automatic voice, eye and facial analysis, integrated with visual and spoken dialogue system capabilities. In this paper, we describe the current status of the system, with a special emphasis on its components and their integration, the creation of a Wizard of Oz platform, and findings gained from user interaction studies conducted throughout the first 18 months of the project. △ Less

Submitted 5 May, 2021; originally announced May 2021.

Comments: 12 pages

arXiv:2104.13836 [pdf]

The EMPATHIC Project: Building an Expressive, Advanced Virtual Coach to Improve Independent Healthy-Life-Years of the Elderly

Authors: Luisa Brinkschulte, Natascha Mariacher, Stephan Schlögl, María Inés Torres, Raquel Justo, Javier Mikel Olaso, Anna Esposito, Gennaro Cordasco, Gérard Chollet, Cornelius Glackin, Colin Pickard, Dijana Petrovska-Delacretaz, Mohamed Amine Hmani, Ayment Mtibaa, Anaïs Fernandez, Daria Kyslitska, Begoña Fernandez-Ruanova, Jofre Tenorio-Laranga, Mari Aksnes, Maria Stylianou Korsnes, Miriam Reiner, Fredrik Lindner, Olivier Deroo, Olga Gordeeva

Abstract: This paper outlines the EMPATHIC Research & Innovation project, which aims to research, innovate, explore and validate new interaction paradigms and plat-forms for future generations of Personalized Virtual Coaches to assist elderly people living independently at and around their home. Innovative multimodal face analytics, adaptive spoken dialogue systems, and natural language inter-faces are part… ▽ More This paper outlines the EMPATHIC Research & Innovation project, which aims to research, innovate, explore and validate new interaction paradigms and plat-forms for future generations of Personalized Virtual Coaches to assist elderly people living independently at and around their home. Innovative multimodal face analytics, adaptive spoken dialogue systems, and natural language inter-faces are part of what the project investigates and innovates, aiming to help dependent aging persons and their carers. It will uses remote, non-intrusive technologies to extract physiological markers of emotional states and adapt respective coach responses. In doing so, it aims to develop causal models for emotionally believable coach-user interactions, which shall engage elders and thus keep off loneliness, sustain health, enhance quality of life, and simplify access to future telecare services. Through measurable end-user validations performed in Spain, Norway and France (and complementary user evaluations in Italy), the proposed methods and solutions will have to demonstrate useful-ness, reliability, flexibility and robustness. △ Less

Submitted 28 April, 2021; originally announced April 2021.

Comments: 16 pages

arXiv:1908.11039 [pdf, other]

doi 10.1007/978-3-642-41914-0\_55

3D Face Pose and Animation Tracking via Eigen-Decomposition based Bayesian Approach

Authors: Ngoc-Trung Tran, Fakhr-Eddine Ababsa, Maurice Charbit, Jacques Feldmar, Dijana Petrovska-Delacrétaz, Gérard Chollet

Abstract: This paper presents a new method to track both the face pose and the face animation with a monocular camera. The approach is based on the 3D face model CANDIDE and on the SIFT (Scale Invariant Feature Transform) descriptors, extracted around a few given landmarks (26 selected vertices of CANDIDE model) with a Bayesian approach. The training phase is performed on a synthetic database generated from… ▽ More This paper presents a new method to track both the face pose and the face animation with a monocular camera. The approach is based on the 3D face model CANDIDE and on the SIFT (Scale Invariant Feature Transform) descriptors, extracted around a few given landmarks (26 selected vertices of CANDIDE model) with a Bayesian approach. The training phase is performed on a synthetic database generated from the first video frame. At each current frame, the face pose and animation parameters are estimated via a Bayesian approach, with a Gaussian prior and a Gaussian likelihood function whose the mean and the covariance matrix eigenvalues are updated from the previous frame using eigen decomposition. Numerical results on pose estimation and landmark locations are reported using the Boston University Face Tracking (BUFT) database and Talking Face video. They show that our approach, compared to six other published algorithms, provides a very good compromise and presents a promising perspective due to the good results in terms of landmark localization. △ Less

Submitted 28 August, 2019; originally announced August 2019.

Journal ref: Advances in Visual Computing - 9th International Symposium, ISVC 2013, Rethymnon, Crete, Greece, July 29-31, 2013. Proceedings, Part I, pages 562--571

arXiv:1611.00514 [pdf, other]

The Intelligent Voice 2016 Speaker Recognition System

Authors: Abbas Khosravani, Cornelius Glackin, Nazim Dugan, Gérard Chollet, Nigel Cannings

Abstract: This paper presents the Intelligent Voice (IV) system submitted to the NIST 2016 Speaker Recognition Evaluation (SRE). The primary emphasis of SRE this year was on develo** speaker recognition technology which is robust for novel languages that are much more heterogeneous than those used in the current state-of-the-art, using significantly less training data, that does not contain meta-data from… ▽ More This paper presents the Intelligent Voice (IV) system submitted to the NIST 2016 Speaker Recognition Evaluation (SRE). The primary emphasis of SRE this year was on develo** speaker recognition technology which is robust for novel languages that are much more heterogeneous than those used in the current state-of-the-art, using significantly less training data, that does not contain meta-data from those languages. The system is based on the state-of-the-art i-vector/PLDA which is developed on the fixed training condition, and the results are reported on the protocol defined on the development set of the challenge. △ Less

Submitted 2 November, 2016; originally announced November 2016.

Comments: 7 pages, 3 figures, NIST SRE 2016 Workshop

Showing 1–7 of 7 results for author: Chollet, G