-
Improving Diffusion Models's Data-Corruption Resistance using Scheduled Pseudo-Huber Loss
Authors:
Artem Khrapov,
Vadim Popov,
Tasnima Sadekova,
Assel Yermekova,
Mikhail Kudinov
Abstract:
Diffusion models are known to be vulnerable to outliers in training data. In this paper we study an alternative diffusion loss function, which can preserve the high quality of generated data like the original squared $L_{2}$ loss while at the same time being robust to outliers. We propose to use pseudo-Huber loss function with a time-dependent parameter to allow for the trade-off between robustness on the most vulnerable early reverse-diffusion steps and fine details restoration on the final steps. We show that pseudo-Huber loss with the time-dependent parameter exhibits better performance on corrupted datasets in both image and audio domains. In addition, the loss function we propose can potentially help diffusion models to resist dataset corruption while not requiring data filtering or purification compared to conventional training algorithms.
Submitted 25 March, 2024;
originally announced March 2024.
-
Looking Together $\neq$ Seeing the Same Thing: Understanding Surgeons' Visual Needs During Intra-operative Coordination and Instruction
Authors:
Vitaliy Popov,
Xinyue Chen,
Jingying Wang,
Michael Kemp,
Gurjit Sandhu,
Taylor Kantor,
Natalie Mateju,
Xu Wang
Abstract:
Shared gaze visualizations have been found to enhance collaboration and communication outcomes in diverse HCI scenarios including computer supported collaborative work and learning contexts. Given the importance of gaze in surgery operations, especially when a surgeon trainer and trainee need to coordinate their actions, research on the use of gaze to facilitate intra-operative coordination and instruction has been limited and shows mixed implications. We performed a field observation of 8 surgeries and an interview study with 14 surgeons to understand their visual needs during operations, informing ways to leverage and augment gaze to enhance intra-operative coordination and instruction. We found that trainees have varying needs in receiving visual guidance which are often unfulfilled by the trainers' instructions. It is critical for surgeons to control the timing of the gaze-based visualizations and effectively interpret gaze data. We suggest overlay technologies, e.g., gaze-based summaries and depth sensing, to augment raw gaze in support of surgical coordination and instruction.
Submitted 21 March, 2024;
originally announced March 2024.
-
Surgment: Segmentation-enabled Semantic Search and Creation of Visual Question and Feedback to Support Video-Based Surgery Learning
Authors:
Jingying Wang,
Haoran Tang,
Taylor Kantor,
Tandis Soltani,
Vitaliy Popov,
Xu Wang
Abstract:
Videos are prominent learning materials to prepare surgical trainees before they enter the operating room (OR). In this work, we explore techniques to enrich the video-based surgery learning experience. We propose Surgment, a system that helps expert surgeons create exercises with feedback based on surgery recordings. Surgment is powered by a few-shot-learning-based pipeline (SegGPT+SAM) to segment surgery scenes, achieving an accuracy of 92\%. The segmentation pipeline enables functionalities to create visual questions and feedback desired by surgeons from a formative study. Surgment enables surgeons to 1) retrieve frames of interest through sketches, and 2) design exercises that target specific anatomical components and offer visual feedback. In an evaluation study with 11 surgeons, participants applauded the search-by-sketch approach for identifying frames of interest and found the resulting image-based questions and feedback to be of high educational value.
Submitted 27 February, 2024;
originally announced February 2024.
-
How should the advent of large language models affect the practice of science?
Authors:
Marcel Binz,
Stephan Alaniz,
Adina Roskies,
Balazs Aczel,
Carl T. Bergstrom,
Colin Allen,
Daniel Schad,
Dirk Wulff,
Jevin D. West,
Qiong Zhang,
Richard M. Shiffrin,
Samuel J. Gershman,
Ven Popov,
Emily M. Bender,
Marco Marelli,
Matthew M. Botvinick,
Zeynep Akata,
Eric Schulz
Abstract:
Large language models (LLMs) are being increasingly incorporated into scientific workflows. However, we have yet to fully grasp the implications of this integration. How should the advent of large language models affect the practice of science? For this opinion piece, we have invited four diverse groups of scientists to reflect on this query, sharing their perspectives and engaging in debate. Schulz et al. make the argument that working with LLMs is not fundamentally different from working with human collaborators, while Bender et al. argue that LLMs are often misused and over-hyped, and that their limitations warrant a focus on more specialized, easily interpretable tools. Marelli et al. emphasize the importance of transparent attribution and responsible use of LLMs. Finally, Botvinick and Gershman advocate that humans should retain responsibility for determining the scientific roadmap. To facilitate the discussion, the four perspectives are complemented with a response from each group. By putting these different perspectives in conversation, we aim to bring attention to important considerations within the academic community regarding the adoption of LLMs and their impact on both current and future scientific practices.
Submitted 5 December, 2023;
originally announced December 2023.
-
Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme
Authors:
Vadim Popov,
Ivan Vovk,
Vladimir Gogoryan,
Tasnima Sadekova,
Mikhail Kudinov,
Jiansheng Wei
Abstract:
Voice conversion is a common speech synthesis task which can be solved in different ways depending on a particular real-world scenario. The most challenging one often referred to as one-shot many-to-many voice conversion consists in copying the target voice from only one reference utterance in the most general case when both source and target speakers do not belong to the training dataset. We present a scalable high-quality solution based on diffusion probabilistic modeling and demonstrate its superior quality compared to state-of-the-art one-shot voice conversion approaches. Moreover, focusing on real-time applications, we investigate general principles which can make diffusion models faster while keeping synthesis quality at a high level. As a result, we develop a novel Stochastic Differential Equations solver suitable for various diffusion model types and generative tasks as shown through empirical studies and justify it by theoretical analysis.
Submitted 4 August, 2022; v1 submitted 28 September, 2021;
originally announced September 2021.
-
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Authors:
Vadim Popov,
Ivan Vovk,
Vladimir Gogoryan,
Tasnima Sadekova,
Mikhail Kudinov
Abstract:
Recently, denoising diffusion probabilistic models and generative score matching have shown high potential in modelling complex data distributions while stochastic calculus has provided a unified point of view on these techniques allowing for flexible inference schemes. In this paper we introduce Grad-TTS, a novel text-to-speech model with score-based decoder producing mel-spectrograms by gradually transforming noise predicted by encoder and aligned with text input by means of Monotonic Alignment Search. The framework of stochastic differential equations helps us to generalize conventional diffusion probabilistic models to the case of reconstructing data from noise with different parameters and allows to make this reconstruction flexible by explicitly controlling trade-off between sound quality and inference speed. Subjective human evaluation shows that Grad-TTS is competitive with state-of-the-art text-to-speech approaches in terms of Mean Opinion Score. We will make the code publicly available shortly.
Submitted 5 August, 2021; v1 submitted 13 May, 2021;
originally announced May 2021.
-
Fine-tuning of Language Models with Discriminator
Authors:
Vadim Popov,
Mikhail Kudinov
Abstract:
Cross-entropy loss is a common choice when it comes to multiclass classification tasks and language modeling in particular. Minimizing this loss results in language models of very good quality. We show that it is possible to fine-tune these models and make them perform even better if they are fine-tuned with sum of cross-entropy loss and reverse Kullback-Leibler divergence. The latter is estimated using discriminator network that we train in advance. During fine-tuning probabilities of rare words that are usually underestimated by language models become bigger. The novel approach that we propose allows us to reach state-of-the-art quality on Penn Treebank: perplexity decreases from 52.4 to 52.1. Our fine-tuning algorithm is rather fast, scales well to different architectures and datasets and requires almost no hyperparameter tuning: the only hyperparameter that needs to be tuned is learning rate.
Submitted 15 January, 2019; v1 submitted 12 November, 2018;
originally announced November 2018.
-
Differentially Private Distributed Learning for Language Modeling Tasks
Authors:
Vadim Popov,
Mikhail Kudinov,
Irina Piontkovskaya,
Petr Vytovtov,
Alex Nevidomsky
Abstract:
One of the big challenges in machine learning applications is that training data can be different from the real-world data faced by the algorithm. In language modeling, users' language (e.g. in private messaging) could change in a year and be completely different from what we observe in publicly available data. At the same time, public data can be used for obtaining general knowledge (i.e. general model of English). We study approaches to distributed fine-tuning of a general model on user private data with the additional requirements of maintaining the quality on the general data and minimization of communication costs. We propose a novel technique that significantly improves prediction quality on users' language compared to a general model and outperforms gradient compression methods in terms of communication efficiency. The proposed procedure is fast and leads to an almost 70% perplexity reduction and 8.7 percentage point improvement in keystroke saving rate on informal English texts. We also show that the range of tasks our approach is applicable to is not limited by language modeling only. Finally, we propose an experimental framework for evaluating differential privacy of distributed training of language models and show that our approach has good privacy guarantees.
Submitted 6 March, 2018; v1 submitted 20 December, 2017;
originally announced December 2017.
-
Hasq Hash Chains
Authors:
Oleg Mazonka,
Vlad Popov
Abstract:
This paper describes a particular hash-based records linking chain scheme. This scheme is simple conceptually and easy to implement in software. It allows for a simple and secure way to transfer ownership of digital objects between peers.
Submitted 14 December, 2014;
originally announced December 2014.
-
Faster Fair Solution for the Reader-Writer Problem
Authors:
Vlad Popov,
Oleg Mazonka
Abstract:
A fast fair solution for Reader-Writer Problem is presented.
Submitted 17 September, 2013;
originally announced September 2013.
-
Arc-preserving subsequences of arc-annotated sequences
Authors:
Vladimir Yu. Popov
Abstract:
Arc-annotated sequences are useful in representing the structural information of RNA and protein sequences. The longest arc-preserving common subsequence problem has been introduced as a framework for studying the similarity of arc-annotated sequences. In this paper, we consider arc-annotated sequences with various arc structures. We consider the longest arc preserving common subsequence problem. In particular, we show that the decision version of the 1-{\sc fragment LAPCS(crossing,chain)} and the decision version of the 0-{\sc diagonal LAPCS(crossing,chain)} are {\bf NP}-complete for some fixed alphabet $Σ$ such that $|Σ| = 2$. Also we show that if $|Σ| = 1$, then the decision version of the 1-{\sc fragment LAPCS(unlimited, plain)} and the decision version of the 0-{\sc diagonal LAPCS(unlimited, plain)} are {\bf NP}-complete.
Submitted 22 April, 2011;
originally announced April 2011.
-
Integration of Flexible Web Based GUI in I-SOAS
Authors:
Zeeshan Ahmed,
Vasil Popov
Abstract:
It is necessary to improve the concepts of the present web based graphical user interface for the development of more flexible and intelligent interface to provide ease and increase the level of comfort at user end like most of the desktop based applications. This research is conducted targeting the goal of implementing flexible GUI consisting of a visual component manager with different components by functionality, design and purpose. In this research paper we present a Rich Internet Application (RIA) based graphical user interface for web based product development, and going into the details we present a comparison between existing RIA Technologies, adopted methodology in the GUI development and developed prototype.
Submitted 14 November, 2010;
originally announced November 2010.