-
Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models
Authors:
Yang Zhang,
Teoh Tze Tzun,
Lim Wei Hern,
Tiviatis Sim,
Kenji Kawaguchi
Abstract:
Recent advancements in diffusion models have notably improved the perceptual quality of generated images in text-to-image synthesis tasks. However, diffusion models often struggle to produce images that accurately reflect the intended semantics of the associated text prompts. We examine cross-attention layers in diffusion models and observe a propensity for these layers to disproportionately focus on certain tokens during the generation process, thereby undermining semantic fidelity. To address the issue of dominant attention, we introduce attention regulation, a computation-efficient on-the-fly optimization approach at inference time to align attention maps with the input text prompt. Notably, our method requires no additional training or fine-tuning and serves as a plug-in module on a model. Hence, the generation capacity of the original model is fully preserved. We compare our approach with alternative approaches across various datasets, evaluation metrics, and diffusion models. Experiment results show that our method consistently outperforms other baselines, yielding images that more faithfully reflect the desired concepts with reduced computation overhead. Code is available at https://github.com/YaNgZhAnG-V5/attention_regulation.
Submitted 10 March, 2024;
originally announced March 2024.
-
IEEE BigData 2023 Keystroke Verification Challenge (KVC)
Authors:
Giuseppe Stragapede,
Ruben Vera-Rodriguez,
Ruben Tolosana,
Aythami Morales,
Ivan DeAndres-Tame,
Naser Damer,
Julian Fierrez,
Javier-Ortega Garcia,
Nahuel Gonzalez,
Andrei Shadrikov,
Dmitrii Gordin,
Leon Schmitt,
Daniel Wimmer,
Christoph Grossmann,
Joerdis Krieger,
Florian Heinz,
Ron Krestel,
Christoffer Mayer,
Simon Haberl,
Helena Gschrey,
Yosuke Yamagishi,
Sanjay Saha,
Sanka Rasnayaka,
Sandareka Wickramanayake,
Terence Sim
, et al. (4 additional authors not shown)
Abstract:
This paper describes the results of the IEEE BigData 2023 Keystroke Verification Challenge (KVC), which considers the biometric verification performance of Keystroke Dynamics (KD), captured as tweet-long sequences of variable transcript text from over 185,000 subjects. The data are obtained from two of the largest public databases of KD to date, the Aalto Desktop and Mobile Keystroke Databases, guaranteeing a minimum amount of data per subject, age and gender annotations, absence of corrupted data, and avoiding excessively unbalanced subject distributions with respect to the considered demographic attributes. Several neural architectures were proposed by the participants, leading to global Equal Error Rates (EERs) as low as 3.33% and 3.61%, achieved by the best team in the desktop and mobile scenarios respectively, outperforming the current state-of-the-art biometric verification performance for KD. Hosted on CodaLab, the KVC will be kept ongoing as a tool for the research community to compare different approaches under the same experimental conditions and to deepen knowledge of the field.
Submitted 29 January, 2024;
originally announced January 2024.
-
Undercover Deepfakes: Detecting Fake Segments in Videos
Authors:
Sanjay Saha,
Rashindrie Perera,
Sachith Seneviratne,
Tamasha Malepathirana,
Sanka Rasnayaka,
Deshani Geethika,
Terence Sim,
Saman Halgamuge
Abstract:
The recent renaissance in generative models, driven primarily by the advent of diffusion models and iterative improvement in GAN methods, has enabled many creative applications. However, each advancement is also accompanied by a rise in the potential for misuse. In the arena of deepfake generation, this is a key societal issue. In particular, the ability to modify segments of videos using such generative techniques creates a new paradigm of deepfakes, which are mostly real videos altered slightly to distort the truth. This paradigm has been under-explored by the current deepfake detection methods in the academic literature. In this paper, we present a deepfake detection method that can address this issue by performing deepfake prediction at the frame and video levels. To facilitate testing our method, we prepared a new benchmark dataset where videos have both real and fake frame sequences with very subtle transitions. We provide a benchmark on the proposed dataset with our detection method, which utilizes the Vision Transformer based on Scaling and Shifting to learn spatial features, and a Timeseries Transformer to learn temporal features of the videos to help facilitate the interpretation of possible deepfakes. Extensive experiments on a variety of deepfake generation methods show excellent results by the proposed method on temporal segmentation and classical video-level predictions as well. In particular, the paradigm we address will form a powerful tool for the moderation of deepfakes, where human oversight can be better targeted to the parts of videos suspected of being deepfakes. All experiments can be reproduced at: github.com/rgb91/temporal-deepfake-segmentation.
Submitted 24 August, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
-
Is Face Recognition Safe from Realizable Attacks?
Authors:
Sanjay Saha,
Terence Sim
Abstract:
Face recognition is a popular form of biometric authentication and, due to its widespread use, attacks have become more common as well. Recent studies show that Face Recognition Systems are vulnerable to attacks, which can lead to erroneous identification of faces. Interestingly, most of these attacks are white-box, or they manipulate facial images in ways that are not physically realizable. In this paper, we propose an attack scheme where the attacker can generate realistic synthesized face images with subtle perturbations and physically realize them on his face to attack black-box face recognition systems. Comprehensive experiments and analyses show that subtle perturbations realized on the attacker's face can create successful attacks on state-of-the-art face recognition systems in black-box settings. Our study exposes the underlying vulnerability of Face Recognition Systems to realizable black-box attacks.
Submitted 14 October, 2022;
originally announced October 2022.
-
Contrastive predictive coding for Anomaly Detection in Multi-variate Time Series Data
Authors:
Theivendiram Pranavan,
Terence Sim,
Arulmurugan Ambikapathi,
Savitha Ramasamy
Abstract:
Anomaly detection in multi-variate time series (MVTS) data is a huge challenge as it requires simultaneous representation of long-term temporal dependencies and correlations across multiple variables. Often, this is solved by breaking the complexity through modeling one dependency at a time. In this paper, we propose a Time-series Representational Learning through Contrastive Predictive Coding (TRL-CPC) towards anomaly detection in MVTS data. First, we jointly optimize an encoder, an auto-regressor and a non-linear transformation function to effectively learn the representations of the MVTS data sets, for predicting future trends. It must be noted that the context vectors are representative of the observation window in the MVTS. Next, the latent representations for the succeeding instants, obtained through non-linear transformations of these context vectors, are contrasted with the latent representations of the encoder for the multi-variables such that the density for the positive pair is maximized. Thus, the TRL-CPC helps to model the temporal dependencies and the correlations of the parameters for a healthy signal pattern. Finally, the latent representations are fit into a Gaussian scoring function to detect anomalies. Evaluation of the proposed TRL-CPC on three MVTS data sets against SOTA anomaly detection methods shows the superiority of TRL-CPC.
Submitted 7 February, 2022;
originally announced February 2022.
-
General-order observation-driven models: ergodicity and consistency of the maximum likelihood estimator
Authors:
Tepmony Sim,
Randal Douc,
François Roueff
Abstract:
The class of observation-driven models (ODMs) includes many models of non-linear time series which, in a fashion similar to, yet different from, hidden Markov models (HMMs), involve hidden variables. Interestingly, in contrast to most HMMs, ODMs enjoy likelihoods that can be computed exactly with computational complexity of the same order as the number of observations, making maximum likelihood estimation the privileged approach for statistical inference for these models. A celebrated example of general-order ODMs is the GARCH$(p,q)$ model, for which ergodicity and inference have been studied extensively. However, little is known about more general models, in particular integer-valued ones, such as the log-linear Poisson GARCH or the NBIN-GARCH of order $(p,q)$, about which most of the existing results seem restricted to the case $p=q=1$. Here we fill this gap and derive ergodicity conditions for general ODMs. The consistency and the asymptotic normality of the maximum likelihood estimator (MLE) can then be derived using the method already developed for first-order ODMs.
Submitted 8 June, 2021;
originally announced June 2021.
-
Necessary and sufficient conditions for the identifiability of observation-driven models
Authors:
François Roueff,
Randal Douc,
Tepmony Sim
Abstract:
In this contribution we are interested in proving that a given observation-driven model is identifiable. In the case of a GARCH(p, q) model, a simple sufficient condition has been established in [1] for showing the consistency of the quasi-maximum likelihood estimator. It turns out that this condition applies to a much larger class of observation-driven models, which we call the class of linearly observation-driven models. This class includes standard integer-valued observation-driven time series, such as the log-linear Poisson GARCH or the NBIN-GARCH models.
Submitted 12 May, 2020; v1 submitted 5 April, 2019;
originally announced April 2019.
-
Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing
Authors:
Jian Zhao,
Jianshu Li,
Yu Cheng,
Li Zhou,
Terence Sim,
Shuicheng Yan,
Jiashi Feng
Abstract:
Despite the noticeable progress in perceptual tasks like detection, instance segmentation and human parsing, computers still perform unsatisfactorily on visually understanding humans in crowded scenes, as required in applications such as group behavior analysis, person re-identification and autonomous driving. To this end, models need to comprehensively perceive the semantic information and the differences between instances in a multi-human image, which was recently defined as the multi-human parsing task. In this paper, we present a new large-scale database "Multi-Human Parsing (MHP)" for algorithm development and evaluation, and advance the state of the art in understanding humans in crowded scenes. MHP contains 25,403 elaborately annotated images with 58 fine-grained semantic category labels, involving 2-26 persons per image and captured in real-world scenes from various viewpoints, poses, occlusion, interactions and background. We further propose a novel deep Nested Adversarial Network (NAN) model for multi-human parsing. NAN consists of three Generative Adversarial Network (GAN)-like sub-nets, respectively performing semantic saliency prediction, instance-agnostic parsing and instance-aware clustering. These sub-nets form a nested structure and are carefully designed to learn jointly in an end-to-end way. NAN consistently outperforms existing state-of-the-art solutions on our MHP and several other datasets, and serves as a strong baseline to drive future research on multi-human parsing.
Submitted 6 July, 2018; v1 submitted 9 April, 2018;
originally announced April 2018.
-
Integrated Face Analytics Networks through Cross-Dataset Hybrid Training
Authors:
Jianshu Li,
Shengtao Xiao,
Fang Zhao,
Jian Zhao,
Jianan Li,
Jiashi Feng,
Shuicheng Yan,
Terence Sim
Abstract:
Face analytics benefits many multimedia applications. It consists of a number of tasks, such as facial emotion recognition and face parsing, and most existing approaches generally treat these tasks independently, which limits their deployment in real scenarios. In this paper we propose an integrated Face Analytics Network (iFAN), which is able to perform multiple tasks jointly for face analytics with a novel carefully designed network architecture to fully facilitate the informative interaction among different tasks. The proposed integrated network explicitly models the interactions between tasks so that the correlations between tasks can be fully exploited for performance boost. In addition, to solve the bottleneck of the absence of datasets with comprehensive training data for various tasks, we propose a novel cross-dataset hybrid training strategy. It allows "plug-in and play" of multiple datasets annotated for different tasks without the requirement of a fully labeled common dataset for all the tasks. We experimentally show that the proposed iFAN achieves state-of-the-art performance on multiple face analytics tasks using a single integrated model. Specifically, iFAN achieves an overall F-score of 91.15% on the Helen dataset for face parsing, a normalized mean error of 5.81% on the MTFL dataset for facial landmark localization and an accuracy of 45.73% on the BNU dataset for emotion recognition with a single model.
Submitted 16 November, 2017;
originally announced November 2017.
-
Multiple-Human Parsing in the Wild
Authors:
Jianshu Li,
Jian Zhao,
Yunchao Wei,
Congyan Lang,
Yidong Li,
Terence Sim,
Shuicheng Yan,
Jiashi Feng
Abstract:
Human parsing is attracting increasing research attention. In this work, we aim to push the frontier of human parsing by introducing the problem of multi-human parsing in the wild. Existing works on human parsing mainly tackle single-person scenarios, which deviates from real-world applications where multiple persons are present simultaneously with interaction and occlusion. To address the multi-human parsing problem, we introduce a new multi-human parsing (MHP) dataset and a novel multi-human parsing model named MH-Parser. The MHP dataset contains multiple persons captured in real-world scenes with pixel-level fine-grained semantic annotations in an instance-aware setting. The MH-Parser generates global parsing maps and person instance masks simultaneously in a bottom-up fashion with the help of a new Graph-GAN model. We envision that the MHP dataset will serve as a valuable data resource to develop new multi-human parsing models, and the MH-Parser offers a strong baseline to drive future research for multi-human parsing in the wild.
Submitted 14 March, 2018; v1 submitted 19 May, 2017;
originally announced May 2017.
-
The maximizing set of the asymptotic normalized log-likelihood for partially observed Markov chains
Authors:
Randal Douc,
Francois Roueff,
Tepmony Sim
Abstract:
This paper deals with a parametrized family of partially observed bivariate Markov chains. We establish that, under very mild assumptions, the limit of the normalized log-likelihood function is maximized when the parameters belong to the equivalence class of the true parameter, which is a key feature for obtaining the consistency of the maximum likelihood estimators (MLEs) in well-specified models. This result is obtained in the general framework of partially dominated models. We examine two specific cases of interest, namely, hidden Markov models (HMMs) and observation-driven time series models. In contrast with previous approaches, the identifiability is addressed by relying on the uniqueness of the invariant distribution of the Markov chain associated with the complete data, regardless of its rate of convergence to the equilibrium.
Submitted 30 September, 2015;
originally announced September 2015.
-
Eye-2-I: Eye-tracking for just-in-time implicit user profiling
Authors:
Keng-Teck Ma,
Qianli Xu,
Liyuan Li,
Terence Sim,
Mohan Kankanhalli,
Rosary Lim
Abstract:
For many applications, such as targeted advertising and content recommendation, knowing users' traits and interests is a prerequisite. User profiling is a helpful approach for this purpose. However, current methods, i.e. self-reporting, web-activity monitoring and social media mining, are either intrusive or require data over long periods of time. Recently, there has been growing evidence in cognitive science that a variety of user profile attributes are significantly correlated with eye-tracking data. We propose a novel just-in-time implicit profiling method, Eye-2-I, which learns the user's interests, demographic and personality traits from the eye-tracking data while the user is watching videos. Although seemingly conspicuous by closely monitoring the user's eye behaviors, our method is unobtrusive and privacy-preserving owing to its unique characteristics, including (1) fast speed - the profile is available by the first video shot, typically a few seconds, and (2) self-contained - not relying on historical data or functional modules. [Bug found. As a proof-of-concept, our method is evaluated in a user study with 51 subjects. It achieved a mean accuracy of 0.89 on 37 attributes of user profile with 9 minutes of eye-tracking data.]
Submitted 13 April, 2016; v1 submitted 15 July, 2015;
originally announced July 2015.
-
Handy sufficient conditions for the convergence of the maximum likelihood estimator in observation-driven models
Authors:
Randal Douc,
François Roueff,
Tepmony Sim
Abstract:
This paper generalizes asymptotic properties obtained in the observation-driven time series models considered by \cite{dou:kou:mou:2013} in the sense that the conditional law of each observation is also permitted to depend on the parameter. The existence of ergodic solutions and the consistency of the Maximum Likelihood Estimator (MLE) are derived under easy-to-check conditions. The obtained conditions appear to apply to a wide class of models. We illustrate our results with specific observation-driven time series, including the recently introduced NBIN-GARCH and NM-GARCH models, demonstrating the consistency of the MLE for these two models.
Submitted 5 June, 2015;
originally announced June 2015.
-
Correlation Filters with Limited Boundaries
Authors:
Hamed Kiani Galoogahi,
Terence Sim,
Simon Lucey
Abstract:
Correlation filters take advantage of specific properties in the Fourier domain allowing them to be estimated efficiently: O(ND log D) in the frequency domain, versus O(D^3 + ND^2) spatially, where D is the signal length and N is the number of signals. Recent extensions to correlation filters, such as MOSSE, have reignited interest in their use in the vision community due to their robustness and attractive computational properties. In this paper we demonstrate, however, that this computational efficiency comes at a cost. Specifically, we demonstrate that only a 1/D proportion of shifted examples are unaffected by boundary effects, which has a dramatic effect on detection/tracking performance. We propose a novel approach to correlation filter estimation that: (i) takes advantage of inherent computational redundancies in the frequency domain, and (ii) dramatically reduces boundary effects. Impressive object tracking and detection results are presented in terms of both accuracy and computational efficiency.
Submitted 31 March, 2014;
originally announced March 2014.