-
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs
Authors:
Abhimanyu Hans,
Yuxin Wen,
Neel Jain,
John Kirchenbauer,
Hamid Kazemi,
Prajwal Singhania,
Siddharth Singh,
Gowthami Somepalli,
Jonas Geiping,
Abhinav Bhatele,
Tom Goldstein
Abstract:
Large language models can memorize and repeat their training data, causing privacy and copyright risks. To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the goldfish loss. During training, a randomly sampled subset of tokens are excluded from the loss computation. These dropped tokens are not memorized by the model, which prevents verbatim reproduction of a complete chain of tokens from the training set. We run extensive experiments training billion-scale Llama-2 models, both pre-trained and trained from scratch, and demonstrate significant reductions in extractable memorization with little to no impact on downstream benchmarks.
Submitted 14 June, 2024;
originally announced June 2024.
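The abstract above describes excluding a randomly sampled subset of tokens from the loss. A minimal sketch of that idea follows, assuming per-token losses are already computed; it is not the authors' implementation, and the names `goldfish_loss`, `drop_prob`, and `seed` are hypothetical, chosen only for illustration.

```python
import random


def goldfish_loss(token_losses, drop_prob=0.25, seed=0):
    """Average per-token losses over a randomly kept subset of positions.

    token_losses: list of floats, one loss value per token position.
    drop_prob:    assumed probability of excluding a token (hypothetical knob).
    seed:         seed for the mask, so the same mask can be reproduced.
    """
    rng = random.Random(seed)
    # keep[i] is False for positions excluded from the loss computation.
    keep = [rng.random() >= drop_prob for _ in token_losses]
    kept = [loss for loss, k in zip(token_losses, keep) if k]
    # If every token was dropped, return 0.0 rather than dividing by zero.
    mean = sum(kept) / len(kept) if kept else 0.0
    return mean, keep


if __name__ == "__main__":
    mean, keep = goldfish_loss([1.2, 0.7, 2.3, 0.4, 1.9])
    print(mean, keep)
```

Dropped positions contribute nothing to the averaged loss, which is the mechanism the abstract credits with preventing verbatim reproduction of a complete token chain. How the mask is actually sampled in the paper is not stated in the abstract, so the independent per-token draw here is an assumption.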
-
Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion
Authors:
Hossein Souri,
Arpit Bansal,
Hamid Kazemi,
Liam Fowl,
Aniruddha Saha,
Jonas Geiping,
Andrew Gordon Wilson,
Rama Chellappa,
Tom Goldstein,
Micah Goldblum
Abstract:
Modern neural networks are often trained on massive datasets that are web scraped with minimal human inspection. As a result of this insecure curation pipeline, an adversary can poison or backdoor the resulting model by uploading malicious data to the internet and waiting for a victim to scrape and train on it. Existing approaches for creating poisons and backdoors start with randomly sampled clean data, called base samples, and then modify those samples to craft poisons. However, some base samples may be significantly more amenable to poisoning than others. As a result, we may be able to craft more potent poisons by carefully choosing the base samples. In this work, we use guided diffusion to synthesize base samples from scratch that lead to significantly more potent poisons and backdoors than previous state-of-the-art attacks. Our Guided Diffusion Poisoning (GDP) base samples can be combined with any downstream poisoning or backdoor attack to boost its effectiveness. Our implementation code is publicly available at: https://github.com/hsouri/GDP .
Submitted 24 March, 2024;
originally announced March 2024.
-
What do we learn from inverting CLIP models?
Authors:
Hamid Kazemi,
Atoosa Chegini,
Jonas Geiping,
Soheil Feizi,
Tom Goldstein
Abstract:
We employ an inversion-based approach to examine CLIP models. Our examination reveals that inverting CLIP models results in the generation of images that exhibit semantic alignment with the specified target prompts. We leverage these inverted images to gain insights into various aspects of CLIP models, such as their ability to blend concepts and inclusion of gender biases. We notably observe instances of NSFW (Not Safe For Work) images during model inversion. This phenomenon occurs even for semantically innocuous prompts, like "a beautiful landscape," as well as for prompts involving the names of celebrities.
Submitted 4 March, 2024;
originally announced March 2024.
-
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
Authors:
Abhimanyu Hans,
Avi Schwarzschild,
Valeriia Cherepanova,
Hamid Kazemi,
Aniruddha Saha,
Micah Goldblum,
Jonas Geiping,
Tom Goldstein
Abstract:
Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data. It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data.
Submitted 1 July, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
Where Quantum Complexity Helps Classical Complexity
Authors:
Arash Vaezi,
Seyed Mohammad Hussein Kazemi,
Negin Bagheri Noghrehy,
Seyed Mohsen Kazemi,
Ali Movaghar,
Mohammad Ghodsi
Abstract:
Scientists have demonstrated that quantum computing has presented novel approaches to address computational challenges, each varying in complexity. Adapting problem-solving strategies is crucial to harness the full potential of quantum computing. Nonetheless, there are defined boundaries to the capabilities of quantum computing. This paper concentrates on aggregating prior research efforts dedicated to solving intricate classical computational problems through quantum computing. The objective is to systematically compile an exhaustive inventory of these solutions and categorize a collection of demanding problems that await further exploration.
Submitted 13 January, 2024; v1 submitted 16 December, 2023;
originally announced December 2023.
-
GPT Models in Construction Industry: Opportunities, Limitations, and a Use Case Validation
Authors:
Abdullahi Saka,
Ridwan Taiwo,
Nurudeen Saka,
Babatunde Salami,
Saheed Ajayi,
Kabiru Akande,
Hadi Kazemi
Abstract:
Large Language Models (LLMs) trained on large data sets came into prominence in 2018 after Google introduced BERT. Subsequently, different LLMs such as GPT models from OpenAI have been released. These models perform well on diverse tasks and have been gaining widespread applications in fields such as business and education. However, little is known about the opportunities and challenges of using LLMs in the construction industry. Thus, this study aims to assess GPT models in the construction industry. A critical review, expert discussion and case study validation are employed to achieve the study objectives. The findings revealed opportunities for GPT models throughout the project lifecycle. The challenges of leveraging GPT models are highlighted and a use case prototype is developed for materials selection and optimization. The findings of the study would be of benefit to researchers, practitioners and stakeholders, as it presents research vistas for LLMs in the construction industry.
Submitted 30 May, 2023;
originally announced May 2023.
-
Design and Optimisation of High-Speed Receivers for 6G Optical Wireless Networks
Authors:
Elham Sarbazi,
Hossein Kazemi,
Michael Crisp,
Taisir El-Gorashi,
Jaafar Elmirghani,
Richard Penty,
Ian White,
Majid Safari,
Harald Haas
Abstract:
To achieve multi-Gb/s data rates in 6G optical wireless access networks based on narrow infrared (IR) laser beams, a high-speed receiver with two key specifications is needed: a sufficiently large aperture to collect the required optical power and a wide field of view (FOV) to avoid strict alignment issues. This paper puts forward the systematic design and optimisation of multi-tier non-imaging angle diversity receivers (ADRs) composed of compound parabolic concentrators (CPCs) coupled with photodiode (PD) arrays for laser-based optical wireless communication (OWC) links. Design tradeoffs include the gain-FOV tradeoff for each receiver element and the area-bandwidth tradeoff for each PD array. The rate maximisation is formulated as a non-convex optimisation problem under the constraints on the minimum required FOV and the overall ADR dimensions to find optimum configuration of the receiver bandwidth and FOV, and a low-complexity optimal solution is proposed. The ADR performance is studied using computer simulations and insightful design guidelines are provided through various numerical examples. An efficient technique is also proposed to reduce the ADR dimensions based on CPC length truncation. It is shown that a compact ADR with a height of $\leq0.5$ cm and an effective area of $\leq0.5$ cm$^2$ reaches a data rate of $12$ Gb/s with a half-angle FOV of $30^\circ$ over a $3$ m link distance.
Submitted 30 December, 2022;
originally announced December 2022.
-
What do Vision Transformers Learn? A Visual Exploration
Authors:
Amin Ghiasi,
Hamid Kazemi,
Eitan Borgnia,
Steven Reich,
Manli Shu,
Micah Goldblum,
Andrew Gordon Wilson,
Tom Goldstein
Abstract:
Vision transformers (ViTs) are quickly becoming the de-facto architecture for computer vision, yet we understand very little about why they work and what they learn. While existing studies visually analyze the mechanisms of convolutional neural networks, an analogous exploration of ViTs remains challenging. In this paper, we first address the obstacles to performing visualizations on ViTs. Assisted by these solutions, we observe that neurons in ViTs trained with language model supervision (e.g., CLIP) are activated by semantic concepts rather than visual features. We also explore the underlying differences between ViTs and CNNs, and we find that transformers detect image background features, just like their convolutional counterparts, but their predictions depend far less on high-frequency information. On the other hand, both architecture types behave similarly in the way features progress from abstract patterns in early layers to concrete objects in late layers. In addition, we show that ViTs maintain spatial information in all layers except the final layer. In contrast to previous works, we show that the last layer most likely discards the spatial information and behaves as a learned global pooling operation. Finally, we conduct large-scale visualizations on a wide range of ViT variants, including DeiT, CoaT, ConViT, PiT, Swin, and Twin, to validate the effectiveness of our method.
Submitted 13 December, 2022;
originally announced December 2022.
-
Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries
Authors:
Yuxin Wen,
Arpit Bansal,
Hamid Kazemi,
Eitan Borgnia,
Micah Goldblum,
Jonas Geiping,
Tom Goldstein
Abstract:
As industrial applications are increasingly automated by machine learning models, enforcing personal data ownership and intellectual property rights requires tracing training data back to their rightful owners. Membership inference algorithms approach this problem by using statistical techniques to discern whether a target sample was included in a model's training set. However, existing methods only utilize the unaltered target sample or simple augmentations of the target to compute statistics. Such a sparse sampling of the model's behavior carries little information, leading to poor inference capabilities. In this work, we use adversarial tools to directly optimize for queries that are discriminative and diverse. Our improvements achieve significantly more accurate membership inference than existing methods, especially in offline scenarios and in the low false-positive regime which is critical in legal settings. Code is available at https://github.com/YuxinWenRick/canary-in-a-coalmine.
Submitted 1 June, 2023; v1 submitted 19 October, 2022;
originally announced October 2022.
-
Trajectory Range Visibility
Authors:
Seyed Mohammad Hussein Kazemi,
Arash Vaezi,
Mohammad Ali Abam,
Mohammad Ghodsi
Abstract:
Consider two entities with constant but not necessarily equal velocities, moving on two given piece-wise linear trajectories inside a simple polygon $P$. The Trajectory Range Visibility problem deals with determining the sub-trajectories on which two entities become visible to each other. A more straightforward decision version of this problem is called Trajectory Visibility, where the trajectories are line segments. The decision version specifies whether the entities can see one another. This version was studied by P. Eades et al. in 2020, where they supposed given constant velocities for the entities. However, the approach presented in this paper supports non-constant complexity trajectories. Furthermore, we report every pair of constant velocities with which the entities can see each other. In particular, for every constant velocity of a moving entity, we specify: $(1)$ All visible parts of the other entity's trajectory. $(2)$ All possible constant velocities of the other entity to become visible.
Regarding line-segment trajectories, we present an $\mathcal{O}(n \log n)$ running time algorithm which obtains all pairs of sub-trajectories on which the moving entities become visible to one another, where $n$ is the complexity of $P$. Regarding the general case, we provide an algorithm with $\mathcal{O}(n \log n + m(\log m + \log n))$ running time, where $m$ indicates the complexity of both trajectories. We offer $\mathcal{O}(\log n)$ query time for line-segment trajectories and $\mathcal{O}(\log m + k)$ for the non-constant complexity ones, where $k$ is the number of velocity ranges reported in the output. Interestingly, our results require only $\mathcal{O}(n + m)$ space for non-constant complexity trajectories.
Submitted 27 February, 2023; v1 submitted 29 August, 2022;
originally announced September 2022.
-
Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise
Authors:
Arpit Bansal,
Eitan Borgnia,
Hong-Min Chu,
Jie S. Li,
Hamid Kazemi,
Furong Huang,
Micah Goldblum,
Jonas Geiping,
Tom Goldstein
Abstract:
Standard diffusion models involve an image transform -- adding Gaussian noise -- and an image restoration operator that inverts this degradation. We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact an entire family of generative models can be constructed by varying this choice. Even when using completely deterministic degradations (e.g., blur, masking, and more), the training and test-time update rules that underlie diffusion models can be easily generalized to create generative models. The success of these fully deterministic models calls into question the community's understanding of diffusion models, which relies on noise in either gradient Langevin dynamics or variational inference, and paves the way for generalized diffusion models that invert arbitrary processes. Our code is available at https://github.com/arpitbansal297/Cold-Diffusion-Models
Submitted 19 August, 2022;
originally announced August 2022.
-
Terabit Indoor Laser-Based Wireless Communications: LiFi 2.0 for 6G
Authors:
Mohammad Dehghani Soltani,
Hossein Kazemi,
Elham Sarbazi,
Ahmad Adnan Qidan,
Barzan Yosuf,
Sanaa Mohamed,
Ravinder Singh,
Bela Berde,
Dominique Chiaroni,
Bastien Béchadergue,
Fathi Abdeldayem,
Hardik Soni,
Jose Tabu,
Micheline Perrufel,
Nikola Serafimovski,
Taisir E. H. El-Gorashi,
Jaafar Elmirghani,
Richard Penty,
Ian H. White,
Harald Haas,
Majid Safari
Abstract:
This paper provides a summary of available technologies required for implementing indoor laser-based wireless networks capable of achieving aggregate data-rates of terabits per second as widely accepted as a sixth generation (6G) key performance indicator. The main focus of this paper is on the technologies supporting the near infrared region of the optical spectrum. The main challenges in the design of the transmitter and receiver systems and communication/networking schemes are identified and new insights are provided. This paper also covers the previous and recent standards as well as industrial applications for optical wireless communications (OWC) and LiFi.
Submitted 21 June, 2022;
originally announced June 2022.
-
High-Speed Imaging Receiver Design for 6G Optical Wireless Communications: A Rate-FOV Trade-Off
Authors:
Mohammad Dehghani Soltani,
Hossein Kazemi,
Elham Sarbazi,
Taisir E. H. El-Gorashi,
Jaafar M. H. Elmirghani,
Richard V. Penty,
Ian H. White,
Harald Haas,
Majid Safari
Abstract:
The design of a compact high-speed and wide field of view (FOV) receiver is challenging due to the presence of two well-known trade-offs. The first one is the area-bandwidth trade-off of photodetectors (PDs) and the second one is the gain-FOV trade-off due to the use of optics. The combined effects of these two trade-offs imply that the achievable data rate of an imaging optical receiver is limited by its FOV, i.e., a rate-FOV trade-off. To control the area-bandwidth trade-off, an array of small PDs can be used instead of a single PD. Moreover, in practice, a large-area lens is required to ensure sufficient power collection, which in turn limits the receiver FOV (i.e., gain-FOV trade-off). We propose an imaging receiver design in the form of an array of arrays. To achieve a reasonable receiver FOV, we use individual focusing lens for each PD array rather than a single collection lens for the whole receiver. The proposed array of arrays structure provides an effective method to control both gain-FOV trade-off (via an array of lenses) and area-bandwidth trade-off (via arrays of PDs). We first derive a tractable analytical model for the SNR of an array of PDs where the maximum ratio combining has been employed. Then, we extend the model for the proposed array of arrays structure and the accuracy of the analytical model is verified based on several Optic Studio-based simulations. Next, we formulate an optimization problem to maximize the achievable data rate of the imaging receiver subject to a minimum required FOV. The optimization problem is solved for two commonly used modulation techniques, namely, OOK and direct current biased optical orthogonal frequency division multiplexing with variable rate quadrature amplitude modulation. It is demonstrated that a data rate of ~ 24 Gbps with a FOV of 15 is achievable using OOK with a total receiver size of 2 cm by 2 cm.
Submitted 11 May, 2022;
originally announced May 2022.
-
Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations
Authors:
Amin Ghiasi,
Hamid Kazemi,
Steven Reich,
Chen Zhu,
Micah Goldblum,
Tom Goldstein
Abstract:
Existing techniques for model inversion typically rely on hard-to-tune regularizers, such as total variation or feature regularization, which must be individually calibrated for each network in order to produce adequate images. In this work, we introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyper-parameter tuning. Under our proposed augmentation-based scheme, the same set of augmentation hyper-parameters can be used for inverting a wide range of image classification models, regardless of input dimensions or the architecture. We illustrate the practicality of our approach by inverting Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs) trained on the ImageNet dataset, tasks which to the best of our knowledge have not been successfully accomplished by any previous works.
Submitted 30 January, 2022;
originally announced January 2022.
-
A Tb/s Indoor MIMO Optical Wireless Backhaul System Using VCSEL Arrays
Authors:
Hossein Kazemi,
Elham Sarbazi,
Mohammad Dehghani Soltani,
Taisir E. H. El-Gorashi,
Jaafar M. H. Elmirghani,
Richard V. Penty,
Ian H. White,
Majid Safari,
Harald Haas
Abstract:
In this paper, the design of a multiple-input multiple-output (MIMO) optical wireless communication (OWC) link based on vertical cavity surface emitting laser (VCSEL) arrays is systematically carried out with the aim to support data rates in excess of 1 Tb/s for the backhaul of sixth generation (6G) indoor wireless networks. The proposed design combines direct current optical orthogonal frequency division multiplexing (DCO-OFDM) and a spatial multiplexing MIMO architecture. For such an ultra-high-speed line-of-sight (LOS) OWC link with low divergence laser beams, maintaining alignment is of high importance. In this paper, two types of misalignment error between the transmitter and receiver are distinguished, namely, radial displacement error and orientation angle error, and they are thoroughly modeled in a unified analytical framework assuming Gaussian laser beams, resulting in a generalized misalignment model (GMM). The derived GMM is then extended to MIMO arrays and the performance of the MIMO-OFDM OWC system is analyzed in terms of the aggregate data rate. Novel insights are provided into the system performance based on computer simulations by studying various influential factors such as beam waist, array configuration and different misalignment errors, which can be used as guidelines for designing short range Tb/s MIMO OWC systems.
Submitted 4 April, 2022; v1 submitted 19 February, 2021;
originally announced February 2021.
-
Safety Analysis for Laser-based Optical Wireless Communications: A Tutorial
Authors:
Mohammad Dehghani Soltani,
Elham Sarbazi,
Nikolaos Bamiedakis,
Priyanka de Souza,
Hossein Kazemi,
Jaafar M. H. Elmirghani,
Ian H. White,
Richard V. Penty,
Harald Haas,
Majid Safari
Abstract:
Light amplification by stimulated emission of radiation (laser) sources have many advantages for use in high data rate optical wireless communications. In particular, the low cost and high-bandwidth properties of laser sources such as vertical-cavity surface-emitting lasers (VCSELs) make them attractive for future indoor optical wireless communications. In order to be integrated into future indoor networks, such lasers should conform to eye safety regulations determined by the international electrotechnical commission (IEC) standards for laser safety. In this paper, we provide a detailed study of beam propagation to evaluate the received power of various laser sources, based on which as well as the maximum permissible exposure (MPE) defined by the IEC 60825-1:2014 standard, we establish a comprehensive framework for eye safety analyses. This framework allows us to calculate the maximum allowable transmit power, which is crucial in the design of a reliable and safe laser-based wireless communication system. Initially, we consider a single-mode Gaussian beam and calculate the maximum permissible transmit power. Subsequently, we generalize this approach for higher-mode beams. It is shown that the M-squared-based approach for analysis of multimode lasers ensures the IEC eye safety limits, however, in some scenarios, it can be too conservative compared to the precise beam decomposition method. Laser safety analyses with consideration of optical elements such as lens and diffuser, as well as for VCSEL array have been also presented. Skin safety, as another significant factor of laser safety, has also been investigated in this paper. We have studied the impacts of various parameters such as wavelength, exposure duration and the divergence angle of laser sources on the safety analysis by presenting insightful results.
Submitted 5 May, 2021; v1 submitted 17 February, 2021;
originally announced February 2021.
-
Matching Distributions via Optimal Transport for Semi-Supervised Learning
Authors:
Fariborz Taherkhani,
Hadi Kazemi,
Ali Dabouei,
Jeremy Dawson,
Nasser M. Nasrabadi
Abstract:
Semi-Supervised Learning (SSL) approaches have been an influential framework for the usage of unlabeled data when there is not a sufficient amount of labeled data available over the course of training. SSL methods based on Convolutional Neural Networks (CNNs) have recently provided successful results on standard benchmark tasks such as image classification. In this work, we consider the general setting of SSL problem where the labeled and unlabeled data come from the same underlying probability distribution. We propose a new approach that adopts an Optimal Transport (OT) technique serving as a metric of similarity between discrete empirical probability measures to provide pseudo-labels for the unlabeled data, which can then be used in conjunction with the initial labeled data to train the CNN model in an SSL manner. We have evaluated and compared our proposed method with state-of-the-art SSL algorithms on standard datasets to demonstrate the superiority and effectiveness of our SSL algorithm.
Submitted 21 October, 2021; v1 submitted 4 December, 2020;
originally announced December 2020.
-
Quality Guided Sketch-to-Photo Image Synthesis
Authors:
Uche Osahor,
Hadi Kazemi,
Ali Dabouei,
Nasser Nasrabadi
Abstract:
Facial sketches drawn by artists are widely used for visual identification applications and mostly by law enforcement agencies, but the quality of these sketches depend on the ability of the artist to clearly replicate all the key facial features that could aid in capturing the true identity of a subject. Recent works have attempted to synthesize these sketches into plausible visual images to improve visual recognition and identification. However, synthesizing photo-realistic images from sketches proves to be an even more challenging task, especially for sensitive applications such as suspect identification. In this work, we propose a novel approach that adopts a generative adversarial network that synthesizes a single sketch into multiple synthetic images with unique attributes like hair color, sex, etc. We incorporate a hybrid discriminator which performs attribute classification of multiple target attributes, a quality guided encoder that minimizes the perceptual dissimilarity of the latent space embedding of the synthesized and real image at different layers in the network and an identity preserving network that maintains the identity of the synthesised image throughout the training process. Our approach is aimed at improving the visual appeal of the synthesised images while incorporating multiple attribute assignment to the generator without compromising the identity of the synthesised image. We synthesised sketches using XDOG filter for the CelebA, WVU Multi-modal and CelebA-HQ datasets and from an auxiliary generator trained on sketches from CUHK, IIT-D and FERET datasets. Our results are impressive compared to current state of the art.
Submitted 20 April, 2020;
originally announced May 2020.
-
Robust Facial Landmark Detection via Aggregation on Geometrically Manipulated Faces
Authors:
Seyed Mehdi Iranmanesh,
Ali Dabouei,
Sobhan Soleymani,
Hadi Kazemi,
Nasser M. Nasrabadi
Abstract:
In this work, we present a practical approach to the problem of facial landmark detection. The proposed method can deal with large shape and appearance variations under rich shape deformation. To handle the shape variations, we equip our method with the aggregation of manipulated face images. The proposed framework generates different manipulated faces using only one given face image. The approach utilizes the fact that small but carefully crafted geometric manipulations in the input domain can fool deep face recognition models. We propose three different approaches to generate manipulated faces, two of which perform the manipulations via adversarial attacks while the third uses known transformations. Aggregating the manipulated faces provides a more robust landmark detection approach which is able to capture more important deformations and variations of the face shapes. Our approach demonstrates its superiority over the state-of-the-art method on the benchmark datasets AFLW, 300-W, and COFW.
Submitted 7 January, 2020;
originally announced January 2020.
-
Multi-material Topology Optimization of Lattice Structures using Geometry Projection
Authors:
Hesaneh Kazemi,
Ashkan Vaziri,
Julian A. Norato
Abstract:
This work presents a computational method for the design of architected truss lattice materials where each strut can be made of one of a set of available materials. We design the lattices to extremize effective properties. As customary in topology optimization, we design a periodic unit cell of the lattice and obtain the effective properties via numerical homogenization. Each bar is represented as a cylindrical offset surface of a medial axis parameterized by the positions of the endpoints of the medial axis. These parameters are smoothly mapped onto a continuous density field for the primal and sensitivity analysis via the geometry projection method. A size variable per material is ascribed to each bar and penalized as in density-based topology optimization to facilitate the entire removal of bars from the design. During the optimization, we allow bars to be made of a mixture of the available materials. However, to ensure each bar is either exclusively made of one material or removed altogether from the optimal design, we impose optimization constraints that ensure each size variable is 0 or 1, and that at most one material size variable is 1. The proposed material interpolation scheme readily accommodates any number of materials. To obtain lattices with desired material symmetries, we design only a reference region of the unit cell and reflect its geometry projection with respect to the appropriate planes of symmetry. Also, to ensure bars remain whole upon reflection inside the unit cell or with respect to the periodic boundaries, we impose a no-cut constraint on the bars. We demonstrate the efficacy of our method via numerical examples of bulk and shear moduli maximization and Poisson's ratio minimization for two- and three-material lattices with cubic symmetry.
Submitted 23 October, 2019;
originally announced October 2019.
-
Identity-Aware Deep Face Hallucination via Adversarial Face Verification
Authors:
Hadi Kazemi,
Fariborz Taherkhani,
Nasser M. Nasrabadi
Abstract:
In this paper, we address the problem of face hallucination by proposing a novel multi-scale generative adversarial network (GAN) architecture optimized for face verification. First, we propose a multi-scale generator architecture for face hallucination with a high up-scaling ratio factor, which has multiple intermediate outputs at different resolutions. The intermediate outputs have the growing goal of synthesizing small to large images. Second, we incorporate a face verifier with the original GAN discriminator and propose a novel discriminator which learns to discriminate different identities while distinguishing fake generated HR face images from their ground truth images. In particular, the learned generator cares for not only the visual quality of hallucinated face images but also preserving the discriminative features in the hallucination process. In addition, to capture perceptually relevant differences we employ a perceptual similarity loss, instead of similarity in pixel space. We perform a quantitative and qualitative evaluation of our framework on the LFW and CelebA datasets. The experimental results show the advantages of our proposed method against the state-of-the-art methods on the 8x downsampled testing dataset.
Submitted 17 September, 2019;
originally announced September 2019.
-
Multi-Hop Wireless Optical Backhauling for LiFi Attocell Networks: Bandwidth Scheduling and Power Control
Authors:
Hossein Kazemi,
Majid Safari,
Harald Haas
Abstract:
The backhaul of hundreds of light fidelity (LiFi) base stations (BSs) constitutes a major challenge. Indoor wireless optical backhauling is a novel approach whereby the interconnections between adjacent LiFi BSs are provided by way of directed line-of-sight (LOS) wireless infrared (IR) links. Building on this approach, this paper presents the top-down design of a multi-hop wireless backhaul configuration for multi-tier optical attocell networks by proposing the novel idea of super cells. Such cells incorporate multiple clusters of attocells that are connected to the core network via a single gateway based on multi-hop decode-and-forward (DF) relaying. Consequently, new challenges arise for managing the bandwidth and power resources of the bottleneck backhaul. By putting forward user-based bandwidth scheduling (UBS) and cell-based bandwidth scheduling (CBS) policies, the system-level modeling and analysis of the end-to-end multi-user sum rate is elaborated. In addition, optimal bandwidth scheduling under each of the UBS and CBS policies is formulated as a constrained convex optimization problem, which is solved by using the projected subgradient method. Furthermore, the transmission power of the backhaul system is opportunistically reduced by way of an innovative fixed power control (FPC) strategy. The notion of backhaul bottleneck occurrence (BBO) is introduced. An accurate approximate expression of the probability of BBO is derived, and then verified using Monte Carlo simulations. Several insights are provided into the offered gains of the proposed schemes through extensive computer simulations, by studying different aspects of the performance of super cells including the average sum rate, the BBO probability and the backhaul power efficiency (PE).
Submitted 12 July, 2019;
originally announced July 2019.
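The abstract above names the projected subgradient method as its solver for bandwidth scheduling. As a rough illustration only, the sketch below applies that method to a toy problem: splitting one unit of bandwidth among users so as to maximize a sum of logarithmic rates. The objective, the `gains` values, and the function names are assumptions made for this example and are not the paper's UBS or CBS formulation. Because this toy objective is differentiable, the subgradient used at each step is simply the gradient.

```python
import math


def project_to_simplex(v):
    """Euclidean projection of v onto {b : b_i >= 0, sum(b) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        # The indices passing this check form a prefix; keep the last one.
        if u_j - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def schedule_bandwidth(gains, steps=2000):
    """Maximize sum_i log(1 + gains[i] * b[i]) over bandwidth shares b.

    Hypothetical toy objective: `gains` stands in for per-user link quality.
    """
    n = len(gains)
    b = [1.0 / n] * n  # start from an equal split
    for k in range(1, steps + 1):
        # Gradient of the concave objective at the current shares.
        grad = [g / (1.0 + g * x) for g, x in zip(gains, b)]
        step = 1.0 / math.sqrt(k)  # diminishing step size
        # Ascent step followed by projection back onto the feasible set.
        b = project_to_simplex([x + step * d for x, d in zip(b, grad)])
    return b


if __name__ == "__main__":
    print(schedule_bandwidth([1.0, 4.0]))
```

The projection step is what keeps every iterate a valid allocation: shares stay non-negative and always sum to one, whatever the unconstrained update would have produced.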
-
A data-driven proxy to Stoke's flow in porous media
Authors:
Ali Takbiri-Borujeni,
Hadi Kazemi,
Nasser Nasrabadi
Abstract:
The objective of this work is to develop a data-driven proxy to high-fidelity numerical flow simulations using digital images. The proposed model can capture the flow field and permeability in a wide variety of digital porous media based on solid grain geometry and pore size distribution by detailed analyses of the local pore geometry and the local flow fields. To develop the model, the detailed pore space geometry and simulation data from 3500 two-dimensional high-fidelity Lattice Boltzmann simulation runs are used to train the model and to predict the solutions with high accuracy in much less computational time. The proposed methodology harnesses the enormous amount of data generated by high-fidelity flow simulations to decode the often under-utilized patterns in simulations and to accurately predict solutions to new cases. The developed model can capture the physics of the problem and enhance the prediction capabilities of the simulations at a much lower cost. These predictive models, in essence, do not spatio-temporally reduce the order of the problem. They do, however, possess the same numerical resolution as their Lattice Boltzmann simulation equivalents, with the great advantage that their solutions can be obtained at significantly reduced computational cost (speed and memory).
Submitted 25 April, 2019;
originally announced May 2019.
-
Unsupervised Image-to-Image Translation Using Domain-Specific Variational Information Bound
Authors:
Hadi Kazemi,
Sobhan Soleymani,
Fariborz Taherkhani,
Seyed Mehdi Iranmanesh,
Nasser M. Nasrabadi
Abstract:
Unsupervised image-to-image translation is a class of computer vision problems which aims at modeling the conditional distribution of images in the target domain, given a set of unpaired images in the source and target domains. An image in the source domain might have multiple representations in the target domain. Therefore, ambiguity in modeling the conditional distribution arises, especially when the images in the source and target domains come from different modalities. Current approaches mostly rely on simplifying assumptions to map both domains into a shared latent space. Consequently, they are only able to model the domain-invariant information between the two modalities. These approaches usually fail to model domain-specific information which has no representation in the target domain. In this work, we propose an unsupervised image-to-image translation framework which maximizes a domain-specific variational information bound and learns the target domain-invariant representation of the two domains. The proposed framework makes it possible to map a single source image into multiple images in the target domain, utilizing several target domain-specific codes sampled randomly from the prior distribution, or extracted from reference images.
Submitted 29 November, 2018;
originally announced November 2018.
-
Style and Content Disentanglement in Generative Adversarial Networks
Authors:
Hadi Kazemi,
Seyed Mehdi Iranmanesh,
Nasser M. Nasrabadi
Abstract:
Disentangling factors of variation within data has become a very challenging problem for image generation tasks. Current frameworks for training a Generative Adversarial Network (GAN) learn to disentangle the representations of the data in an unsupervised fashion and capture the most significant factors of the data variations. However, these approaches ignore the principle of content and style disentanglement in image generation, which means their learned latent code may alter the content and style of the generated images at the same time. This paper describes the Style and Content Disentangled GAN (SC-GAN), a new unsupervised algorithm for training GANs that learns disentangled style and content representations of the data. We assume that the representation of an image can be decomposed into a content code that represents the geometrical information of the data, and a style code that captures textural properties. Consequently, by fixing the style portion of the latent representation, we can generate diverse images in a particular style. Conversely, we can fix the content code and generate a specific scene in a variety of styles. The proposed SC-GAN has two components: a content code which is the input to the generator, and a style code which modifies the scene style through modification of the Adaptive Instance Normalization (AdaIN) layers' parameters. We evaluate the proposed SC-GAN framework on a set of baseline datasets.
Submitted 13 November, 2018;
originally announced November 2018.
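The SC-GAN abstract above says the style code acts by modifying the parameters of Adaptive Instance Normalization (AdaIN) layers. A minimal sketch of the standard AdaIN operation on a single feature channel follows, assuming the style code has already been mapped to a per-channel mean and standard deviation. The function name `adain` and the flat-list representation are illustrative choices for this example, not the paper's implementation.

```python
import math


def adain(content, style_mean, style_std, eps=1e-5):
    """Adaptive Instance Normalization for one feature channel.

    content:     flattened spatial activations of one channel (list of floats)
    style_mean:  target mean for this channel (assumed to come from the style code)
    style_std:   target standard deviation for this channel (same assumption)
    """
    n = len(content)
    mu = sum(content) / n
    var = sum((v - mu) ** 2 for v in content) / n
    sigma = math.sqrt(var + eps)  # eps guards against division by zero
    # Remove the channel's own statistics, then impose the style statistics.
    return [style_std * (v - mu) / sigma + style_mean for v in content]


if __name__ == "__main__":
    print(adain([1.0, 2.0, 3.0, 4.0], style_mean=10.0, style_std=2.0))
```

Because the spatial arrangement of the normalized activations is untouched and only their per-channel statistics change, this kind of layer is a natural place to inject style while the content code drives the layout.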
-
Unsupervised Facial Geometry Learning for Sketch to Photo Synthesis
Authors:
Hadi Kazemi,
Fariborz Taherkhani,
Nasser M. Nasrabadi
Abstract:
Face sketch-photo synthesis is a critical application in law enforcement and the digital entertainment industry, where the goal is to learn the mapping between a face sketch image and its corresponding photo-realistic image. However, the limited number of paired sketch-photo training data usually prevents current frameworks from learning a robust mapping between the geometry of sketches and their matching photo-realistic images. Consequently, in this work, we present an approach for learning to synthesize a photo-realistic image from a face sketch in an unsupervised fashion. In contrast to current unsupervised image-to-image translation techniques, our framework leverages a novel perceptual discriminator to learn the geometry of the human face. Learning facial prior information empowers the network to remove the geometrical artifacts in the face sketch. We demonstrate that a simultaneous optimization of the face photo generator network, employing the proposed perceptual discriminator in combination with a texture-wise discriminator, results in a significant improvement in quality and recognition rate of the synthesized photos. We evaluate the proposed network by conducting extensive experiments on multiple baseline sketch-photo datasets.
Submitted 12 October, 2018;
originally announced October 2018.
-
Prosodic-Enhanced Siamese Convolutional Neural Networks for Cross-Device Text-Independent Speaker Verification
Authors:
Sobhan Soleymani,
Ali Dabouei,
Seyed Mehdi Iranmanesh,
Hadi Kazemi,
Jeremy Dawson,
Nasser M. Nasrabadi
Abstract:
In this paper, a novel cross-device text-independent speaker verification architecture is proposed. The majority of the state-of-the-art deep architectures that are used for speaker verification tasks consider Mel-frequency cepstral coefficients. In contrast, our proposed Siamese convolutional neural network architecture uses Mel-frequency spectrogram coefficients to benefit from the dependency of the adjacent spectro-temporal features. Moreover, although spectro-temporal features have proved to be highly reliable in speaker verification models, they only represent some aspects of short-term acoustic level traits of the speaker's voice. However, the human voice consists of several linguistic levels such as acoustic, lexicon, prosody, and phonetics, that can be utilized in speaker verification models. To compensate for these inherent shortcomings in spectro-temporal features, we propose to enhance the proposed Siamese convolutional neural network architecture by deploying a multilayer perceptron network to incorporate the prosodic, jitter, and shimmer features. The proposed end-to-end verification architecture performs feature extraction and verification simultaneously. This proposed architecture displays significant improvement over classical signal processing approaches and deep algorithms for forensic cross-device speaker verification.
Submitted 31 July, 2018;
originally announced August 2018.
-
A Learning-Based Framework for Two-Dimensional Vehicle Maneuver Prediction over V2V Networks
Authors:
Hossein Nourkhiz Mahjoub,
Amin Tahmasbi-Sarvestani,
Hadi Kazemi,
Yaser P. Fallah
Abstract:
Situational awareness in vehicular networks could be substantially improved utilizing reliable trajectory prediction methods. More precise situational awareness, in turn, results in notably better performance of critical safety applications, such as Forward Collision Warning (FCW), as well as comfort applications like Cooperative Adaptive Cruise Control (CACC). Therefore, the vehicle trajectory prediction problem needs to be deeply investigated in order to come up with an end-to-end framework with the precision required by the safety applications' controllers. This problem has been tackled in the literature using different methods. However, machine learning, which is a promising and emerging field with remarkable potential for time series prediction, has not been explored enough for this purpose. In this paper, a two-layer neural network-based system is developed which predicts the future values of vehicle parameters, such as velocity, acceleration, and yaw rate, in the first layer and then predicts the two-dimensional, i.e. longitudinal and lateral, trajectory points based on the first layer's outputs. The performance of the proposed framework has been evaluated in realistic cut-in scenarios from the Safety Pilot Model Deployment (SPMD) dataset, and the results show a noticeable improvement in prediction accuracy in comparison with the kinematics model, which is the dominant model employed by the automotive industry. Both ideal and non-ideal communication circumstances have been investigated for our system evaluation. For the non-ideal case, an estimation step is included in the framework before the parameter prediction block to handle the drawbacks of packet drops or sensor failures and reconstruct the time series of vehicle parameters at a desirable frequency.
Submitted 1 August, 2018;
originally announced August 2018.
-
Deep Sketch-Photo Face Recognition Assisted by Facial Attributes
Authors:
Seyed Mehdi Iranmanesh,
Hadi Kazemi,
Sobhan Soleymani,
Ali Dabouei,
Nasser M. Nasrabadi
Abstract:
In this paper, we present a deep coupled framework to address the problem of matching a sketch image against a gallery of mugshots. Face sketches have the essential information about the spatial topology and geometric details of faces while missing some important facial attributes such as ethnicity, hair, eye, and skin color. We propose a coupled deep neural network architecture which utilizes facial attributes in order to improve the sketch-photo recognition performance. The proposed Attribute-Assisted Deep Convolutional Neural Network (AADCNN) method exploits the facial attributes and leverages the loss functions from the facial attributes identification and face verification tasks in order to learn rich discriminative features in a common embedding subspace. The facial attribute identification task increases the inter-personal variations by pushing apart the embedded features extracted from individuals with different facial attributes, while the verification task reduces the intra-personal variations by pulling together all the features that are related to one person. The learned discriminative features can be well generalized to new identities not seen in the training data. The proposed architecture is able to make full use of the sketch and complementary facial attribute information to train a deep model compared to the conventional sketch-photo recognition methods. Extensive experiments are performed on composite (E-PRIP) and semi-forensic (IIIT-D semi-forensic) datasets. The results show the superiority of our method compared to the state-of-the-art models in sketch-photo recognition algorithms.
Submitted 31 July, 2018;
originally announced August 2018.
-
ID Preserving Generative Adversarial Network for Partial Latent Fingerprint Reconstruction
Authors:
Ali Dabouei,
Sobhan Soleymani,
Hadi Kazemi,
Seyed Mehdi Iranmanesh,
Jeremy Dawson,
Nasser M. Nasrabadi
Abstract:
Performing recognition tasks using latent fingerprint samples is often challenging for automated identification systems due to poor quality, distortion, and partially missing information from the input samples. We propose a direct latent fingerprint reconstruction model based on conditional generative adversarial networks (cGANs). Two modifications are applied to the cGAN to adapt it for the task of latent fingerprint reconstruction. First, the model is forced to generate three additional maps to the ridge map to ensure that the orientation and frequency information is considered in the generation process, and prevent the model from filling large missing areas and generating erroneous minutiae. Second, a perceptual ID preservation approach is developed to force the generator to preserve the ID information during the reconstruction process. Using a synthetically generated database of latent fingerprints, the deep network learns to predict missing information from the input latent samples. We evaluate the proposed method in combination with two different fingerprint matching algorithms on several publicly available latent fingerprint datasets. We achieved the rank-10 accuracy of 88.02% on the IIIT-Delhi latent fingerprint database for the task of latent-to-latent matching and rank-50 accuracy of 70.89% on the IIIT-Delhi MOLF database for the task of latent-to-sensor matching. Experimental results of matching reconstructed samples in both latent-to-sensor and latent-to-latent frameworks indicate that the proposed method significantly increases the matching accuracy of the fingerprint recognition systems for the latent samples.
Submitted 31 July, 2018;
originally announced August 2018.
-
Multi-Level Feature Abstraction from Convolutional Neural Networks for Multimodal Biometric Identification
Authors:
Sobhan Soleymani,
Ali Dabouei,
Hadi Kazemi,
Jeremy Dawson,
Nasser M. Nasrabadi
Abstract:
In this paper, we propose a deep multimodal fusion network to fuse multiple modalities (face, iris, and fingerprint) for person identification. The proposed deep multimodal fusion algorithm consists of multiple streams of modality-specific Convolutional Neural Networks (CNNs), which are jointly optimized at multiple feature abstraction levels. Multiple features are extracted at several different convolutional layers from each modality-specific CNN for joint feature fusion, optimization, and classification. Features extracted at different convolutional layers of a modality-specific CNN represent the input at several different levels of abstract representations. We demonstrate that an efficient multimodal classification can be accomplished with a significant reduction in the number of network parameters by exploiting these multi-level abstract representations extracted from all the modality-specific CNNs. We demonstrate an increase in multimodal person identification performance by utilizing the proposed multi-level feature abstract representations in our multimodal fusion, rather than using only the features from the last layer of each modality-specific CNN. We show that our deep multimodal CNNs with multimodal fusion at several different feature abstraction levels can significantly outperform the unimodal representation accuracy. We also demonstrate that the joint optimization of all the modality-specific CNNs outperforms the score-level and decision-level fusions of independently optimized CNNs.
Submitted 3 July, 2018;
originally announced July 2018.
-
Attribute-Centered Loss for Soft-Biometrics Guided Face Sketch-Photo Recognition
Authors:
Hadi Kazemi,
Sobhan Soleymani,
Ali Dabouei,
Mehdi Iranmanesh,
Nasser M. Nasrabadi
Abstract:
Face sketches are able to capture the spatial topology of a face while lacking some facial attributes such as race, skin, or hair color. Existing sketch-photo recognition approaches have mostly ignored the importance of facial attributes. In this paper, we propose a new loss function, called attribute-centered loss, to train a Deep Coupled Convolutional Neural Network (DCCNN) for the facial attribute guided sketch to photo matching. Specifically, an attribute-centered loss is proposed which learns several distinct centers, in a shared embedding space, for photos and sketches with different combinations of attributes. The DCCNN simultaneously is trained to map photos and pairs of testified attributes and corresponding forensic sketches around their associated centers, while preserving the spatial topology information. Importantly, the centers learn to keep a relative distance from each other, related to their number of contradictory attributes. Extensive experiments are performed on composite (E-PRIP) and semi-forensic (IIIT-D Semi-forensic) databases. The proposed method significantly outperforms the state-of-the-art.
Submitted 9 April, 2018;
originally announced April 2018.
-
Deep Cross Polarimetric Thermal-to-visible Face Recognition
Authors:
Seyed Mehdi Iranmanesh,
Ali Dabouei,
Hadi Kazemi,
Nasser M. Nasrabadi
Abstract:
In this paper, we present a deep coupled learning framework to address the problem of matching polarimetric thermal face photos against a gallery of visible faces. Polarization state information of thermal faces provides the missing textural and geometric details in the thermal face imagery which exist in the visible spectrum. We propose a coupled deep neural network architecture which leverages relatively large visible and thermal datasets to overcome the problem of overfitting, and eventually we train it on a polarimetric thermal face dataset which is the first of its kind. The proposed architecture is able to make full use of the polarimetric thermal information to train a deep model compared to the conventional shallow thermal-to-visible face recognition methods. The proposed coupled deep neural network also finds global discriminative features in a nonlinear embedding space to relate the polarimetric thermal faces to their corresponding visible faces. The results show the superiority of our method compared to the state-of-the-art models in cross thermal-to-visible face recognition algorithms.
Submitted 4 January, 2018;
originally announced January 2018.
-
Fingerprint Distortion Rectification using Deep Convolutional Neural Networks
Authors:
Ali Dabouei,
Hadi Kazemi,
Seyed Mehdi Iranmanesh,
Jeremi Dawson,
Nasser M. Nasrabadi
Abstract:
Elastic distortion of fingerprints has a negative effect on the performance of fingerprint recognition systems. This negative effect brings inconvenience to users in authentication applications. However, in the negative recognition scenario where users may intentionally distort their fingerprints, this can be a serious problem since distortion will prevent the recognition system from identifying malicious users. Current methods aimed at addressing this problem still have limitations. First, they are often not accurate because they estimate distortion parameters based on the ridge frequency map and orientation map of input samples, which are not reliable due to distortion. Second, they are not efficient, requiring significant computation time to rectify samples. In this paper, we develop a rectification model based on a Deep Convolutional Neural Network (DCNN) to accurately estimate distortion parameters from the input image. Using a comprehensive database of synthetic distorted samples, the DCNN learns to accurately estimate distortion bases ten times faster than the dictionary search methods used in the previous approaches. Evaluating the proposed method on public databases of distorted samples shows that it can significantly improve the matching performance of distorted samples.
Submitted 3 January, 2018;
originally announced January 2018.
-
Polar Coding for Achieving the Capacity of Marginal Channels in Nonbinary-Input Setting
Authors:
Amirsina Torfi,
Sobhan Soleymani,
Seyed Mehdi Iranmanesh,
Hadi Kazemi,
Rouzbeh Asghari Shirvani,
Vahid Tabataba Vakili
Abstract:
Achieving information-theoretic security using an explicit coding scheme, in which unlimited computational power is assumed for the eavesdropper, is one of the main topics in security considerations. It has been shown that polar codes are capacity-achieving codes with low encoding and decoding complexity. It has been proven that polar codes reach the secrecy capacity in binary-input wiretap channels in symmetric settings for which the wiretapper's channel is degraded with respect to the main channel. The first task of this paper is to propose a coding scheme to achieve secrecy capacity in asymmetric nonbinary-input channels while keeping reliability and security conditions satisfied. Our assumption is that the wiretap channel is stochastically degraded with respect to the main channel and the message distribution is unspecified. The main idea is to send the information set over channels that are good for Bob and bad for Eve, and to send random symbols over channels that are good for both. In this scheme, the frozen vector is defined over all possible choices using the polar code ensemble concept. We prove that there exists a frozen vector for which the coding scheme satisfies the reliability and security conditions. It is further shown that a uniform message distribution is the necessary condition for achieving secrecy capacity.
Submitted 6 February, 2017; v1 submitted 20 January, 2017;
originally announced January 2017.