-
Enhancing Speech-Driven 3D Facial Animation with Audio-Visual Guidance from Lip Reading Expert
Authors:
Han EunGi,
Oh Hyun-Bin,
Kim Sung-Bin,
Corentin Nivelet Etcheberry,
Suekyeong Nam,
Janghoon Joo,
Tae-Hyun Oh
Abstract:
Speech-driven 3D facial animation has recently garnered attention due to its cost-effective usability in multimedia production. However, most current advances overlook the intelligibility of lip movements, limiting the realism of facial expressions. In this paper, we introduce a method for speech-driven 3D facial animation to generate accurate lip movements, proposing an audio-visual multimodal pe…
▽ More
Speech-driven 3D facial animation has recently garnered attention due to its cost-effective usability in multimedia production. However, most current advances overlook the intelligibility of lip movements, limiting the realism of facial expressions. In this paper, we introduce a method for speech-driven 3D facial animation to generate accurate lip movements, proposing an audio-visual multimodal perceptual loss. This loss provides guidance to train the speech-driven 3D facial animators to generate plausible lip motions aligned with the spoken transcripts. Furthermore, to incorporate the proposed audio-visual perceptual loss, we devise an audio-visual lip reading expert leveraging its prior knowledge about correlations between speech and lip motions. We validate the effectiveness of our approach through broad experiments, showing noticeable improvements in lip synchronization and lip readability performance. Codes are available at https://3d-talking-head-avguide.github.io/.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset
Authors:
Kim Sung-Bin,
Lee Chae-Yeon,
Gihun Son,
Oh Hyun-Bin,
Janghoon Ju,
Suekyeong Nam,
Tae-Hyun Oh
Abstract:
Recent studies in speech-driven 3D talking head generation have achieved convincing results in verbal articulations. However, generating accurate lip-syncs degrades when applied to input speech in other languages, possibly due to the lack of datasets covering a broad spectrum of facial movements across languages. In this work, we introduce a novel task to generate 3D talking heads from speeches of…
▽ More
Recent studies in speech-driven 3D talking head generation have achieved convincing results in verbal articulations. However, generating accurate lip-syncs degrades when applied to input speech in other languages, possibly due to the lack of datasets covering a broad spectrum of facial movements across languages. In this work, we introduce a novel task to generate 3D talking heads from speeches of diverse languages. We collect a new multilingual 2D video dataset comprising over 420 hours of talking videos in 20 languages. With our proposed dataset, we present a multilingually enhanced model that incorporates language-specific style embeddings, enabling it to capture the unique mouth movements associated with each language. Additionally, we present a metric for assessing lip-sync accuracy in multilingual settings. We demonstrate that training a 3D talking head model with our proposed dataset significantly enhances its multilingual performance. Codes and datasets are available at https://multi-talk.github.io/.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Freq-Mip-AA : Frequency Mip Representation for Anti-Aliasing Neural Radiance Fields
Authors:
Youngin Park,
Seungtae Nam,
Cheul-hee Hahm,
Eunbyung Park
Abstract:
Neural Radiance Fields (NeRF) have shown remarkable success in representing 3D scenes and generating novel views. However, they often struggle with aliasing artifacts, especially when rendering images from different camera distances from the training views. To address the issue, Mip-NeRF proposed using volumetric frustums to render a pixel and suggested integrated positional encoding (IPE). While…
▽ More
Neural Radiance Fields (NeRF) have shown remarkable success in representing 3D scenes and generating novel views. However, they often struggle with aliasing artifacts, especially when rendering images from different camera distances from the training views. To address the issue, Mip-NeRF proposed using volumetric frustums to render a pixel and suggested integrated positional encoding (IPE). While effective, this approach requires long training times due to its reliance on MLP architecture. In this work, we propose a novel anti-aliasing technique that utilizes grid-based representations, usually showing significantly faster training time. In addition, we exploit frequency-domain representation to handle the aliasing problem inspired by the sampling theorem. The proposed method, FreqMipAA, utilizes scale-specific low-pass filtering (LPF) and learnable frequency masks. Scale-specific low-pass filters (LPF) prevent aliasing and prioritize important image details, and learnable masks effectively remove problematic high-frequency elements while retaining essential information. By employing a scale-specific LPF and trainable masks, FreqMipAA can effectively eliminate the aliasing factor while retaining important details. We validated the proposed technique by incorporating it into a widely used grid-based method. The experimental results have shown that the FreqMipAA effectively resolved the aliasing issues and achieved state-of-the-art results in the multi-scale Blender dataset. Our code is available at https://github.com/yi0109/FreqMipAA .
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Meent: Differentiable Electromagnetic Simulator for Machine Learning
Authors:
Yongha Kim,
Anthony W. Jung,
Sanmun Kim,
Kevin Octavian,
Doyoung Heo,
Chae** Park,
Jeongmin Shin,
Sunghyun Nam,
Chanhyung Park,
Juho Park,
Sangjun Han,
**myoung Lee,
Seolho Kim,
Min Seok Jang,
Chan Y. Park
Abstract:
Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reachin…
▽ More
Electromagnetic (EM) simulation plays a crucial role in analyzing and designing devices with sub-wavelength scale structures such as solar cells, semiconductor devices, image sensors, future displays and integrated photonic devices. Specifically, optics problems such as estimating semiconductor device structures and designing nanophotonic devices provide intriguing research topics with far-reaching real world impact. Traditional algorithms for such tasks require iteratively refining parameters through simulations, which often yield sub-optimal results due to the high computational cost of both the algorithms and EM simulations. Machine learning (ML) emerged as a promising candidate to mitigate these challenges, and optics research community has increasingly adopted ML algorithms to obtain results surpassing classical methods across various tasks. To foster a synergistic collaboration between the optics and ML communities, it is essential to have an EM simulation software that is user-friendly for both research communities. To this end, we present Meent, an EM simulation software that employs rigorous coupled-wave analysis (RCWA). Developed in Python and equipped with automatic differentiation (AD) capabilities, Meent serves as a versatile platform for integrating ML into optics research and vice versa. To demonstrate its utility as a research platform, we present three applications of Meent: 1) generating a dataset for training neural operator, 2) serving as an environment for the reinforcement learning of nanophotonic device optimization, and 3) providing a solution for inverse problems with gradient-based optimizers. These applications highlight Meent's potential to advance both EM simulation and ML methodologies. The code is available at https://github.com/kc-ml2/meent with the MIT license to promote the cross-polinations of ideas among academic researchers and industry practitioners.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Holographic Parallax Improves 3D Perceptual Realism
Authors:
Dongyeon Kim,
Seung-Woo Nam,
Suyeon Choi,
Jong-Mo Seo,
Gordon Wetzstein,
Yoonchan Jeong
Abstract:
Holographic near-eye displays are a promising technology to solve long-standing challenges in virtual and augmented reality display systems. Over the last few years, many different computer-generated holography (CGH) algorithms have been proposed that are supervised by different types of target content, such as 2.5D RGB-depth maps, 3D focal stacks, and 4D light fields. It is unclear, however, what…
▽ More
Holographic near-eye displays are a promising technology to solve long-standing challenges in virtual and augmented reality display systems. Over the last few years, many different computer-generated holography (CGH) algorithms have been proposed that are supervised by different types of target content, such as 2.5D RGB-depth maps, 3D focal stacks, and 4D light fields. It is unclear, however, what the perceptual implications are of the choice of algorithm and target content type. In this work, we build a perceptual testbed of a full-color, high-quality holographic near-eye display. Under natural viewing conditions, we examine the effects of various CGH supervision formats and conduct user studies to assess their perceptual impacts on 3D realism. Our results indicate that CGH algorithms designed for specific viewpoints exhibit noticeable deficiencies in achieving 3D realism. In contrast, holograms incorporating parallax cues consistently outperform other formats across different viewing conditions, including the center of the eyebox. This finding is particularly interesting and suggests that the inclusion of parallax cues in CGH rendering plays a crucial role in enhancing the overall quality of the holographic experience. This work represents an initial stride towards delivering a perceptually realistic 3D experience with holographic near-eye displays.
△ Less
Submitted 17 April, 2024;
originally announced April 2024.
-
Real, fake and synthetic faces -- does the coin have three sides?
Authors:
Shahzeb Naeem,
Ramzi Al-Sharawi,
Muhammad Riyyan Khan,
Usman Tariq,
Abhinav Dhall,
Hasan Al-Nashash
Abstract:
With the ever-growing power of generative artificial intelligence, deepfake and artificially generated (synthetic) media have continued to spread online, which creates various ethical and moral concerns regarding their usage. To tackle this, we thus present a novel exploration of the trends and patterns observed in real, deepfake and synthetic facial images. The proposed analysis is done in two pa…
▽ More
With the ever-growing power of generative artificial intelligence, deepfake and artificially generated (synthetic) media have continued to spread online, which creates various ethical and moral concerns regarding their usage. To tackle this, we thus present a novel exploration of the trends and patterns observed in real, deepfake and synthetic facial images. The proposed analysis is done in two parts: firstly, we incorporate eight deep learning models and analyze their performances in distinguishing between the three classes of images. Next, we look to further delve into the similarities and differences between these three sets of images by investigating their image properties both in the context of the entire image as well as in the context of specific regions within the image. ANOVA test was also performed and provided further clarity amongst the patterns associated between the images of the three classes. From our findings, we observe that the investigated deeplearning models found it easier to detect synthetic facial images, with the ViT Patch-16 model performing best on this task with a class-averaged sensitivity, specificity, precision, and accuracy of 97.37%, 98.69%, 97.48%, and 98.25%, respectively. This observation was supported by further analysis of various image properties. We saw noticeable differences across the three category of images. This analysis can help us build better algorithms for facial image generation, and also shows that synthetic, deepfake and real face images are indeed three different classes.
△ Less
Submitted 2 April, 2024;
originally announced April 2024.
-
Unleash the Potential of CLIP for Video Highlight Detection
Authors:
Donghoon Han,
Seunghyeon Seo,
Eunhwan Park,
Seong-Uk Nam,
Nojun Kwak
Abstract:
Multimodal and large language models (LLMs) have revolutionized the utilization of open-world knowledge, unlocking novel potentials across various tasks and applications. Among these domains, the video domain has notably benefited from their capabilities. In this paper, we present Highlight-CLIP (HL-CLIP), a method designed to excel in the video highlight detection task by leveraging the pre-train…
▽ More
Multimodal and large language models (LLMs) have revolutionized the utilization of open-world knowledge, unlocking novel potentials across various tasks and applications. Among these domains, the video domain has notably benefited from their capabilities. In this paper, we present Highlight-CLIP (HL-CLIP), a method designed to excel in the video highlight detection task by leveraging the pre-trained knowledge embedded in multimodal models. By simply fine-tuning the multimodal encoder in combination with our innovative saliency pooling technique, we have achieved the state-of-the-art performance in the highlight detection task, the QVHighlight Benchmark, to the best of our knowledge.
△ Less
Submitted 2 April, 2024;
originally announced April 2024.
-
Generation and Detection of Sign Language Deepfakes -- A Linguistic and Visual Analysis
Authors:
Shahzeb Naeem,
Muhammad Riyyan Khan,
Usman Tariq,
Abhinav Dhall,
Carlos Ivan Colon,
Hasan Al-Nashash
Abstract:
A question in the realm of deepfakes is slowly emerging pertaining to whether we can go beyond facial deepfakes and whether it would be beneficial to society. Therefore, this research presents a positive application of deepfake technology in upper body generation, while performing sign-language for the Deaf and Hard of Hearing (DHoH) community. The resulting videos are later vetted with a sign lan…
▽ More
A question in the realm of deepfakes is slowly emerging pertaining to whether we can go beyond facial deepfakes and whether it would be beneficial to society. Therefore, this research presents a positive application of deepfake technology in upper body generation, while performing sign-language for the Deaf and Hard of Hearing (DHoH) community. The resulting videos are later vetted with a sign language expert. This is particularly helpful, given the intricate nature of sign language, a scarcity of sign language experts, and potential benefits for health and education. The objectives of this work encompass constructing a reliable deepfake dataset, evaluating its technical and visual credibility through computer vision and natural language processing models, and assessing the plausibility of the generated content. With over 1200 videos, featuring both previously seen and unseen individuals for the generation model, using the help of a sign language expert, we establish a deepfake dataset in sign language that can further be utilized to detect fake videos that may target certain people of determination.
△ Less
Submitted 1 April, 2024;
originally announced April 2024.
-
Imperceptible Protection against Style Imitation from Diffusion Models
Authors:
Namhyuk Ahn,
Wonhyuk Ahn,
KiYoon Yoo,
Daesik Kim,
Seung-Hun Nam
Abstract:
Recent progress in diffusion models has profoundly enhanced the fidelity of image generation. However, this has raised concerns about copyright infringements. While prior methods have introduced adversarial perturbations to prevent style imitation, most are accompanied by the degradation of artworks' visual quality. Recognizing the importance of maintaining this, we develop a visually improved pro…
▽ More
Recent progress in diffusion models has profoundly enhanced the fidelity of image generation. However, this has raised concerns about copyright infringements. While prior methods have introduced adversarial perturbations to prevent style imitation, most are accompanied by the degradation of artworks' visual quality. Recognizing the importance of maintaining this, we develop a visually improved protection method that preserves its protection capability. To this end, we create a perceptual map to identify areas most sensitive to human eyes. We then adjust the protection intensity guided by an instance-aware refinement. We also integrate a perceptual constraints bank to further improve the imperceptibility. Results show that our method substantially elevates the quality of the protected image without compromising on protection efficacy.
△ Less
Submitted 28 March, 2024;
originally announced March 2024.
-
A Framework for Portrait Stylization with Skin-Tone Awareness and Nudity Identification
Authors:
Seungkwon Kim,
Sangyeon Kim,
Seung-Hun Nam
Abstract:
Portrait stylization is a challenging task involving the transformation of an input portrait image into a specific style while preserving its inherent characteristics. The recent introduction of Stable Diffusion (SD) has significantly improved the quality of outcomes in this field. However, a practical stylization framework that can effectively filter harmful input content and preserve the distinc…
▽ More
Portrait stylization is a challenging task involving the transformation of an input portrait image into a specific style while preserving its inherent characteristics. The recent introduction of Stable Diffusion (SD) has significantly improved the quality of outcomes in this field. However, a practical stylization framework that can effectively filter harmful input content and preserve the distinct characteristics of an input, such as skin-tone, while maintaining the quality of stylization remains lacking. These challenges have hindered the wide deployment of such a framework. To address these issues, this study proposes a portrait stylization framework that incorporates a nudity content identification module (NCIM) and a skin-tone-aware portrait stylization module (STAPSM). In experiments, NCIM showed good performance in enhancing explicit content filtering, and STAPSM accurately represented a diverse range of skin tones. Our proposed framework has been successfully deployed in practice, and it has effectively satisfied critical requirements of real-world applications.
△ Less
Submitted 21 March, 2024;
originally announced March 2024.
-
Mip-Grid: Anti-aliased Grid Representations for Neural Radiance Fields
Authors:
Seungtae Nam,
Daniel Rho,
Jong Hwan Ko,
Eunbyung Park
Abstract:
Despite the remarkable achievements of neural radiance fields (NeRF) in representing 3D scenes and generating novel view images, the aliasing issue, rendering "jaggies" or "blurry" images at varying camera distances, remains unresolved in most existing approaches. The recently proposed mip-NeRF has addressed this challenge by rendering conical frustums instead of rays. However, it relies on MLP ar…
▽ More
Despite the remarkable achievements of neural radiance fields (NeRF) in representing 3D scenes and generating novel view images, the aliasing issue, rendering "jaggies" or "blurry" images at varying camera distances, remains unresolved in most existing approaches. The recently proposed mip-NeRF has addressed this challenge by rendering conical frustums instead of rays. However, it relies on MLP architecture to represent the radiance fields, missing out on the fast training speed offered by the latest grid-based methods. In this work, we present mip-Grid, a novel approach that integrates anti-aliasing techniques into grid-based representations for radiance fields, mitigating the aliasing artifacts while enjoying fast training time. The proposed method generates multi-scale grids by applying simple convolution operations over a shared grid representation and uses the scale-aware coordinate to retrieve features at different scales from the generated multi-scale grids. To test the effectiveness, we integrated the proposed method into the two recent representative grid-based methods, TensoRF and K-Planes. Experimental results demonstrate that mip-Grid greatly improves the rendering performance of both methods and even outperforms mip-NeRF on multi-scale datasets while achieving significantly faster training time. For code and demo videos, please see https://stnamjef.github.io/mipgrid.github.io/.
△ Less
Submitted 21 February, 2024;
originally announced February 2024.
-
Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?
Authors:
Gui** Son,
Sangwon Baek,
Sangdae Nam,
Ilgyun Jeong,
Seungone Kim
Abstract:
Large language models (LLMs) are typically prompted to follow a single instruction per inference call. In this work, we analyze whether LLMs also hold the capability to handle multiple instructions simultaneously, denoted as Multi-Task Inference. For this purpose, we introduce the MTI Bench(Multi-Task Inference Benchmark), a comprehensive evaluation benchmark encompassing 5,000 instances across 25…
▽ More
Large language models (LLMs) are typically prompted to follow a single instruction per inference call. In this work, we analyze whether LLMs also hold the capability to handle multiple instructions simultaneously, denoted as Multi-Task Inference. For this purpose, we introduce the MTI Bench(Multi-Task Inference Benchmark), a comprehensive evaluation benchmark encompassing 5,000 instances across 25 tasks. Each task in the MTI Bench involves 2 to 3 sub-tasks. As expected, we first demonstrate that Multi-Task Inference reduces the total inference time by 1.46 times in average since it does not require multiple inference calls. Interestingly, contrary to the expectation that LLMs would perform better when tasks are divided, we find that state-of-the-art LLMs, such as Llama-2-Chat-70B and GPT-4, show up to 7.3% and 12.4% improved performance with Multi-Task Inference compared to Single-Task Inference on the MTI Bench. We release the MTI Bench dataset and our code at this link https://github.com/gui**SON/MTI-Bench.
△ Less
Submitted 6 June, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
Geometry Transfer for Stylizing Radiance Fields
Authors:
Hyunyoung Jung,
Seonghyeon Nam,
Nikolaos Sarafianos,
Sungjoo Yoo,
Alexander Sorkine-Hornung,
Rakesh Ranjan
Abstract:
Shape and geometric patterns are essential in defining stylistic identity. However, current 3D style transfer methods predominantly focus on transferring colors and textures, often overlooking geometric aspects. In this paper, we introduce Geometry Transfer, a novel method that leverages geometric deformation for 3D style transfer. This technique employs depth maps to extract a style guide, subseq…
▽ More
Shape and geometric patterns are essential in defining stylistic identity. However, current 3D style transfer methods predominantly focus on transferring colors and textures, often overlooking geometric aspects. In this paper, we introduce Geometry Transfer, a novel method that leverages geometric deformation for 3D style transfer. This technique employs depth maps to extract a style guide, subsequently applied to stylize the geometry of radiance fields. Moreover, we propose new techniques that utilize geometric cues from the 3D scene, thereby enhancing aesthetic expressiveness and more accurately reflecting intended styles. Our extensive experiments show that Geometry Transfer enables a broader and more expressive range of stylizations, thereby significantly expanding the scope of 3D style transfer.
△ Less
Submitted 6 April, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
Multi-Robot Relative Pose Estimation in SE(2) with Observability Analysis: A Comparison of Extended Kalman Filtering and Robust Pose Graph Optimization
Authors:
Kihoon Shin,
Hyunjae Sim,
Seungwon Nam,
Yonghee Kim,
Jae Hu,
Kwang-Ki K. Kim
Abstract:
In this study, we address multi-robot localization issues, with a specific focus on cooperative localization and observability analysis of relative pose estimation. Cooperative localization involves enhancing each robot's information through a communication network and message passing. If odometry data from a target robot can be transmitted to the ego robot, observability of their relative pose es…
▽ More
In this study, we address multi-robot localization issues, with a specific focus on cooperative localization and observability analysis of relative pose estimation. Cooperative localization involves enhancing each robot's information through a communication network and message passing. If odometry data from a target robot can be transmitted to the ego robot, observability of their relative pose estimation can be achieved through range-only or bearing-only measurements, provided both robots have non-zero linear velocities. In cases where odometry data from a target robot are not directly transmitted but estimated by the ego robot, both range and bearing measurements are necessary to ensure observability of relative pose estimation. For ROS/Gazebo simulations, we explore four sensing and communication structures. We compare extended Kalman filtering (EKF) and pose graph optimization (PGO) estimation using different robust loss functions (filtering and smoothing with varying batch sizes of sliding windows) in terms of estimation accuracy. In hardware experiments, two Turtlebot3 equipped with UWB modules are used for real-world inter-robot relative pose estimation, applying both EKF and PGO and comparing their performance.
△ Less
Submitted 4 February, 2024; v1 submitted 27 January, 2024;
originally announced January 2024.
-
Integrating Open-World Shared Control in Immersive Avatars
Authors:
Patrick Naughton,
James Seungbum Nam,
Andrew Stratton,
Kris Hauser
Abstract:
Teleoperated avatar robots allow people to transport their manipulation skills to environments that may be difficult or dangerous to work in. Current systems are able to give operators direct control of many components of the robot to immerse them in the remote environment, but operators still struggle to complete tasks as competently as they could in person. We present a framework for incorporati…
▽ More
Teleoperated avatar robots allow people to transport their manipulation skills to environments that may be difficult or dangerous to work in. Current systems are able to give operators direct control of many components of the robot to immerse them in the remote environment, but operators still struggle to complete tasks as competently as they could in person. We present a framework for incorporating open-world shared control into avatar robots to combine the benefits of direct and shared control. This framework preserves the fluency of our avatar interface by minimizing obstructions to the operator's view and using the same interface for direct, shared, and fully autonomous control. In a human subjects study (N=19), we find that operators using this framework complete a range of tasks significantly more quickly and reliably than those that do not.
△ Less
Submitted 5 January, 2024;
originally announced January 2024.
-
Coordinate-Aware Modulation for Neural Fields
Authors:
Joo Chan Lee,
Daniel Rho,
Seungtae Nam,
Jong Hwan Ko,
Eunbyung Park
Abstract:
Neural fields, map** low-dimensional input coordinates to corresponding signals, have shown promising results in representing various signals. Numerous methodologies have been proposed, and techniques employing MLPs and grid representations have achieved substantial success. MLPs allow compact and high expressibility, yet often suffer from spectral bias and slow convergence speed. On the other h…
▽ More
Neural fields, map** low-dimensional input coordinates to corresponding signals, have shown promising results in representing various signals. Numerous methodologies have been proposed, and techniques employing MLPs and grid representations have achieved substantial success. MLPs allow compact and high expressibility, yet often suffer from spectral bias and slow convergence speed. On the other hand, methods using grids are free from spectral bias and achieve fast training speed, however, at the expense of high spatial complexity. In this work, we propose a novel way for exploiting both MLPs and grid representations in neural fields. Unlike the prevalent methods that combine them sequentially (extract features from the grids first and feed them to the MLP), we inject spectral bias-free grid representations into the intermediate features in the MLP. More specifically, we suggest a Coordinate-Aware Modulation (CAM), which modulates the intermediate features using scale and shift parameters extracted from the grid representations. This can maintain the strengths of MLPs while mitigating any remaining potential biases, facilitating the rapid learning of high-frequency components. In addition, we empirically found that the feature normalizations, which have not been successful in neural filed literature, proved to be effective when applied in conjunction with the proposed CAM. Experimental results demonstrate that CAM enhances the performance of neural representation and improves learning stability across a range of signals. Especially in the novel view synthesis task, we achieved state-of-the-art performance with the least number of parameters and fast training speed for dynamic scenes and the best performance under 1MB memory for static scenes. CAM also outperforms the best-performing video compression methods using neural fields by a large margin.
△ Less
Submitted 25 November, 2023;
originally announced November 2023.
-
Programmable Superconducting Optoelectronic Single-Photon Synapses with Integrated Multi-State Memory
Authors:
Bryce A. Primavera,
Saeed Khan,
Richard P. Mirin,
Sae Woo Nam,
Jeffrey M. Shainline
Abstract:
The co-location of memory and processing is a core principle of neuromorphic computing. A local memory device for synaptic weight storage has long been recognized as an enabling element for large-scale, high-performance neuromorphic hardware. In this work, we demonstrate programmable superconducting synapses with integrated memories for use in superconducting optoelectronic neural systems. Superco…
▽ More
The co-location of memory and processing is a core principle of neuromorphic computing. A local memory device for synaptic weight storage has long been recognized as an enabling element for large-scale, high-performance neuromorphic hardware. In this work, we demonstrate programmable superconducting synapses with integrated memories for use in superconducting optoelectronic neural systems. Superconducting nanowire single-photon detectors and Josephson junctions are combined into programmable synaptic circuits that exhibit single-photon sensitivity, memory cells with more than 400 internal states, leaky integration of input spike events, and 0.4 fJ programming energies (including cooling power). These results are attractive for implementing a variety of supervised and unsupervised learning algorithms and lay the foundation for a new hardware platform optimized for large-scale spiking network accelerators.
△ Less
Submitted 10 November, 2023;
originally announced November 2023.
-
LaughTalk: Expressive 3D Talking Head Generation with Laughter
Authors:
Kim Sung-Bin,
Lee Hyun,
Da Hye Hong,
Suekyeong Nam,
Janghoon Ju,
Tae-Hyun Oh
Abstract:
Laughter is a unique expression, essential to affirmative social interactions of humans. Although current 3D talking head generation methods produce convincing verbal articulations, they often fail to capture the vitality and subtleties of laughter and smiles despite their importance in social context. In this paper, we introduce a novel task to generate 3D talking heads capable of both articulate…
▽ More
Laughter is a unique expression, essential to affirmative social interactions of humans. Although current 3D talking head generation methods produce convincing verbal articulations, they often fail to capture the vitality and subtleties of laughter and smiles despite their importance in social context. In this paper, we introduce a novel task to generate 3D talking heads capable of both articulate speech and authentic laughter. Our newly curated dataset comprises 2D laughing videos paired with pseudo-annotated and human-validated 3D FLAME parameters and vertices. Given our proposed dataset, we present a strong baseline with a two-stage training scheme: the model first learns to talk and then acquires the ability to express laughter. Extensive experiments demonstrate that our method performs favorably compared to existing approaches in both talking head generation and expressing laughter signals. We further explore potential applications on top of our proposed method for rigging realistic avatars.
△ Less
Submitted 2 November, 2023;
originally announced November 2023.
-
Optimal Private Discrete Distribution Estimation with One-bit Communication
Authors:
Seung-Hyun Nam,
Vincent Y. F. Tan,
Si-Hyeon Lee
Abstract:
We consider a private discrete distribution estimation problem with one-bit communication constraint. The privacy constraints are imposed with respect to the local differential privacy and the maximal leakage. The estimation error is quantified by the worst-case mean squared error. We completely characterize the first-order asymptotics of this privacy-utility trade-off under the one-bit communicat…
▽ More
We consider a private discrete distribution estimation problem with one-bit communication constraint. The privacy constraints are imposed with respect to the local differential privacy and the maximal leakage. The estimation error is quantified by the worst-case mean squared error. We completely characterize the first-order asymptotics of this privacy-utility trade-off under the one-bit communication constraint for both types of privacy constraints by using ideas from local asymptotic normality and the resolution of a block design mechanism. These results demonstrate the optimal dependence of the privacy-utility trade-off under the one-bit communication constraint in terms of the parameters of the privacy constraint and the size of the alphabet of the discrete distribution.
△ Less
Submitted 17 October, 2023;
originally announced October 2023.
-
A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization
Authors:
Kim Youwang,
Lee Hyun,
Kim Sung-Bin,
Suekyeong Nam,
Janghoon Ju,
Tae-Hyun Oh
Abstract:
We propose NeuFace, a 3D face mesh pseudo annotation method on videos via neural re-parameterized optimization. Despite the huge progress in 3D face reconstruction methods, generating reliable 3D face labels for in-the-wild dynamic videos remains challenging. Using NeuFace optimization, we annotate the per-view/-frame accurate and consistent face meshes on large-scale face videos, called the NeuFa…
▽ More
We propose NeuFace, a 3D face mesh pseudo annotation method on videos via neural re-parameterized optimization. Despite the huge progress in 3D face reconstruction methods, generating reliable 3D face labels for in-the-wild dynamic videos remains challenging. Using NeuFace optimization, we annotate the per-view/-frame accurate and consistent face meshes on large-scale face videos, called the NeuFace-dataset. We investigate how neural re-parameterization helps to reconstruct image-aligned facial details on 3D meshes via gradient analysis. By exploiting the naturalness and diversity of 3D faces in our dataset, we demonstrate the usefulness of our dataset for 3D face-related tasks: improving the reconstruction accuracy of an existing 3D face reconstruction model and learning 3D facial motion prior. Code and datasets will be available at https://neuface-dataset.github.io.
△ Less
Submitted 6 October, 2023; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Depolarized Holography with Polarization-multiplexing Metasurface
Authors:
Seung-Woo Nam,
Young** Kim,
Dongyeon Kim,
Yoonchan Jeong
Abstract:
The evolution of computer-generated holography (CGH) algorithms has prompted significant improvements in the performances of holographic displays. Nonetheless, they start to encounter a limited degree of freedom in CGH optimization and physical constraints stemming from the coherent nature of holograms. To surpass the physical limitations, we consider polarization as a new degree of freedom by uti…
▽ More
The evolution of computer-generated holography (CGH) algorithms has prompted significant improvements in the performances of holographic displays. Nonetheless, they start to encounter a limited degree of freedom in CGH optimization and physical constraints stemming from the coherent nature of holograms. To surpass the physical limitations, we consider polarization as a new degree of freedom by utilizing a novel optical platform called metasurface. Polarization-multiplexing metasurfaces enable incoherent-like behavior in holographic displays due to the mutual incoherence of orthogonal polarization states. We leverage this unique characteristic of a metasurface by integrating it into a holographic display and exploiting polarization diversity to bring an additional degree of freedom for CGH algorithms. To minimize the speckle noise while maximizing the image quality, we devise a fully differentiable optimization pipeline by taking into account the metasurface proxy model, thereby jointly optimizing spatial light modulator phase patterns and geometric parameters of metasurface nanostructures. We evaluate the metasurface-enabled depolarized holography through simulations and experiments, demonstrating its ability to reduce speckle noise and enhance image quality.
△ Less
Submitted 26 September, 2023;
originally announced September 2023.
-
DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models
Authors:
Namhyuk Ahn,
Junsoo Lee,
Chunggi Lee,
Kunhee Kim,
Daesik Kim,
Seung-Hun Nam,
Kibeom Hong
Abstract:
Recent progresses in large-scale text-to-image models have yielded remarkable accomplishments, finding various applications in art domain. However, expressing unique characteristics of an artwork (e.g. brushwork, colortone, or composition) with text prompts alone may encounter limitations due to the inherent constraints of verbal description. To this end, we introduce DreamStyler, a novel framewor…
▽ More
Recent progresses in large-scale text-to-image models have yielded remarkable accomplishments, finding various applications in art domain. However, expressing unique characteristics of an artwork (e.g. brushwork, colortone, or composition) with text prompts alone may encounter limitations due to the inherent constraints of verbal description. To this end, we introduce DreamStyler, a novel framework designed for artistic image synthesis, proficient in both text-to-image synthesis and style transfer. DreamStyler optimizes a multi-stage textual embedding with a context-aware text prompt, resulting in prominent image quality. In addition, with content and style guidance, DreamStyler exhibits flexibility to accommodate a range of style references. Experimental results demonstrate its superior performance across multiple scenarios, suggesting its promising potential in artistic product creation.
△ Less
Submitted 18 December, 2023; v1 submitted 13 September, 2023;
originally announced September 2023.
-
Achieving the Exactly Optimal Privacy-Utility Trade-Off with Low Communication Cost via Shared Randomness
Authors:
Seung-Hyun Nam,
Hyun-Young Park,
Si-Hyeon Lee
Abstract:
We consider a discrete distribution estimation problem under a local differential privacy (LDP) constraint in the presence of shared randomness. By exploiting the shared randomness, we suggest a new method for constructing LDP schemes which achieve the exactly optimal privacy-utility trade-off (PUT) with the communication cost of less than or equal to the input data size for any privacy regime. Th…
▽ More
We consider a discrete distribution estimation problem under a local differential privacy (LDP) constraint in the presence of shared randomness. By exploiting the shared randomness, we suggest a new method for constructing LDP schemes which achieve the exactly optimal privacy-utility trade-off (PUT) with the communication cost of less than or equal to the input data size for any privacy regime. The main idea is to decompose a block design scheme by Park et al. (2023), based on the combinatorial concept called resolution. The LDP scheme decomposed from a block design scheme is called a resolution of the block design scheme, and it achieves the same PUT as the original block design scheme while requiring a less communication cost. We provide two resolutions of an exactly PUT-optimal block design scheme, called the Baranyai's resolution and the cyclic shift resolution, both requiring the communication cost of less than or equal to the input data size. In particular, we show that the Baranyai's resolution achieves the minimum communication cost among all the PUT-optimal resolutions of block design schemes. One drawback of the Baranyai's resolution is that it can be obtained through a recursive algorithm in general. In contrast, the cyclic shift resolution has an explicit structure, but its communication cost can be larger than that of Baranyai's resolution. To complement this, we also suggest resolutions of other block design schemes achieving the optimal PUT for some privacy budgets, which require the minimum communication cost as the Baranyai's resolution and have explicit structures as the cyclic shift resolution.
△ Less
Submitted 8 July, 2023;
originally announced July 2023.
-
Separable Physics-Informed Neural Networks
Authors:
Junwoo Cho,
Seungtae Nam,
Hyunmo Yang,
Seok-Bae Yun,
Youngjoon Hong,
Eunbyung Park
Abstract:
Physics-informed neural networks (PINNs) have recently emerged as promising data-driven PDE solvers showing encouraging results on various PDEs. However, there is a fundamental limitation of training PINNs to solve multi-dimensional PDEs and approximate highly complex solution functions. The number of training points (collocation points) required on these challenging PDEs grows substantially, but…
▽ More
Physics-informed neural networks (PINNs) have recently emerged as promising data-driven PDE solvers showing encouraging results on various PDEs. However, there is a fundamental limitation of training PINNs to solve multi-dimensional PDEs and approximate highly complex solution functions. The number of training points (collocation points) required on these challenging PDEs grows substantially, but it is severely limited due to the expensive computational costs and heavy memory overhead. To overcome this issue, we propose a network architecture and training algorithm for PINNs. The proposed method, separable PINN (SPINN), operates on a per-axis basis to significantly reduce the number of network propagations in multi-dimensional PDEs unlike point-wise processing in conventional PINNs. We also propose using forward-mode automatic differentiation to reduce the computational cost of computing PDE residuals, enabling a large number of collocation points (>10^7) on a single commodity GPU. The experimental results show drastically reduced computational costs (62x in wall-clock time, 1,394x in FLOPs given the same number of collocation points) in multi-dimensional PDEs while achieving better accuracy. Furthermore, we present that SPINN can solve a chaotic (2+1)-d Navier-Stokes equation significantly faster than the best-performing prior method (9 minutes vs 10 hours in a single GPU), maintaining accuracy. Finally, we showcase that SPINN can accurately obtain the solution of a highly nonlinear and multi-dimensional PDE, a (3+1)-d Navier-Stokes equation. For visualized results and code, please see https://jwcho5576.github.io/spinn.github.io/.
△ Less
Submitted 31 October, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Exactly Optimal and Communication-Efficient Private Estimation via Block Designs
Authors:
Hyun-Young Park,
Seung-Hyun Nam,
Si-Hyeon Lee
Abstract:
In this paper, we propose a new class of local differential privacy (LDP) schemes based on combinatorial block designs for discrete distribution estimation. This class not only recovers many known LDP schemes in a unified framework of combinatorial block design, but also suggests a novel way of finding new schemes achieving the exactly optimal (or near-optimal) privacy-utility trade-off with lower…
▽ More
In this paper, we propose a new class of local differential privacy (LDP) schemes based on combinatorial block designs for discrete distribution estimation. This class not only recovers many known LDP schemes in a unified framework of combinatorial block design, but also suggests a novel way of finding new schemes achieving the exactly optimal (or near-optimal) privacy-utility trade-off with lower communication costs. Indeed, we find many new LDP schemes that achieve the exactly optimal privacy-utility trade-off, with the minimum communication cost among all the unbiased or consistent schemes, for a certain set of input data size and LDP constraint. Furthermore, to partially solve the sparse existence issue of block design schemes, we consider a broader class of LDP schemes based on regular and pairwise-balanced designs, called RPBD schemes, which relax one of the symmetry requirements on block designs. By considering this broader class of RPBD schemes, we can find LDP schemes achieving near-optimal privacy-utility trade-off with reasonably low communication costs for a much larger set of input data size and LDP constraint.
△ Less
Submitted 17 October, 2023; v1 submitted 2 May, 2023;
originally announced May 2023.
-
Can Voice Assistants Sound Cute? Towards a Model of Kawaii Vocalics
Authors:
Katie Seaborn,
Somang Nam,
Julia Keckeis,
Tatsuya Itagaki
Abstract:
The Japanese notion of "kawaii" or expressions of cuteness, vulnerability, and/or charm is a global cultural export. Work has explored kawaii-ness as a design feature and factor of user experience in the visual appearance, nonverbal behaviour, and sound of robots and virtual characters. In this initial work, we consider whether voices can be kawaii by exploring the vocal qualities of voice assista…
▽ More
The Japanese notion of "kawaii" or expressions of cuteness, vulnerability, and/or charm is a global cultural export. Work has explored kawaii-ness as a design feature and factor of user experience in the visual appearance, nonverbal behaviour, and sound of robots and virtual characters. In this initial work, we consider whether voices can be kawaii by exploring the vocal qualities of voice assistant speech, i.e., kawaii vocalics. Drawing from an age-inclusive model of kawaii, we ran a user perceptions study on the kawaii-ness of younger- and older-sounding Japanese computer voices. We found that kawaii-ness intersected with perceptions of gender and age, i.e., gender ambiguous and girlish, as well as VA features, i.e., fluency and artificiality. We propose an initial model of kawaii vocalics to be validated through the identification and study of vocal qualities, cognitive appraisals, behavioural responses, and affective reports.
△ Less
Submitted 21 April, 2023;
originally announced April 2023.
-
Learning Neural Duplex Radiance Fields for Real-Time View Synthesis
Authors:
Ziyu Wan,
Christian Richardt,
Aljaž Božič,
Chao Li,
Vijay Rengarajan,
Seonghyeon Nam,
Xiaoyu Xiang,
Tuotuo Li,
Bo Zhu,
Rakesh Ranjan,
**g Liao
Abstract:
Neural radiance fields (NeRFs) enable novel view synthesis with unprecedented visual quality. However, to render photorealistic images, NeRFs require hundreds of deep multilayer perceptron (MLP) evaluations - for each pixel. This is prohibitively expensive and makes real-time rendering infeasible, even on powerful modern GPUs. In this paper, we propose a novel approach to distill and bake NeRFs in…
▽ More
Neural radiance fields (NeRFs) enable novel view synthesis with unprecedented visual quality. However, to render photorealistic images, NeRFs require hundreds of deep multilayer perceptron (MLP) evaluations - for each pixel. This is prohibitively expensive and makes real-time rendering infeasible, even on powerful modern GPUs. In this paper, we propose a novel approach to distill and bake NeRFs into highly efficient mesh-based neural representations that are fully compatible with the massively parallel graphics rendering pipeline. We represent scenes as neural radiance features encoded on a two-layer duplex mesh, which effectively overcomes the inherent inaccuracies in 3D surface reconstruction by learning the aggregated radiance information from a reliable interval of ray-surface intersections. To exploit local geometric relationships of nearby pixels, we leverage screen-space convolutions instead of the MLPs used in NeRFs to achieve high-quality appearance. Finally, the performance of the whole framework is further boosted by a novel multi-view distillation optimization strategy. We demonstrate the effectiveness and superiority of our approach via extensive experiments on a range of standard datasets.
△ Less
Submitted 20 April, 2023;
originally announced April 2023.
-
Multiplexed gradient descent: Fast online training of modern datasets on hardware neural networks without backpropagation
Authors:
Adam N. McCaughan,
Bakhrom G. Oripov,
Natesh Ganesh,
Sae Woo Nam,
Andrew Dienstfrey,
Sonia M. Buckley
Abstract:
We present multiplexed gradient descent (MGD), a gradient descent framework designed to easily train analog or digital neural networks in hardware. MGD utilizes zero-order optimization techniques for online training of hardware neural networks. We demonstrate its ability to train neural networks on modern machine learning datasets, including CIFAR-10 and Fashion-MNIST, and compare its performance…
▽ More
We present multiplexed gradient descent (MGD), a gradient descent framework designed to easily train analog or digital neural networks in hardware. MGD utilizes zero-order optimization techniques for online training of hardware neural networks. We demonstrate its ability to train neural networks on modern machine learning datasets, including CIFAR-10 and Fashion-MNIST, and compare its performance to backpropagation. Assuming realistic timescales and hardware parameters, our results indicate that these optimization techniques can train a network on emerging hardware platforms orders of magnitude faster than the wall-clock time of training via backpropagation on a standard GPU, even in the presence of imperfect weight updates or device-to-device variations in the hardware. We additionally describe how it can be applied to existing hardware as part of chip-in-the-loop training, or integrated directly at the hardware level. Crucially, the MGD framework is highly flexible, and its gradient descent process can be optimized to compensate for specific hardware limitations such as slow parameter-update speeds or limited input bandwidth.
△ Less
Submitted 5 March, 2023;
originally announced March 2023.
-
Masked Wavelet Representation for Compact Neural Radiance Fields
Authors:
Daniel Rho,
Byeonghyeon Lee,
Seungtae Nam,
Joo Chan Lee,
Jong Hwan Ko,
Eunbyung Park
Abstract:
Neural radiance fields (NeRF) have demonstrated the potential of coordinate-based neural representation (neural fields or implicit neural representation) in neural rendering. However, using a multi-layer perceptron (MLP) to represent a 3D scene or object requires enormous computational resources and time. There have been recent studies on how to reduce these computational inefficiencies by using a…
▽ More
Neural radiance fields (NeRF) have demonstrated the potential of coordinate-based neural representation (neural fields or implicit neural representation) in neural rendering. However, using a multi-layer perceptron (MLP) to represent a 3D scene or object requires enormous computational resources and time. There have been recent studies on how to reduce these computational inefficiencies by using additional data structures, such as grids or trees. Despite the promising performance, the explicit data structure necessitates a substantial amount of memory. In this work, we present a method to reduce the size without compromising the advantages of having additional data structures. In detail, we propose using the wavelet transform on grid-based neural fields. Grid-based neural fields are for fast convergence, and the wavelet transform, whose efficiency has been demonstrated in high-performance standard codecs, is to improve the parameter efficiency of grids. Furthermore, in order to achieve a higher sparsity of grid coefficients while maintaining reconstruction quality, we present a novel trainable masking approach. Experimental results demonstrate that non-spatial grid coefficients, such as wavelet coefficients, are capable of attaining a higher level of sparsity than spatial grid coefficients, resulting in a more compact representation. With our proposed mask and compression pipeline, we achieved state-of-the-art performance within a memory budget of 2 MB. Our code is available at https://github.com/daniel03c1/masked_wavelet_nerf.
△ Less
Submitted 21 March, 2023; v1 submitted 18 December, 2022;
originally announced December 2022.
-
FSID: Fully Synthetic Image Denoising via Procedural Scene Generation
Authors:
Gyeongmin Choe,
Beibei Du,
Seonghyeon Nam,
Xiaoyu Xiang,
Bo Zhu,
Rakesh Ranjan
Abstract:
For low-level computer vision and image processing ML tasks, training on large datasets is critical for generalization. However, the standard practice of relying on real-world images primarily from the Internet comes with image quality, scalability, and privacy issues, especially in commercial contexts. To address this, we have developed a procedural synthetic data generation pipeline and dataset…
▽ More
For low-level computer vision and image processing ML tasks, training on large datasets is critical for generalization. However, the standard practice of relying on real-world images primarily from the Internet comes with image quality, scalability, and privacy issues, especially in commercial contexts. To address this, we have developed a procedural synthetic data generation pipeline and dataset tailored to low-level vision tasks. Our Unreal engine-based synthetic data pipeline populates large scenes algorithmically with a combination of random 3D objects, materials, and geometric transformations. Then, we calibrate the camera noise profiles to synthesize the noisy images. From this pipeline, we generated a fully synthetic image denoising dataset (FSID) which consists of 175,000 noisy/clean image pairs. We then trained and validated a CNN-based denoising model, and demonstrated that the model trained on this synthetic data alone can achieve competitive denoising results when evaluated on real-world noisy images captured with smartphone cameras.
△ Less
Submitted 7 December, 2022;
originally announced December 2022.
-
Separable PINN: Mitigating the Curse of Dimensionality in Physics-Informed Neural Networks
Authors:
Junwoo Cho,
Seungtae Nam,
Hyunmo Yang,
Seok-Bae Yun,
Youngjoon Hong,
Eunbyung Park
Abstract:
Physics-informed neural networks (PINNs) have emerged as new data-driven PDE solvers for both forward and inverse problems. While promising, the expensive computational costs to obtain solutions often restrict their broader applicability. We demonstrate that the computations in automatic differentiation (AD) can be significantly reduced by leveraging forward-mode AD when training PINN. However, a…
▽ More
Physics-informed neural networks (PINNs) have emerged as new data-driven PDE solvers for both forward and inverse problems. While promising, the expensive computational costs to obtain solutions often restrict their broader applicability. We demonstrate that the computations in automatic differentiation (AD) can be significantly reduced by leveraging forward-mode AD when training PINN. However, a naive application of forward-mode AD to conventional PINNs results in higher computation, losing its practical benefit. Therefore, we propose a network architecture, called separable PINN (SPINN), which can facilitate forward-mode AD for more efficient computation. SPINN operates on a per-axis basis instead of point-wise processing in conventional PINNs, decreasing the number of network forward passes. Besides, while the computation and memory costs of standard PINNs grow exponentially along with the grid resolution, that of our model is remarkably less susceptible, mitigating the curse of dimensionality. We demonstrate the effectiveness of our model in various PDE systems by significantly reducing the training run-time while achieving comparable accuracy. Project page: https://jwcho5576.github.io/spinn/
△ Less
Submitted 2 November, 2023; v1 submitted 16 November, 2022;
originally announced November 2022.
-
Technical Report: Development of an Ultrahigh Bandwidth Software-defined Radio Platform
Authors:
Sung Sik Nam,
Changseok Yoon,
Ki-Hong Park,
Mohamed-Slim Alouini
Abstract:
For the development of new digital signal processing systems and services, the rapid, easy, and convenient prototy** of ideas and the rapid time-to-market of products are becoming important with advances in technology. Conventionally, for the development stage, particularly when confirming the feasibility or performance of a new system or service, an idea is first confirmed through a computerbas…
▽ More
For the development of new digital signal processing systems and services, the rapid, easy, and convenient prototy** of ideas and the rapid time-to-market of products are becoming important with advances in technology. Conventionally, for the development stage, particularly when confirming the feasibility or performance of a new system or service, an idea is first confirmed through a computerbased software simulation after develo** an accurate model of the operating environment. Next, this idea is validated and tested in the real operating environment. The new systems or services and their operating environments are becoming increasingly complicated. Hence, their development processes too are more complex cost- and time-intensive tasks that require engineers with skill and professional knowledge/experience. Furthermore, for ensuring fast time-to-market, all the development processes encompassing the (i) algorithm development, (ii) product prototy**, and (iii) final product development, must be closely linked such that they can be quickly completed. In this context, the aim of this paper is to propose an ultrahigh bandwidth software-defined radio platform that can prototype a quasi-real-time operating system without a developer having sophisticated hardware/software expertise. This platform allows the realization of a software-implemented digital signal processing system in minimal time with minimal efforts and without the need of a host computer.
△ Less
Submitted 26 August, 2022; v1 submitted 25 August, 2022;
originally announced August 2022.
-
Streamable Neural Fields
Authors:
Junwoo Cho,
Seungtae Nam,
Daniel Rho,
Jong Hwan Ko,
Eunbyung Park
Abstract:
Neural fields have emerged as a new data representation paradigm and have shown remarkable success in various signal representations. Since they preserve signals in their network parameters, the data transfer by sending and receiving the entire model parameters prevents this emerging technology from being used in many practical scenarios. We propose streamable neural fields, a single model that co…
▽ More
Neural fields have emerged as a new data representation paradigm and have shown remarkable success in various signal representations. Since they preserve signals in their network parameters, the data transfer by sending and receiving the entire model parameters prevents this emerging technology from being used in many practical scenarios. We propose streamable neural fields, a single model that consists of executable sub-networks of various widths. The proposed architectural and training techniques enable a single network to be streamable over time and reconstruct different qualities and parts of signals. For example, a smaller sub-network produces smooth and low-frequency signals, while a larger sub-network can represent fine details. Experimental results have shown the effectiveness of our method in various domains, such as 2D images, videos, and 3D signed distance functions. Finally, we demonstrate that our proposed method improves training stability, by exploiting parameter sharing.
△ Less
Submitted 20 July, 2022;
originally announced July 2022.
-
CoVA: Exploiting Compressed-Domain Analysis to Accelerate Video Analytics
Authors:
**woo Hwang,
Minsu Kim,
Daeun Kim,
Seungho Nam,
Yoonsung Kim,
Dohee Kim,
Hardik Sharma,
Jongse Park
Abstract:
Modern retrospective analytics systems leverage cascade architecture to mitigate bottleneck for computing deep neural networks (DNNs). However, the existing cascades suffer two limitations: (1) decoding bottleneck is either neglected or circumvented, paying significant compute and storage cost for pre-processing; and (2) the systems are specialized for temporal queries and lack spatial query suppo…
▽ More
Modern retrospective analytics systems leverage cascade architecture to mitigate bottleneck for computing deep neural networks (DNNs). However, the existing cascades suffer two limitations: (1) decoding bottleneck is either neglected or circumvented, paying significant compute and storage cost for pre-processing; and (2) the systems are specialized for temporal queries and lack spatial query support. This paper presents CoVA, a novel cascade architecture that splits the cascade computation between compressed domain and pixel domain to address the decoding bottleneck, supporting both temporal and spatial queries. CoVA cascades analysis into three major stages where the first two stages are performed in compressed domain while the last one in pixel domain. First, CoVA detects occurrences of moving objects (called blobs) over a set of compressed frames (called tracks). Then, using the track results, CoVA prudently selects a minimal set of frames to obtain the label information and only decode them to compute the full DNNs, alleviating the decoding bottleneck. Lastly, CoVA associates tracks with labels to produce the final analysis results on which users can process both temporal and spatial queries. Our experiments demonstrate that CoVA offers 4.8x throughput improvement over modern cascade systems, while imposing modest accuracy loss.
△ Less
Submitted 2 July, 2022;
originally announced July 2022.
-
Exploiting Transliterated Words for Finding Similarity in Inter-Language News Articles using Machine Learning
Authors:
Sameea Naeem,
Dr. Arif ur Rahman,
Syed Mujtaba Haider,
Abdul Basit Mughal
Abstract:
Finding similarities between two inter-language news articles is a challenging problem of Natural Language Processing (NLP). It is difficult to find similar news articles in a different language other than the native language of user, there is a need for a Machine Learning based automatic system to find the similarity between two inter-language news articles. In this article, we propose a Machine…
▽ More
Finding similarities between two inter-language news articles is a challenging problem of Natural Language Processing (NLP). It is difficult to find similar news articles in a different language other than the native language of user, there is a need for a Machine Learning based automatic system to find the similarity between two inter-language news articles. In this article, we propose a Machine Learning model with the combination of English Urdu word transliteration which will show whether the English news article is similar to the Urdu news article or not. The existing approaches to find similarities has a major drawback when the archives contain articles of low-resourced languages like Urdu along with English news article. The existing approaches to find similarities has drawback when the archives contain low-resourced languages like Urdu along with English news articles. We used lexicon to link Urdu and English news articles. As Urdu language processing applications like machine translation, text to speech, etc are unable to handle English text at the same time so this research proposed technique to find similarities in English and Urdu news articles based on transliteration.
△ Less
Submitted 29 May, 2022;
originally announced June 2022.
-
Feature Re-calibration based Multiple Instance Learning for Whole Slide Image Classification
Authors:
Philip Chikontwe,
Soo Jeong Nam,
Heounjeong Go,
Meejeong Kim,
Hyun Jung Sung,
Sang Hyun Park
Abstract:
Whole slide image (WSI) classification is a fundamental task for the diagnosis and treatment of diseases; but, curation of accurate labels is time-consuming and limits the application of fully-supervised methods. To address this, multiple instance learning (MIL) is a popular method that poses classification as a weakly supervised learning task with slide-level labels only. While current MIL method…
▽ More
Whole slide image (WSI) classification is a fundamental task for the diagnosis and treatment of diseases; but, curation of accurate labels is time-consuming and limits the application of fully-supervised methods. To address this, multiple instance learning (MIL) is a popular method that poses classification as a weakly supervised learning task with slide-level labels only. While current MIL methods apply variants of the attention mechanism to re-weight instance features with stronger models, scant attention is paid to the properties of the data distribution. In this work, we propose to re-calibrate the distribution of a WSI bag (instances) by using the statistics of the max-instance (critical) feature. We assume that in binary MIL, positive bags have larger feature magnitudes than negatives, thus we can enforce the model to maximize the discrepancy between bags with a metric feature loss that models positive bags as out-of-distribution. To achieve this, unlike existing MIL methods that use single-batch training modes, we propose balanced-batch sampling to effectively use the feature loss i.e., (+/-) bags simultaneously. Further, we employ a position encoding module (PEM) to model spatial/morphological information, and perform pooling by multi-head self-attention (PSMA) with a Transformer encoder. Experimental results on existing benchmark datasets show our approach is effective and improves over state-of-the-art MIL methods.
△ Less
Submitted 21 July, 2022; v1 submitted 22 June, 2022;
originally announced June 2022.
-
Learning sRGB-to-Raw-RGB De-rendering with Content-Aware Metadata
Authors:
Seonghyeon Nam,
Abhijith Punnappurath,
Marcus A. Brubaker,
Michael S. Brown
Abstract:
Most camera images are rendered and saved in the standard RGB (sRGB) format by the camera's hardware. Due to the in-camera photo-finishing routines, nonlinear sRGB images are undesirable for computer vision tasks that assume a direct relationship between pixel values and scene radiance. For such applications, linear raw-RGB sensor images are preferred. Saving images in their raw-RGB format is stil…
▽ More
Most camera images are rendered and saved in the standard RGB (sRGB) format by the camera's hardware. Due to the in-camera photo-finishing routines, nonlinear sRGB images are undesirable for computer vision tasks that assume a direct relationship between pixel values and scene radiance. For such applications, linear raw-RGB sensor images are preferred. Saving images in their raw-RGB format is still uncommon due to the large storage requirement and lack of support by many imaging applications. Several "raw reconstruction" methods have been proposed that utilize specialized metadata sampled from the raw-RGB image at capture time and embedded in the sRGB image. This metadata is used to parameterize a map** function to de-render the sRGB image back to its original raw-RGB format when needed. Existing raw reconstruction methods rely on simple sampling strategies and global map** to perform the de-rendering. This paper shows how to improve the de-rendering results by jointly learning sampling and reconstruction. Our experiments show that our learned sampling can adapt to the image content to produce better raw reconstructions than existing methods. We also describe an online fine-tuning strategy for the reconstruction network to improve results further.
△ Less
Submitted 3 June, 2022;
originally announced June 2022.
-
An Attention-Based Model for Predicting Contextual Informativeness and Curriculum Learning Applications
Authors:
Sung** Nam,
David Jurgens,
Gwen Frishkoff,
Kevyn Collins-Thompson
Abstract:
Both humans and machines learn the meaning of unknown words through contextual information in a sentence, but not all contexts are equally helpful for learning. We introduce an effective method for capturing the level of contextual informativeness with respect to a given target word. Our study makes three main contributions. First, we develop models for estimating contextual informativeness, focus…
▽ More
Both humans and machines learn the meaning of unknown words through contextual information in a sentence, but not all contexts are equally helpful for learning. We introduce an effective method for capturing the level of contextual informativeness with respect to a given target word. Our study makes three main contributions. First, we develop models for estimating contextual informativeness, focusing on the instructional aspect of sentences. Our attention-based approach using pre-trained embeddings demonstrates state-of-the-art performance on our single-context dataset and an existing multi-sentence context dataset. Second, we show how our model identifies key contextual elements in a sentence that are likely to contribute most to a reader's understanding of the target word. Third, we examine how our contextual informativeness model, originally developed for vocabulary learning applications for students, can be used for develo** better training curricula for word embedding models in batch learning and few-shot machine learning settings. We believe our results open new possibilities for applications that support language learning for both human and machine learners.
△ Less
Submitted 9 November, 2023; v1 submitted 21 April, 2022;
originally announced April 2022.
-
Demonstration of Superconducting Optoelectronic Single-Photon Synapses
Authors:
Saeed Khan,
Bryce A. Primavera,
Jeff Chiles,
Adam N. McCaughan,
Sonia M. Buckley,
Alexander N. Tait,
Adriana Lita,
John Biesecker,
Anna Fox,
David Olaya,
Richard P. Mirin,
Sae Woo Nam,
Jeffrey M. Shainline
Abstract:
Superconducting optoelectronic hardware is being explored as a path towards artificial spiking neural networks with unprecedented scales of complexity and computational ability. Such hardware combines integrated-photonic components for few-photon, light-speed communication with superconducting circuits for fast, energy-efficient computation. Monolithic integration of superconducting and photonic d…
▽ More
Superconducting optoelectronic hardware is being explored as a path towards artificial spiking neural networks with unprecedented scales of complexity and computational ability. Such hardware combines integrated-photonic components for few-photon, light-speed communication with superconducting circuits for fast, energy-efficient computation. Monolithic integration of superconducting and photonic devices is necessary for the scaling of this technology. In the present work, superconducting-nanowire single-photon detectors are monolithically integrated with Josephson junctions for the first time, enabling the realization of superconducting optoelectronic synapses. We present circuits that perform analog weighting and temporal leaky integration of single-photon presynaptic signals. Synaptic weighting is implemented in the electronic domain so that binary, single-photon communication can be maintained. Records of recent synaptic activity are locally stored as current in superconducting loops. Dendritic and neuronal nonlinearities are implemented with a second stage of Josephson circuitry. The hardware presents great design flexibility, with demonstrated synaptic time constants spanning four orders of magnitude (hundreds of nanoseconds to milliseconds). The synapses are responsive to presynaptic spike rates exceeding 10 MHz and consume approximately 33 aJ of dynamic power per synapse event before accounting for cooling. In addition to neuromorphic hardware, these circuits introduce new avenues towards realizing large-scale single-photon-detector arrays for diverse imaging, sensing, and quantum communication applications.
△ Less
Submitted 20 April, 2022;
originally announced April 2022.
-
STRATA: Word Boundaries & Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, & Data Augmentation
Authors:
Saad Naeem,
Omer Beg
Abstract:
Phoneme recognition is a largely unsolved problem in NLP, especially for low-resource languages like Urdu. The systems that try to extract the phonemes from audio speech require hand-labeled phonetic transcriptions. This requires expert linguists to annotate speech data with its relevant phonetic representation which is both an expensive and a tedious task. In this paper, we propose STRATA, a fram…
▽ More
Phoneme recognition is a largely unsolved problem in NLP, especially for low-resource languages like Urdu. The systems that try to extract the phonemes from audio speech require hand-labeled phonetic transcriptions. This requires expert linguists to annotate speech data with its relevant phonetic representation which is both an expensive and a tedious task. In this paper, we propose STRATA, a framework for supervised phoneme recognition that overcomes the data scarcity issue for low resource languages using a seq2seq neural architecture integrated with transfer learning, attention mechanism, and data augmentation. STRATA employs transfer learning to reduce the network loss in half. It uses attention mechanism for word boundaries and frame alignment detection which further reduces the network loss by 4% and is able to identify the word boundaries with 92.2% accuracy. STRATA uses various data augmentation techniques to further reduce the loss by 1.5% and is more robust towards new signals both in terms of generalization and accuracy. STRATA is able to achieve a Phoneme Error Rate of 16.5% and improves upon the state of the art by 1.1% for TIMIT dataset (English) and 11.5% for CSaLT dataset (Urdu).
△ Less
Submitted 16 April, 2022;
originally announced April 2022.
-
Learning JPEG Compression Artifacts for Image Manipulation Detection and Localization
Authors:
Myung-Joon Kwon,
Seung-Hun Nam,
In-Jae Yu,
Heung-Kyu Lee,
Changick Kim
Abstract:
Detecting and localizing image manipulation are necessary to counter malicious use of image editing techniques. Accordingly, it is essential to distinguish between authentic and tampered regions by analyzing intrinsic statistics in an image. We focus on JPEG compression artifacts left during image acquisition and editing. We propose a convolutional neural network (CNN) that uses discrete cosine tr…
▽ More
Detecting and localizing image manipulation are necessary to counter malicious use of image editing techniques. Accordingly, it is essential to distinguish between authentic and tampered regions by analyzing intrinsic statistics in an image. We focus on JPEG compression artifacts left during image acquisition and editing. We propose a convolutional neural network (CNN) that uses discrete cosine transform (DCT) coefficients, where compression artifacts remain, to localize image manipulation. Standard CNNs cannot learn the distribution of DCT coefficients because the convolution throws away the spatial coordinates, which are essential for DCT coefficients. We illustrate how to design and train a neural network that can learn the distribution of DCT coefficients. Furthermore, we introduce Compression Artifact Tracing Network (CAT-Net) that jointly uses image acquisition artifacts and compression artifacts. It significantly outperforms traditional and deep neural network-based methods in detecting and localizing tampered regions.
△ Less
Submitted 25 May, 2022; v1 submitted 29 August, 2021;
originally announced August 2021.
-
Neural Image Representations for Multi-Image Fusion and Layer Separation
Authors:
Seonghyeon Nam,
Marcus A. Brubaker,
Michael S. Brown
Abstract:
We propose a framework for aligning and fusing multiple images into a single view using neural image representations (NIRs), also known as implicit or coordinate-based neural representations. Our framework targets burst images that exhibit camera ego motion and potential changes in the scene. We describe different strategies for alignment depending on the nature of the scene motion -- namely, pers…
▽ More
We propose a framework for aligning and fusing multiple images into a single view using neural image representations (NIRs), also known as implicit or coordinate-based neural representations. Our framework targets burst images that exhibit camera ego motion and potential changes in the scene. We describe different strategies for alignment depending on the nature of the scene motion -- namely, perspective planar (i.e., homography), optical flow with minimal scene change, and optical flow with notable occlusion and disocclusion. With the neural image representation, our framework effectively combines multiple inputs into a single canonical view without the need for selecting one of the images as a reference frame. We demonstrate how to use this multi-frame fusion framework for various layer separation tasks. The code and results are available at https://shnnam.github.io/research/nir.
△ Less
Submitted 21 July, 2022; v1 submitted 2 August, 2021;
originally announced August 2021.
-
DHNet: Double MPEG-4 Compression Detection via Multiple DCT Histograms
Authors:
Seung-Hun Nam,
Wonhyuk Ahn,
Myung-Joon Kwon,
Jihyeon Kang,
In-Jae Yu
Abstract:
In this article, we aim to detect the double compression of MPEG-4, a universal video codec that is built into surveillance systems and shooting devices. Double compression is accompanied by various types of video manipulation, and its traces can be exploited to determine whether a video is a forgery. To this end, we present a neural network-based approach with discriminant features for capturing…
▽ More
In this article, we aim to detect the double compression of MPEG-4, a universal video codec that is built into surveillance systems and shooting devices. Double compression is accompanied by various types of video manipulation, and its traces can be exploited to determine whether a video is a forgery. To this end, we present a neural network-based approach with discriminant features for capturing peculiar artifacts in the discrete cosine transform (DCT) domain caused by double MPEG-4 compression. By analyzing the intra-coding process of MPEG-4, which performs block-DCT-based quantization, we exploit multiple DCT histograms as features to focus on the statistical properties of DCT coefficients on multiresolution blocks. Furthermore, we improve detection performance using a vectorized feature of the quantization table on dense layers as auxiliary information. Compared with neural network-based approaches suitable for exploring subtle manipulations, the experimental results reveal that this work achieves high performance.
△ Less
Submitted 15 April, 2022; v1 submitted 19 July, 2021;
originally announced July 2021.
-
Topology and Routing Problems: The Circular Frame
Authors:
Rak-Kyeong Seong,
Chanho Min,
Sang-Hoon Han,
Jaeho Yang,
Seungwoo Nam,
Kyusam Oh
Abstract:
In this work, we solve the problem of finding non-intersecting paths between points on a plane with a new approach by borrowing ideas from geometric topology, in particular, from the study of polygonal schema in mathematics. We use a topological transformation on the 2-dimensional planar routing environment that simplifies the routing problem into a problem of connecting points on a circle with st…
▽ More
In this work, we solve the problem of finding non-intersecting paths between points on a plane with a new approach by borrowing ideas from geometric topology, in particular, from the study of polygonal schema in mathematics. We use a topological transformation on the 2-dimensional planar routing environment that simplifies the routing problem into a problem of connecting points on a circle with straight line segments that do not intersect in the interior of the circle. These points are either the points that need to be connected by non-intersecting paths or special `reference' points that parametrize the topology of the original environment prior to the transformation. When all the necessary points on the circle are fully connected, the transformation is reversed such that the line segments combine to become the non-intersecting paths that connect the start and end points in the original environment. We interpret the transformed environment in which the routing problem is solved as a new data structure where any routing problem can be solved efficiently. We perform experiments and show that the routing time and success rate of the new routing algorithm outperforms the ones for the A*-algorithm.
△ Less
Submitted 7 May, 2021;
originally announced May 2021.
-
Temporally smooth online action detection using cycle-consistent future anticipation
Authors:
Young Hwi Kim,
Seonghyeon Nam,
Seon Joo Kim
Abstract:
Many video understanding tasks work in the offline setting by assuming that the input video is given from the start to the end. However, many real-world problems require the online setting, making a decision immediately using only the current and the past frames of videos such as in autonomous driving and surveillance systems. In this paper, we present a novel solution for online action detection…
▽ More
Many video understanding tasks work in the offline setting by assuming that the input video is given from the start to the end. However, many real-world problems require the online setting, making a decision immediately using only the current and the past frames of videos such as in autonomous driving and surveillance systems. In this paper, we present a novel solution for online action detection by using a simple yet effective RNN-based networks called the Future Anticipation and Temporally Smoothing network (FATSnet). The proposed network consists of a module for anticipating the future that can be trained in an unsupervised manner with the cycle-consistency loss, and another component for aggregating the past and the future for temporally smooth frame-by-frame predictions. We also propose a solution to relieve the performance loss when running RNN-based models on very long sequences. Evaluations on TVSeries, THUMOS14, and BBDB show that our method achieve the state-of-the-art performances compared to the previous works on online action detection.
△ Less
Submitted 16 April, 2021;
originally announced April 2021.
-
Frame-rate Up-conversion Detection Based on Convolutional Neural Network for Learning Spatiotemporal Features
Authors:
Minseok Yoon,
Seung-Hun Nam,
In-Jae Yu,
Wonhyuk Ahn,
Myung-Joon Kwon,
Heung-Kyu Lee
Abstract:
With the advance in user-friendly and powerful video editing tools, anyone can easily manipulate videos without leaving prominent visual traces. Frame-rate up-conversion (FRUC), a representative temporal-domain operation, increases the motion continuity of videos with a lower frame-rate and is used by malicious counterfeiters in video tampering such as generating fake frame-rate video without impr…
▽ More
With the advance in user-friendly and powerful video editing tools, anyone can easily manipulate videos without leaving prominent visual traces. Frame-rate up-conversion (FRUC), a representative temporal-domain operation, increases the motion continuity of videos with a lower frame-rate and is used by malicious counterfeiters in video tampering such as generating fake frame-rate video without improving the quality or mixing temporally spliced videos. FRUC is based on frame interpolation schemes and subtle artifacts that remain in interpolated frames are often difficult to distinguish. Hence, detecting such forgery traces is a critical issue in video forensics. This paper proposes a frame-rate conversion detection network (FCDNet) that learns forensic features caused by FRUC in an end-to-end fashion. The proposed network uses a stack of consecutive frames as the input and effectively learns interpolation artifacts using network blocks to learn spatiotemporal features. This study is the first attempt to apply a neural network to the detection of FRUC. Moreover, it can cover the following three types of frame interpolation schemes: nearest neighbor interpolation, bilinear interpolation, and motion-compensated interpolation. In contrast to existing methods that exploit all frames to verify integrity, the proposed approach achieves a high detection speed because it observes only six frames to test its authenticity. Extensive experiments were conducted with conventional forensic methods and neural networks for video forensic tasks to validate our research. The proposed network achieved state-of-the-art performance in terms of detecting the interpolated artifacts of FRUC. The experimental results also demonstrate that our trained model is robust for an unseen dataset, unlearned frame-rate, and unlearned quality factor.
△ Less
Submitted 25 March, 2021;
originally announced March 2021.
-
PHIDL: Python CAD layout and geometry creation for nanolithography
Authors:
A. N. McCaughan,
A. M. Tait,
S. M. Buckley,
D. M. Oh,
J. T. Chiles,
J. M. Shainline,
S. W. Nam
Abstract:
Computer-aided design (CAD) has become a critical element in the creation of nanopatterned structures and devices. In particular, with the increased adoption of easy-to-learn programming languages like Python there has been a significant rise in the amount of lithographic geometries generated through scripting and programming. However, there are currently unaddressed gaps in usability for open-sou…
▽ More
Computer-aided design (CAD) has become a critical element in the creation of nanopatterned structures and devices. In particular, with the increased adoption of easy-to-learn programming languages like Python there has been a significant rise in the amount of lithographic geometries generated through scripting and programming. However, there are currently unaddressed gaps in usability for open-source CAD tools -- especially those in the GDSII design space -- that prevent wider adoption by scientists and students who might otherwise benefit from scripted design. For example, constructing relations between adjacent geometries is often much more difficult than necessary -- spacing a resonator structure a few micrometers from a readout structure often requires manually-coding the placement arithmetic. While inconveniences like this can be overcome by writing custom functions, they are often significant barriers to entry for new users or those less familiar with programming. To help streamline the design process and reduce barrier to entry for scripting designs, we have developed PHIDL, an open-source GDSII-based CAD tool for Python 2 and 3.
△ Less
Submitted 1 March, 2021;
originally announced March 2021.
-
WAN: Watermarking Attack Network
Authors:
Seung-Hun Nam,
In-Jae Yu,
Seung-Min Mun,
Daesik Kim,
Wonhyuk Ahn
Abstract:
Multi-bit watermarking (MW) has been developed to improve robustness against signal processing operations and geometric distortions. To this end, benchmark tools that test robustness by applying simulated attacks on watermarked images are available. However, limitations in these general attacks exist since they cannot exploit specific characteristics of the targeted MW. In addition, these attacks…
▽ More
Multi-bit watermarking (MW) has been developed to improve robustness against signal processing operations and geometric distortions. To this end, benchmark tools that test robustness by applying simulated attacks on watermarked images are available. However, limitations in these general attacks exist since they cannot exploit specific characteristics of the targeted MW. In addition, these attacks are usually devised without consideration of visual quality, which rarely occurs in the real world. To address these limitations, we propose a watermarking attack network (WAN), a fully trainable watermarking benchmark tool that utilizes the weak points of the target MW and induces an inversion of the watermark bit, thereby considerably reducing the watermark extractability. To hinder the extraction of hidden information while ensuring high visual quality, we utilize a residual dense blocks-based architecture specialized in local and global feature learning. A novel watermarking attack loss is introduced to break the MW systems. We empirically demonstrate that the WAN can successfully fool various block-based MW systems. Moreover, we show that existing MW methods can be improved with the help of the WAN as an add-on module.
△ Less
Submitted 20 October, 2021; v1 submitted 14 August, 2020;
originally announced August 2020.
-
Cross-Identity Motion Transfer for Arbitrary Objects through Pose-Attentive Video Reassembling
Authors:
Subin Jeon,
Seonghyeon Nam,
Seoung Wug Oh,
Seon Joo Kim
Abstract:
We propose an attention-based networks for transferring motions between arbitrary objects. Given a source image(s) and a driving video, our networks animate the subject in the source images according to the motion in the driving video. In our attention mechanism, dense similarities between the learned keypoints in the source and the driving images are computed in order to retrieve the appearance i…
▽ More
We propose an attention-based networks for transferring motions between arbitrary objects. Given a source image(s) and a driving video, our networks animate the subject in the source images according to the motion in the driving video. In our attention mechanism, dense similarities between the learned keypoints in the source and the driving images are computed in order to retrieve the appearance information from the source images. Taking a different approach from the well-studied war** based models, our attention-based model has several advantages. By reassembling non-locally searched pieces from the source contents, our approach can produce more realistic outputs. Furthermore, our system can make use of multiple observations of the source appearance (e.g. front and sides of faces) to make the results more accurate. To reduce the training-testing discrepancy of the self-supervised learning, a novel cross-identity training scheme is additionally introduced. With the training scheme, our networks is trained to transfer motions between different subjects, as in the real testing scenario. Experimental results validate that our method produces visually pleasing results in various object domains, showing better performances compared to previous works.
△ Less
Submitted 17 July, 2020;
originally announced July 2020.
-
Deep Convolutional Neural Network for Identifying Seam-Carving Forgery
Authors:
Seung-Hun Nam,
Wonhyuk Ahn,
In-Jae Yu,
Myung-Joon Kwon,
Minseok Son,
Heung-Kyu Lee
Abstract:
Seam carving is a representative content-aware image retargeting approach to adjust the size of an image while preserving its visually prominent content. To maintain visually important content, seam-carving algorithms first calculate the connected path of pixels, referred to as the seam, according to a defined cost function and then adjust the size of an image by removing and duplicating repeatedl…
▽ More
Seam carving is a representative content-aware image retargeting approach to adjust the size of an image while preserving its visually prominent content. To maintain visually important content, seam-carving algorithms first calculate the connected path of pixels, referred to as the seam, according to a defined cost function and then adjust the size of an image by removing and duplicating repeatedly calculated seams. Seam carving is actively exploited to overcome diversity in the resolution of images between applications and devices; hence, detecting the distortion caused by seam carving has become important in image forensics. In this paper, we propose a convolutional neural network (CNN)-based approach to classifying seam-carving-based image retargeting for reduction and expansion. To attain the ability to learn low-level features, we designed a CNN architecture comprising five types of network blocks specialized for capturing subtle signals. An ensemble module is further adopted to both enhance performance and comprehensively analyze the features in the local areas of the given image. To validate the effectiveness of our work, extensive experiments based on various CNN-based baselines were conducted. Compared to the baselines, our work exhibits state-of-the-art performance in terms of three-class classification (original, seam inserted, and seam removed). In addition, our model with the ensemble module is robust for various unseen cases. The experimental results also demonstrate that our method can be applied to localize both seam-removed and seam-inserted areas.
△ Less
Submitted 7 July, 2020; v1 submitted 5 July, 2020;
originally announced July 2020.