Search | arXiv e-print repository

Prospective Role of Foundation Models in Advancing Autonomous Vehicles

Authors: Jianhua Wu, Bingzhao Gao, **cheng Gao, Jianhao Yu, Hongqing Chu, Qiankun Yu, Xun Gong, Yi Chang, H. Eric Tseng, Hong Chen, Jie Chen

Abstract: With the development of artificial intelligence and breakthroughs in deep learning, large-scale Foundation Models (FMs), such as GPT, Sora, etc., have achieved remarkable results in many fields including natural language processing and computer vision. The application of FMs in autonomous driving holds considerable promise. For example, they can contribute to enhancing scene understanding and reas… ▽ More With the development of artificial intelligence and breakthroughs in deep learning, large-scale Foundation Models (FMs), such as GPT, Sora, etc., have achieved remarkable results in many fields including natural language processing and computer vision. The application of FMs in autonomous driving holds considerable promise. For example, they can contribute to enhancing scene understanding and reasoning. By pre-training on rich linguistic and visual data, FMs can understand and interpret various elements in a driving scene, and provide cognitive reasoning to give linguistic and action instructions for driving decisions and planning. Furthermore, FMs can augment data based on the understanding of driving scenarios to provide feasible scenes of those rare occurrences in the long tail distribution that are unlikely to be encountered during routine driving and data collection. The enhancement can subsequently lead to improvement in the accuracy and reliability of autonomous driving systems. Another testament to the potential of FMs' applications lies in World Models, exemplified by the DREAMER series, which showcases the ability to comprehend physical laws and dynamics. Learning from massive data under the paradigm of self-supervised learning, World Model can generate unseen yet plausible driving environments, facilitating the enhancement in the prediction of road users' behaviors and the off-line training of driving strategies. In this paper, we synthesize the applications and future trends of FMs in autonomous driving. By utilizing the powerful capabilities of FMs, we strive to tackle the potential issues stemming from the long-tail distribution in autonomous driving, consequently advancing overall safety in this domain. △ Less

Submitted 17 May, 2024; v1 submitted 8 December, 2023; originally announced May 2024.

Comments: 45 pages,8 figures

arXiv:2404.09995 [pdf, other]

Taming Latent Diffusion Model for Neural Radiance Field Inpainting

Authors: Chieh Hubert Lin, Changil Kim, Jia-Bin Huang, Qinbo Li, Chih-Yao Ma, Johannes Kopf, Ming-Hsuan Yang, Hung-Yu Tseng

Abstract: Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Despite some recent work showing preliminary success in editing a reconstructed NeRF with diffusion prior, they remain struggling to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of synthetic contents from the diffusion model, which hinders the rad… ▽ More Neural Radiance Field (NeRF) is a representation for 3D reconstruction from multi-view images. Despite some recent work showing preliminary success in editing a reconstructed NeRF with diffusion prior, they remain struggling to synthesize reasonable geometry in completely uncovered regions. One major reason is the high diversity of synthetic contents from the diffusion model, which hinders the radiance field from converging to a crisp and deterministic geometry. Moreover, applying latent diffusion models on real data often yields a textural shift incoherent to the image condition due to auto-encoding errors. These two problems are further reinforced with the use of pixel-distance losses. To address these issues, we propose tempering the diffusion model's stochasticity with per-scene customization and mitigating the textural shift with masked adversarial training. During the analyses, we also found the commonly used pixel and perceptual losses are harmful in the NeRF inpainting task. Through rigorous experiments, our framework yields state-of-the-art NeRF inpainting results on various real-world scenes. Project page: https://hubert0527.github.io/MALD-NeRF △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: Project page: https://hubert0527.github.io/MALD-NeRF

arXiv:2403.16438 [pdf, other]

doi 10.1109/BIBM58861.2023.10385929

Real-time Neuron Segmentation for Voltage Imaging

Authors: Yosuke Bando, Ramdas Pillai, Atsushi Kajita, Farhan Abdul Hakeem, Yves Quemener, Hua-an Tseng, Kiryl D. Piatkevich, Changyang Linghu, Xue Han, Edward S. Boyden

Abstract: In voltage imaging, where the membrane potentials of individual neurons are recorded at from hundreds to thousand frames per second using fluorescence microscopy, data processing presents a challenge. Even a fraction of a minute of recording with a limited image size yields gigabytes of video data consisting of tens of thousands of frames, which can be time-consuming to process. Moreover, millisec… ▽ More In voltage imaging, where the membrane potentials of individual neurons are recorded at from hundreds to thousand frames per second using fluorescence microscopy, data processing presents a challenge. Even a fraction of a minute of recording with a limited image size yields gigabytes of video data consisting of tens of thousands of frames, which can be time-consuming to process. Moreover, millisecond-level short exposures lead to noisy video frames, obscuring neuron footprints especially in deep-brain samples where noisy signals are buried in background fluorescence. To address this challenge, we propose a fast neuron segmentation method able to detect multiple, potentially overlap**, spiking neurons from noisy video frames, and implement a data processing pipeline incorporating the proposed segmentation method along with GPU-accelerated motion correction. By testing on existing datasets as well as on new datasets we introduce, we show that our pipeline extracts neuron footprints that agree well with human annotation even from cluttered datasets, and demonstrate real-time processing of voltage imaging data on a single desktop computer for the first time. △ Less

Submitted 25 March, 2024; originally announced March 2024.

Journal ref: IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 813-818, 2023

arXiv:2403.15577 [pdf, other]

Autonomous Driving With Perception Uncertainties: Deep-Ensemble Based Adaptive Cruise Control

Authors: Xiao Li, H. Eric Tseng, Anouck Girard, Ilya Kolmanovsky

Abstract: Autonomous driving depends on perception systems to understand the environment and to inform downstream decision-making. While advanced perception systems utilizing black-box Deep Neural Networks (DNNs) demonstrate human-like comprehension, their unpredictable behavior and lack of interpretability may hinder their deployment in safety critical scenarios. In this paper, we develop an Ensemble of DN… ▽ More Autonomous driving depends on perception systems to understand the environment and to inform downstream decision-making. While advanced perception systems utilizing black-box Deep Neural Networks (DNNs) demonstrate human-like comprehension, their unpredictable behavior and lack of interpretability may hinder their deployment in safety critical scenarios. In this paper, we develop an Ensemble of DNN regressors (Deep Ensemble) that generates predictions with quantification of prediction uncertainties. In the scenario of Adaptive Cruise Control (ACC), we employ the Deep Ensemble to estimate distance headway to the lead vehicle from RGB images and enable the downstream controller to account for the estimation uncertainty. We develop an adaptive cruise controller that utilizes Stochastic Model Predictive Control (MPC) with chance constraints to provide a probabilistic safety guarantee. We evaluate our ACC algorithm using a high-fidelity traffic simulator and a real-world traffic dataset and demonstrate the ability of the proposed approach to effect speed tracking and car following while maintaining a safe distance headway. The out-of-distribution scenarios are also examined. △ Less

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.09216 [pdf]

Unlocking the Potential of Open Government Data: Exploring the Strategic, Technical, and Application Perspectives of High-Value Datasets Opening in Taiwan

Authors: Hsien-Lee Tseng, Anastasija Nikiforova

Abstract: Today, data has an unprecedented value as it forms the basis for data-driven decision-making, including serving as an input for AI models, where the latter is highly dependent on the availability of the data. However, availability of data in an open data format creates a little added value, where the value of these data, i.e., their relevance to the real needs of the end user, is key. This is wher… ▽ More Today, data has an unprecedented value as it forms the basis for data-driven decision-making, including serving as an input for AI models, where the latter is highly dependent on the availability of the data. However, availability of data in an open data format creates a little added value, where the value of these data, i.e., their relevance to the real needs of the end user, is key. This is where the concept of high-value dataset (HVD) comes into play, which has become popular in recent years. Defining and opening HVD is an ongoing process consisting of a set of interrelated steps, the implementation of which may vary from one country or region to another. Therefore, there has recently been a call to conduct research in a country or region setting considered to be of greatest national value. So far, only a few studies have been conducted at the regional or national level, most of which consider only one step of the process, such as identifying HVD or measuring their impact. With this study, we answer this call and examine the national case of Taiwan by exploring the entire lifecycle of HVD opening. The aim of the paper is to understand and evaluate the lifecycle of high-value dataset publishing in one of the world's leading producers of information and communication technology (ICT) products - Taiwan. To do this, we conduct a qualitative study with exploratory interviews with representatives from government agencies in Taiwan responsible for HVD opening, exploring HVD opening lifecycle. As such, we examine (1) strategic aspects related to the HVD determination process, (2) technical aspects, and (3) application aspects. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: This paper has been accepted for publication in Proceedings of the 25th Annual International Conference on Digital Government Research and this is a pre-print version of the manuscript. It is posted here for your personal use. Not for redistribution

arXiv:2403.01807 [pdf, other]

ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models

Authors: Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner

Abstract: 3D asset generation is getting massive amounts of attention, inspired by the recent success of text-guided 2D content creation. Existing text-to-3D methods use pretrained text-to-image diffusion models in an optimization problem or fine-tune them on synthetic data, which often results in non-photorealistic 3D objects without backgrounds. In this paper, we present a method that leverages pretrained… ▽ More 3D asset generation is getting massive amounts of attention, inspired by the recent success of text-guided 2D content creation. Existing text-to-3D methods use pretrained text-to-image diffusion models in an optimization problem or fine-tune them on synthetic data, which often results in non-photorealistic 3D objects without backgrounds. In this paper, we present a method that leverages pretrained text-to-image models as a prior, and learn to generate multi-view images in a single denoising process from real-world data. Concretely, we propose to integrate 3D volume-rendering and cross-frame-attention layers into each block of the existing U-Net network of the text-to-image model. Moreover, we design an autoregressive generation that renders more 3D-consistent images at any viewpoint. We train our model on real-world datasets of objects and showcase its capabilities to generate instances with a variety of high-quality shapes and textures in authentic surroundings. Compared to the existing methods, the results generated by our method are consistent, and have favorable visual quality (-30% FID, -37% KID). △ Less

Submitted 4 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024, project page: https://lukashoel.github.io/ViewDiff/, video: https://www.youtube.com/watch?v=SdjoCqHzMMk, code: https://github.com/facebookresearch/ViewDiff

arXiv:2401.07464 [pdf, other]

Quantum Privacy Aggregation of Teacher Ensembles (QPATE) for Privacy-preserving Quantum Machine Learning

Authors: William Watkins, Heehwan Wang, Sangyoon Bae, Huan-Hsin Tseng, Jiook Cha, Samuel Yen-Chi Chen, Shinjae Yoo

Abstract: The utility of machine learning has rapidly expanded in the last two decades and presents an ethical challenge. Papernot et. al. developed a technique, known as Private Aggregation of Teacher Ensembles (PATE) to enable federated learning in which multiple teacher models are trained on disjoint datasets. This study is the first to apply PATE to an ensemble of quantum neural networks (QNN) to pave a… ▽ More The utility of machine learning has rapidly expanded in the last two decades and presents an ethical challenge. Papernot et. al. developed a technique, known as Private Aggregation of Teacher Ensembles (PATE) to enable federated learning in which multiple teacher models are trained on disjoint datasets. This study is the first to apply PATE to an ensemble of quantum neural networks (QNN) to pave a new way of ensuring privacy in quantum machine learning (QML) models. △ Less

Submitted 14 January, 2024; originally announced January 2024.

arXiv:2312.10880 [pdf, other]

Sharable Clothoid-based Continuous Motion Planning for Connected Automated Vehicles

Authors: Sanghoon Oh, Qi Chen, H. Eric Tseng, Gaurav Pandey, Gabor Orosz

Abstract: A continuous motion planning method for connected automated vehicles is considered for generating feasible trajectories in real-time using three consecutive clothoids. The proposed method reduces path planning to a small set of nonlinear algebraic equations such that the generated path can be efficiently checked for feasibility and collision. After path planning, velocity planning is executed whil… ▽ More A continuous motion planning method for connected automated vehicles is considered for generating feasible trajectories in real-time using three consecutive clothoids. The proposed method reduces path planning to a small set of nonlinear algebraic equations such that the generated path can be efficiently checked for feasibility and collision. After path planning, velocity planning is executed while maintaining a parallel simple structure. Key strengths of this framework include its interpretability, shareability, and ability to specify boundary conditions. Its interpretability and shareability stem from the succinct representation of the resulting local motion plan using a handful of physically meaningful parameters. Vehicles may share these parameters via V2X communication so that the recipients can precisely reconstruct the planned trajectory of the senders and respond accordingly. The proposed local planner guarantees the satisfaction of boundary conditions, thus ensuring seamless integration with a wide array of higher-level global motion planners. The tunable nature of the method enables tailoring the local plans to specific maneuvers like turns at intersections, lane changes, and U-turns. △ Less

Submitted 17 December, 2023; originally announced December 2023.

Comments: 14 pages, 14 figures

arXiv:2312.09429 [pdf]

Deep Learning-Enabled Swallowing Monitoring and Postoperative Recovery Biosensing System

Authors: Chih-Ning Tsai, Pei-Wen Yang, Tzu-Yen Huang, Jung-Chih Chen, Hsin-Yi Tseng, Che-Wei Wu, Amrit Sarmah, Tzu-En Lin

Abstract: This study introduces an innovative 3D printed dry electrode tailored for biosensing in postoperative recovery scenarios. Fabricated through a drop coating process, the electrode incorporates a novel 2D material. This study introduces an innovative 3D printed dry electrode tailored for biosensing in postoperative recovery scenarios. Fabricated through a drop coating process, the electrode incorporates a novel 2D material. △ Less

Submitted 24 November, 2023; originally announced December 2023.

Comments: the abstract can't uploaded fully

MSC Class: NA ACM Class: A.0

arXiv:2311.18832 [pdf, other]

Exploiting Diffusion Prior for Generalizable Dense Prediction

Authors: Hsin-Ying Lee, Hung-Yu Tseng, Hsin-Ying Lee, Ming-Hsuan Yang

Abstract: Contents generated by recent advanced Text-to-Image (T2I) diffusion models are sometimes too imaginative for existing off-the-shelf dense predictors to estimate due to the immitigable domain gap. We introduce DMP, a pipeline utilizing pre-trained T2I models as a prior for dense prediction tasks. To address the misalignment between deterministic prediction tasks and stochastic T2I models, we reform… ▽ More Contents generated by recent advanced Text-to-Image (T2I) diffusion models are sometimes too imaginative for existing off-the-shelf dense predictors to estimate due to the immitigable domain gap. We introduce DMP, a pipeline utilizing pre-trained T2I models as a prior for dense prediction tasks. To address the misalignment between deterministic prediction tasks and stochastic T2I models, we reformulate the diffusion process through a sequence of interpolations, establishing a deterministic map** between input RGB images and output prediction distributions. To preserve generalizability, we use low-rank adaptation to fine-tune pre-trained models. Extensive experiments across five tasks, including 3D property estimation, semantic segmentation, and intrinsic image decomposition, showcase the efficacy of the proposed method. Despite limited-domain training data, the approach yields faithful estimations for arbitrary images, surpassing existing state-of-the-art algorithms. △ Less

Submitted 2 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: To appear in CVPR 2024. Project page: https://shinying.github.io/dmp

arXiv:2311.10041 [pdf, other]

Interpretable Reinforcement Learning for Robotics and Continuous Control

Authors: Rohan Paleja, Letian Chen, Yaru Niu, Andrew Silva, Zhaoxin Li, Songan Zhang, Chace Ritchie, Sugju Choi, Kimberlee Chestnut Chang, Hongtei Eric Tseng, Yan Wang, Subramanya Nageshrao, Matthew Gombolay

Abstract: Interpretability in machine learning is critical for the safe deployment of learned policies across legally-regulated and safety-critical domains. While gradient-based approaches in reinforcement learning have achieved tremendous success in learning policies for continuous control problems such as robotics and autonomous driving, the lack of interpretability is a fundamental barrier to adoption. W… ▽ More Interpretability in machine learning is critical for the safe deployment of learned policies across legally-regulated and safety-critical domains. While gradient-based approaches in reinforcement learning have achieved tremendous success in learning policies for continuous control problems such as robotics and autonomous driving, the lack of interpretability is a fundamental barrier to adoption. We propose Interpretable Continuous Control Trees (ICCTs), a tree-based model that can be optimized via modern, gradient-based, reinforcement learning approaches to produce high-performing, interpretable policies. The key to our approach is a procedure for allowing direct optimization in a sparse decision-tree-like representation. We validate ICCTs against baselines across six domains, showing that ICCTs are capable of learning policies that parity or outperform baselines by up to 33% in autonomous driving scenarios while achieving a 300x-600x reduction in the number of parameters against deep learning baselines. We prove that ICCTs can serve as universal function approximators and display analytically that ICCTs can be verified in linear time. Furthermore, we deploy ICCTs in two realistic driving domains, based on interstate Highway-94 and 280 in the US. Finally, we verify ICCT's utility with end-users and find that ICCTs are rated easier to simulate, quicker to validate, and more interpretable than neural networks. △ Less

Submitted 16 November, 2023; originally announced November 2023.

Comments: arXiv admin note: text overlap with arXiv:2202.02352

arXiv:2311.09221 [pdf, other]

doi 10.1145/3610548.3618153

Single-Image 3D Human Digitization with Shape-Guided Diffusion

Authors: Badour AlBahar, Shunsuke Saito, Hung-Yu Tseng, Changil Kim, Johannes Kopf, Jia-Bin Huang

Abstract: We present an approach to generate a 360-degree view of a person with a consistent, high-resolution appearance from a single input image. NeRF and its variants typically require videos or images from different viewpoints. Most existing approaches taking monocular input either rely on ground-truth 3D scans for supervision or lack 3D consistency. While recent 3D generative models show promise of 3D… ▽ More We present an approach to generate a 360-degree view of a person with a consistent, high-resolution appearance from a single input image. NeRF and its variants typically require videos or images from different viewpoints. Most existing approaches taking monocular input either rely on ground-truth 3D scans for supervision or lack 3D consistency. While recent 3D generative models show promise of 3D consistent human digitization, these approaches do not generalize well to diverse clothing appearances, and the results lack photorealism. Unlike existing work, we utilize high-capacity 2D diffusion models pretrained for general image synthesis tasks as an appearance prior of clothed humans. To achieve better 3D consistency while retaining the input identity, we progressively synthesize multiple views of the human in the input image by inpainting missing regions with shape-guided diffusion conditioned on silhouette and surface normal. We then fuse these synthesized multi-view images via inverse rendering to obtain a fully textured high-resolution 3D mesh of the given person. Experiments show that our approach outperforms prior methods and achieves photorealistic 360-degree synthesis of a wide range of clothed humans with complex textures from a single image. △ Less

Submitted 15 November, 2023; originally announced November 2023.

Comments: SIGGRAPH Asia 2023. Project website: https://human-sgd.github.io/

arXiv:2311.06673 [pdf, other]

Dream to Adapt: Meta Reinforcement Learning by Latent Context Imagination and MDP Imagination

Authors: Lu Wen, Songan Zhang, H. Eric Tseng, Huei Peng

Abstract: Meta reinforcement learning (Meta RL) has been amply explored to quickly learn an unseen task by transferring previously learned knowledge from similar tasks. However, most state-of-the-art algorithms require the meta-training tasks to have a dense coverage on the task distribution and a great amount of data for each of them. In this paper, we propose MetaDreamer, a context-based Meta RL algorithm… ▽ More Meta reinforcement learning (Meta RL) has been amply explored to quickly learn an unseen task by transferring previously learned knowledge from similar tasks. However, most state-of-the-art algorithms require the meta-training tasks to have a dense coverage on the task distribution and a great amount of data for each of them. In this paper, we propose MetaDreamer, a context-based Meta RL algorithm that requires less real training tasks and data by doing meta-imagination and MDP-imagination. We perform meta-imagination by interpolating on the learned latent context space with disentangled properties, as well as MDP-imagination through the generative world model where physical knowledge is added to plain VAE networks. Our experiments with various benchmarks show that MetaDreamer outperforms existing approaches in data efficiency and interpolated generalization. △ Less

Submitted 11 November, 2023; originally announced November 2023.

arXiv:2310.20561 [pdf, other]

Predictive Control for Autonomous Driving with Uncertain, Multi-modal Predictions

Authors: Siddharth H. Nair, Hotae Lee, Eunhyek Joa, Yan Wang, H. Eric Tseng, Francesco Borrelli

Abstract: We propose a Stochastic MPC (SMPC) formulation for path planning with autonomous vehicles in scenarios involving multiple agents with multi-modal predictions. The multi-modal predictions capture the uncertainty of urban driving in distinct modes/maneuvers (e.g., yield, keep speed) and driving trajectories (e.g., speed, turning radius), which are incorporated for multi-modal collision avoidance cha… ▽ More We propose a Stochastic MPC (SMPC) formulation for path planning with autonomous vehicles in scenarios involving multiple agents with multi-modal predictions. The multi-modal predictions capture the uncertainty of urban driving in distinct modes/maneuvers (e.g., yield, keep speed) and driving trajectories (e.g., speed, turning radius), which are incorporated for multi-modal collision avoidance chance constraints for path planning. In the presence of multi-modal uncertainties, it is challenging to reliably compute feasible path planning solutions at real-time frequencies ($\geq$ 10 Hz). Our main technological contribution is a convex SMPC formulation that simultaneously (1) optimizes over parameterized feedback policies and (2) allocates risk levels for each mode of the prediction. The use of feedback policies and risk allocation enhances the feasibility and performance of the SMPC formulation against multi-modal predictions with large uncertainty. We evaluate our approach via simulations and road experiments with a full-scale vehicle interacting in closed-loop with virtual vehicles. We consider distinct, multi-modal driving scenarios: 1) Negotiating a traffic light and a fast, tailgating agent, 2) Executing an unprotected left turn at a traffic intersection, and 3) Changing lanes in the presence of multiple agents. For all of these scenarios, our approach reliably computes multi-modal solutions to the path-planning problem at real-time frequencies. △ Less

Submitted 31 October, 2023; originally announced October 2023.

Comments: The first three authors contributed equally

arXiv:2310.20148 [pdf, other]

Decision-Making for Autonomous Vehicles with Interaction-Aware Behavioral Prediction and Social-Attention Neural Network

Authors: Xiao Li, Kaiwen Liu, H. Eric Tseng, Anouck Girard, Ilya Kolmanovsky

Abstract: Autonomous vehicles need to accomplish their tasks while interacting with human drivers in traffic. It is thus crucial to equip autonomous vehicles with artificial reasoning to better comprehend the intentions of the surrounding traffic, thereby facilitating the accomplishments of the tasks. In this work, we propose a behavioral model that encodes drivers' interacting intentions into latent social… ▽ More Autonomous vehicles need to accomplish their tasks while interacting with human drivers in traffic. It is thus crucial to equip autonomous vehicles with artificial reasoning to better comprehend the intentions of the surrounding traffic, thereby facilitating the accomplishments of the tasks. In this work, we propose a behavioral model that encodes drivers' interacting intentions into latent social-psychological parameters. Leveraging a Bayesian filter, we develop a receding-horizon optimization-based controller for autonomous vehicle decision-making which accounts for the uncertainties in the interacting drivers' intentions. For online deployment, we design a neural network architecture based on the attention mechanism which imitates the behavioral model with online estimated parameter priors. We also propose a decision tree search algorithm to solve the decision-making problem online. The proposed behavioral model is then evaluated in terms of its capabilities for real-world trajectory prediction. We further conduct extensive evaluations of the proposed decision-making module, in forced highway merging scenarios, using both simulated environments and real-world traffic datasets. The results demonstrate that our algorithms can complete the forced merging tasks in various traffic conditions while ensuring driving safety. △ Less

Submitted 31 October, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

arXiv:2310.20009 [pdf, other]

Nash or Stackelberg? -- A comparative study for game-theoretic AV decision-making

Authors: Brady Bateman, Ming Xin, H. Eric Tseng, Mushuang Liu

Abstract: This paper studies game-theoretic decision-making for autonomous vehicles (AVs). A receding horizon multi-player game is formulated to model the AV decision-making problem. Two classes of games, including Nash game and Stackelber games, are developed respectively. For each of the two games, two solution settings, including pairwise games and multi-player games, are introduced, respectively, to sol… ▽ More This paper studies game-theoretic decision-making for autonomous vehicles (AVs). A receding horizon multi-player game is formulated to model the AV decision-making problem. Two classes of games, including Nash game and Stackelber games, are developed respectively. For each of the two games, two solution settings, including pairwise games and multi-player games, are introduced, respectively, to solve the game in multi-agent scenarios. Comparative studies are conducted via statistical simulations to gain understandings of the performance of the two classes of games and of the two solution settings, respectively. The simulations are conducted in intersection-crossing scenarios, and the game performance is quantified by three metrics: safety, travel efficiency, and computational time. △ Less

Submitted 30 October, 2023; originally announced October 2023.

Comments: 8 pages, submitted to ECC24

arXiv:2310.15084 [pdf, other]

Quantum Federated Learning With Quantum Networks

Authors: Tyler Wang, Huan-Hsin Tseng, Shinjae Yoo

Abstract: A major concern of deep learning models is the large amount of data that is required to build and train them, much of which is reliant on sensitive and personally identifiable information that is vulnerable to access by third parties. Ideas of using the quantum internet to address this issue have been previously proposed, which would enable fast and completely secure online communications. Previou… ▽ More A major concern of deep learning models is the large amount of data that is required to build and train them, much of which is reliant on sensitive and personally identifiable information that is vulnerable to access by third parties. Ideas of using the quantum internet to address this issue have been previously proposed, which would enable fast and completely secure online communications. Previous work has yielded a hybrid quantum-classical transfer learning scheme for classical data and communication with a hub-spoke topology. While quantum communication is secure from eavesdrop attacks and no measurements from quantum to classical translation, due to no cloning theorem, hub-spoke topology is not ideal for quantum communication without quantum memory. Here we seek to improve this model by implementing a decentralized ring topology for the federated learning scheme, where each client is given a portion of the entire dataset and only performs training on that set. We also demonstrate the first successful use of quantum weights for quantum federated learning, which allows us to perform our training entirely in quantum. △ Less

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.06973 [pdf, other]

Federated Quantum Machine Learning with Differential Privacy

Authors: Rod Rofougaran, Shinjae Yoo, Huan-Hsin Tseng, Samuel Yen-Chi Chen

Abstract: The preservation of privacy is a critical concern in the implementation of artificial intelligence on sensitive training data. There are several techniques to preserve data privacy but quantum computations are inherently more secure due to the no-cloning theorem, resulting in a most desirable computational platform on top of the potential quantum advantages. There have been prior works in protecti… ▽ More The preservation of privacy is a critical concern in the implementation of artificial intelligence on sensitive training data. There are several techniques to preserve data privacy but quantum computations are inherently more secure due to the no-cloning theorem, resulting in a most desirable computational platform on top of the potential quantum advantages. There have been prior works in protecting data privacy by Quantum Federated Learning (QFL) and Quantum Differential Privacy (QDP) studied independently. However, to the best of our knowledge, no prior work has addressed both QFL and QDP together yet. Here, we propose to combine these privacy-preserving methods and implement them on the quantum platform, so that we can achieve comprehensive protection against data leakage (QFL) and model inversion attacks (QDP). This implementation promises more efficient and secure artificial intelligence. In this paper, we present a successful implementation of these privacy-preservation methods by performing the binary classification of the Cats vs Dogs dataset. Using our quantum-classical machine learning model, we obtained a test accuracy of over 0.98, while maintaining epsilon values less than 1.3. We show that federated differentially private training is a viable privacy preservation method for quantum machine learning on Noisy Intermediate-Scale Quantum (NISQ) devices. △ Less

Submitted 10 October, 2023; originally announced October 2023.

Comments: 5 pages, 7 figures

arXiv:2309.14497 [pdf, other]

Interaction-Aware Decision-Making for Autonomous Vehicles in Forced Merging Scenario Leveraging Social Psychology Factors

Authors: Xiao Li, Kaiwen Liu, H. Eric Tseng, Anouck Girard, Ilya Kolmanovsky

Abstract: Understanding the intention of vehicles in the surrounding traffic is crucial for an autonomous vehicle to successfully accomplish its driving tasks in complex traffic scenarios such as highway forced merging. In this paper, we consider a behavioral model that incorporates both social behaviors and personal objectives of the interacting drivers. Leveraging this model, we develop a receding-horizon… ▽ More Understanding the intention of vehicles in the surrounding traffic is crucial for an autonomous vehicle to successfully accomplish its driving tasks in complex traffic scenarios such as highway forced merging. In this paper, we consider a behavioral model that incorporates both social behaviors and personal objectives of the interacting drivers. Leveraging this model, we develop a receding-horizon control-based decision-making strategy, that estimates online the other drivers' intentions using Bayesian filtering and incorporates predictions of nearby vehicles' behaviors under uncertain intentions. The effectiveness of the proposed decision-making strategy is demonstrated and evaluated based on simulation studies in comparison with a game theoretic controller and a real-world traffic dataset. △ Less

Submitted 25 September, 2023; originally announced September 2023.

arXiv:2309.04063 [pdf, other]

INSURE: An Information Theory Inspired Disentanglement and Purification Model for Domain Generalization

Authors: Xi Yu, Huan-Hsin Tseng, Shinjae Yoo, Haibin Ling, Yuewei Lin

Abstract: Domain Generalization (DG) aims to learn a generalizable model on the unseen target domain by only training on the multiple observed source domains. Although a variety of DG methods have focused on extracting domain-invariant features, the domain-specific class-relevant features have attracted attention and been argued to benefit generalization to the unseen target domain. To take into account the… ▽ More Domain Generalization (DG) aims to learn a generalizable model on the unseen target domain by only training on the multiple observed source domains. Although a variety of DG methods have focused on extracting domain-invariant features, the domain-specific class-relevant features have attracted attention and been argued to benefit generalization to the unseen target domain. To take into account the class-relevant domain-specific information, in this paper we propose an Information theory iNspired diSentanglement and pURification modEl (INSURE) to explicitly disentangle the latent features to obtain sufficient and compact (necessary) class-relevant feature for generalization to the unseen domain. Specifically, we first propose an information theory inspired loss function to ensure the disentangled class-relevant features contain sufficient class label information and the other disentangled auxiliary feature has sufficient domain information. We further propose a paired purification loss function to let the auxiliary feature discard all the class-relevant information and thus the class-relevant feature will contain sufficient and compact (necessary) class-relevant information. Moreover, instead of using multiple encoders, we propose to use a learnable binary mask as our disentangler to make the disentanglement more efficient and make the disentangled features complementary to each other. We conduct extensive experiments on four widely used DG benchmark datasets including PACS, OfficeHome, TerraIncognita, and DomainNet. The proposed INSURE outperforms the state-of-art methods. We also empirically show that domain-specific class-relevant features are beneficial for domain generalization. △ Less

Submitted 7 September, 2023; originally announced September 2023.

Comments: 10 pages, 4 figures

arXiv:2304.14404 [pdf, other]

Motion-Conditioned Diffusion Model for Controllable Video Synthesis

Authors: Tsai-Shien Chen, Chieh Hubert Lin, Hung-Yu Tseng, Tsung-Yi Lin, Ming-Hsuan Yang

Abstract: Recent advancements in diffusion models have greatly improved the quality and diversity of synthesized content. To harness the expressive power of diffusion models, researchers have explored various controllable mechanisms that allow users to intuitively guide the content synthesis process. Although the latest efforts have primarily focused on video synthesis, there has been a lack of effective me… ▽ More Recent advancements in diffusion models have greatly improved the quality and diversity of synthesized content. To harness the expressive power of diffusion models, researchers have explored various controllable mechanisms that allow users to intuitively guide the content synthesis process. Although the latest efforts have primarily focused on video synthesis, there has been a lack of effective methods for controlling and describing desired content and motion. In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis. To tackle the ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff first utilizes a flow completion model to predict the dense video motion based on the semantic understanding of the video frame and the sparse motion control. Then, the diffusion model synthesizes high-quality future frames to form the output video. We qualitatively and quantitatively show that MCDiff achieves the state-the-of-art visual quality in stroke-guided controllable video synthesis. Additional experiments on MPII Human Pose further exhibit the capability of our model on diverse content and motion synthesis. △ Less

Submitted 27 April, 2023; originally announced April 2023.

Comments: Project page: https://tsaishien-chen.github.io/MCDiff/

arXiv:2303.17598 [pdf, other]

Consistent View Synthesis with Pose-Guided Diffusion Models

Authors: Hung-Yu Tseng, Qinbo Li, Changil Kim, Suhib Alsisan, Jia-Bin Huang, Johannes Kopf

Abstract: Novel view synthesis from a single image has been a cornerstone problem for many Virtual Reality applications that provide immersive experiences. However, most existing techniques can only synthesize novel views within a limited range of camera motion or fail to generate consistent and high-quality novel views under significant camera movement. In this work, we propose a pose-guided diffusion mode… ▽ More Novel view synthesis from a single image has been a cornerstone problem for many Virtual Reality applications that provide immersive experiences. However, most existing techniques can only synthesize novel views within a limited range of camera motion or fail to generate consistent and high-quality novel views under significant camera movement. In this work, we propose a pose-guided diffusion model to generate a consistent long-term video of novel views from a single image. We design an attention layer that uses epipolar lines as constraints to facilitate the association between different viewpoints. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of the proposed diffusion model against state-of-the-art transformer-based and GAN-based approaches. △ Less

Submitted 30 March, 2023; originally announced March 2023.

Comments: CVPR 2023. Project page: https://poseguided-diffusion.github.io/

arXiv:2303.16280 [pdf, other]

UVCGAN v2: An Improved Cycle-Consistent GAN for Unpaired Image-to-Image Translation

Authors: Dmitrii Torbunov, Yi Huang, Huan-Hsin Tseng, Haiwang Yu, ** Huang, Shinjae Yoo, Meifeng Lin, Brett Viren, Yihui Ren

Abstract: An unpaired image-to-image (I2I) translation technique seeks to find a map** between two domains of data in a fully unsupervised manner. While initial solutions to the I2I problem were provided by generative adversarial neural networks (GANs), diffusion models (DMs) currently hold the state-of-the-art status on the I2I translation benchmarks in terms of Frechet inception distance (FID). Yet, DMs… ▽ More An unpaired image-to-image (I2I) translation technique seeks to find a map** between two domains of data in a fully unsupervised manner. While initial solutions to the I2I problem were provided by generative adversarial neural networks (GANs), diffusion models (DMs) currently hold the state-of-the-art status on the I2I translation benchmarks in terms of Frechet inception distance (FID). Yet, DMs suffer from limitations, such as not using data from the source domain during the training or maintaining consistency of the source and translated images only via simple pixel-wise errors. This work improves a recent UVCGAN model and equips it with modern advancements in model architectures and training procedures. The resulting revised model significantly outperforms other advanced GAN- and DM-based competitors on a variety of benchmarks. In the case of Male-to-Female translation of CelebA, the model achieves more than 40% improvement in FID score compared to the state-of-the-art results. This work also demonstrates the ineffectiveness of the pixel-wise I2I translation faithfulness metrics and suggests their revision. The code and trained models are available at https://github.com/LS4GAN/uvcgan2 △ Less

Submitted 22 September, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

arXiv:2303.12861 [pdf, other]

Parallel Diffusion Model-based Sparse-view Cone-beam Breast CT

Authors: Wenjun Xia, Hsin Wu Tseng, Chuang Niu, Wenxiang Cong, Xiaohua Zhang, Shaohua Liu, Ruola Ning, Srinivasan Vedantham, Ge Wang

Abstract: Breast cancer is the most prevalent cancer among women worldwide, and early detection is crucial for reducing its mortality rate and improving quality of life. Dedicated breast computed tomography (CT) scanners offer better image quality than mammography and tomosynthesis in general but at higher radiation dose. To enable breast CT for cancer screening, the challenge is to minimize the radiation d… ▽ More Breast cancer is the most prevalent cancer among women worldwide, and early detection is crucial for reducing its mortality rate and improving quality of life. Dedicated breast computed tomography (CT) scanners offer better image quality than mammography and tomosynthesis in general but at higher radiation dose. To enable breast CT for cancer screening, the challenge is to minimize the radiation dose without compromising image quality, according to the ALARA principle (as low as reasonably achievable). Over the past years, deep learning has shown remarkable successes in various tasks, including low-dose CT especially few-view CT. Currently, the diffusion model presents the state of the art for CT reconstruction. To develop the first diffusion model-based breast CT reconstruction method, here we report innovations to address the large memory requirement for breast cone-beam CT reconstruction and high computational cost of the diffusion model. Specifically, in this study we transform the cutting-edge Denoising Diffusion Probabilistic Model (DDPM) into a parallel framework for sub-volume-based sparse-view breast CT image reconstruction in projection and image domains. This novel approach involves the concurrent training of two distinct DDPM models dedicated to processing projection and image data synergistically in the dual domains. Our experimental findings reveal that this method delivers competitive reconstruction performance at half to one-third of the standard radiation doses. This advancement demonstrates an exciting potential of diffusion-type models for volumetric breast reconstruction at high-resolution with much-reduced radiation dose and as such hopefully redefines breast cancer screening and diagnosis. △ Less

Submitted 28 January, 2024; v1 submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.09905 [pdf, other]

More Robust Schema-Guided Dialogue State Tracking via Tree-Based Paraphrase Ranking

Authors: A. Coca, B. H. Tseng, W. Lin, B. Byrne

Abstract: The schema-guided paradigm overcomes scalability issues inherent in building task-oriented dialogue (TOD) agents with static ontologies. Instead of operating on dialogue context alone, agents have access to hierarchical schemas containing task-relevant natural language descriptions. Fine-tuned language models excel at schema-guided dialogue state tracking (DST) but are sensitive to the writing sty… ▽ More The schema-guided paradigm overcomes scalability issues inherent in building task-oriented dialogue (TOD) agents with static ontologies. Instead of operating on dialogue context alone, agents have access to hierarchical schemas containing task-relevant natural language descriptions. Fine-tuned language models excel at schema-guided dialogue state tracking (DST) but are sensitive to the writing style of the schemas. We explore methods for improving the robustness of DST models. We propose a framework for generating synthetic schemas which uses tree-based ranking to jointly optimise lexical diversity and semantic faithfulness. The generalisation of strong baselines is improved when augmenting their training data with prompts generated by our framework, as demonstrated by marked improvements in average joint goal accuracy (JGA) and schema sensitivity (SS) on the SGD-X benchmark. △ Less

Submitted 17 March, 2023; originally announced March 2023.

Comments: Accepted at EACL (Findings) 2023

arXiv:2302.01798 [pdf, other]

Interpretations of Domain Adaptations via Layer Variational Analysis

Authors: Huan-Hsin Tseng, Hsin-Yi Lin, Kuo-Hsuan Hung, Yu Tsao

Abstract: Transfer learning is known to perform efficiently in many applications empirically, yet limited literature reports the mechanism behind the scene. This study establishes both formal derivations and heuristic analysis to formulate the theory of transfer learning in deep learning. Our framework utilizing layer variational analysis proves that the success of transfer learning can be guaranteed with c… ▽ More Transfer learning is known to perform efficiently in many applications empirically, yet limited literature reports the mechanism behind the scene. This study establishes both formal derivations and heuristic analysis to formulate the theory of transfer learning in deep learning. Our framework utilizing layer variational analysis proves that the success of transfer learning can be guaranteed with corresponding data conditions. Moreover, our theoretical calculation yields intuitive interpretations towards the knowledge transfer process. Subsequently, an alternative method for network-based transfer learning is derived. The method shows an increase in efficiency and accuracy for domain adaptation. It is particularly advantageous when new domain data is sufficiently sparse during adaptation. Numerical experiments over diverse tasks validated our theory and verified that our analytic expression achieved better performance in domain adaptation than the gradient descent method. △ Less

Submitted 9 May, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

Comments: Published at ICLR 2023

arXiv:2301.02239 [pdf, other]

Robust Dynamic Radiance Fields

Authors: Yu-Lun Liu, Chen Gao, Andreas Meuleman, Hung-Yu Tseng, Ayush Saraf, Changil Kim, Yung-Yu Chuang, Johannes Kopf, Jia-Bin Huang

Abstract: Dynamic radiance field reconstruction methods aim to model the time-varying structure and appearance of a dynamic scene. Existing methods, however, assume that accurate camera poses can be reliably estimated by Structure from Motion (SfM) algorithms. These methods, thus, are unreliable as SfM algorithms often fail or produce erroneous poses on challenging videos with highly dynamic objects, poorly… ▽ More Dynamic radiance field reconstruction methods aim to model the time-varying structure and appearance of a dynamic scene. Existing methods, however, assume that accurate camera poses can be reliably estimated by Structure from Motion (SfM) algorithms. These methods, thus, are unreliable as SfM algorithms often fail or produce erroneous poses on challenging videos with highly dynamic objects, poorly textured surfaces, and rotating camera motion. We address this robustness issue by jointly estimating the static and dynamic radiance fields along with the camera parameters (poses and focal length). We demonstrate the robustness of our approach via extensive quantitative and qualitative experiments. Our results show favorable performance over the state-of-the-art dynamic view synthesis methods. △ Less

Submitted 21 March, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

Comments: CVPR 2023. Project page: https://robust-dynrf.github.io/

arXiv:2211.12628 [pdf, other]

Safe Control and Learning Using Generalized Action Governor

Authors: Nan Li, Yutong Li, Ilya Kolmanovsky, Anouck Girard, H. Eric Tseng, Dimitar Filev

Abstract: This paper introduces the Generalized Action Governor, which is a supervisory scheme for augmenting a nominal closed-loop system with the capability of strictly handling constraints. After presenting its theory for general systems and introducing tailored design approaches for linear and discrete systems, we discuss its application to safe online learning, which aims to safely evolve control param… ▽ More This paper introduces the Generalized Action Governor, which is a supervisory scheme for augmenting a nominal closed-loop system with the capability of strictly handling constraints. After presenting its theory for general systems and introducing tailored design approaches for linear and discrete systems, we discuss its application to safe online learning, which aims to safely evolve control parameters using real-time data to improve performance for uncertain systems. In particular, we propose two safe learning algorithms based on integration of reinforcement learning/data-driven Koopman operator-based control with the generalized action governor. The developments are illustrated with a numerical example. △ Less

Submitted 22 November, 2022; originally announced November 2022.

Comments: 10 pages, 4 figures

arXiv:2211.11997 [pdf, other]

REFINE: Reachability-based Trajectory Design using Robust Feedback Linearization and Zonotopes

Authors: **sun Liu, Yifei Shao, Lucas Lymburner, Hansen Qin, Vishrut Kaushik, Lena Trang, Ruiyang Wang, Vladimir Ivanovic, H. Eric Tseng, Ram Vasudevan

Abstract: Performing real-time receding horizon motion planning for autonomous vehicles while providing safety guarantees remains difficult. This is because existing methods to accurately predict ego vehicle behavior under a chosen controller use online numerical integration that requires a fine time discretization and thereby adversely affects real-time performance. To address this limitation, several rece… ▽ More Performing real-time receding horizon motion planning for autonomous vehicles while providing safety guarantees remains difficult. This is because existing methods to accurately predict ego vehicle behavior under a chosen controller use online numerical integration that requires a fine time discretization and thereby adversely affects real-time performance. To address this limitation, several recent papers have proposed to apply offline reachability analysis to conservatively predict the behavior of the ego vehicle. This reachable set can be constructed by utilizing a simplified model whose behavior is assumed a priori to conservatively bound the dynamics of a full-order model. However, guaranteeing that one satisfies this assumption is challenging. This paper proposes a framework named REFINE to overcome the limitations of these existing approaches. REFINE utilizes a parameterized robust controller that partially linearizes the vehicle dynamics even in the presence of modeling error. Zonotope-based reachability analysis is then performed on the closed-loop, full-order vehicle dynamics to compute the corresponding control-parameterized, over-approximate Forward Reachable Sets (FRS). Because reachability analysis is applied to the full-order model, the potential conservativeness introduced by using a simplified model is avoided. The pre-computed, control-parameterized FRS is then used online in an optimization framework to ensure safety. The proposed method is compared to several state of the art methods during a simulation-based evaluation on a full-size vehicle model and is evaluated on a 1/10th race car robot in real hardware testing. In contrast to existing methods, REFINE is shown to enable the vehicle to safely navigate itself through complex environments. △ Less

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2211.06508 [pdf, other]

On the robustness of non-intrusive speech quality model by adversarial examples

Authors: Hsin-Yi Lin, Huan-Hsin Tseng, Yu Tsao

Abstract: It has been shown recently that deep learning based models are effective on speech quality prediction and could outperform traditional metrics in various perspectives. Although network models have potential to be a surrogate for complex human hearing perception, they may contain instabilities in predictions. This work shows that deep speech quality predictors can be vulnerable to adversarial pertu… ▽ More It has been shown recently that deep learning based models are effective on speech quality prediction and could outperform traditional metrics in various perspectives. Although network models have potential to be a surrogate for complex human hearing perception, they may contain instabilities in predictions. This work shows that deep speech quality predictors can be vulnerable to adversarial perturbations, where the prediction can be changed drastically by unnoticeable perturbations as small as $-30$ dB compared with speech inputs. In addition to exposing the vulnerability of deep speech quality predictors, we further explore and confirm the viability of adversarial training for strengthening robustness of models. △ Less

Submitted 11 November, 2022; originally announced November 2022.

arXiv:2210.12476 [pdf, other]

doi 10.1109/JISPIN.2023.3340987

A Flexible-Frame-Rate Vision-Aided Inertial Object Tracking System for Mobile Devices

Authors: Yo-Chung Lau, Kuan-Wei Tseng, I-Ju Hsieh, Hsiao-Ching Tseng, Yi-** Hung

Abstract: Real-time object pose estimation and tracking is challenging but essential for emerging augmented reality (AR) applications. In general, state-of-the-art methods address this problem using deep neural networks which indeed yield satisfactory results. Nevertheless, the high computational cost of these methods makes them unsuitable for mobile devices where real-world applications usually take place.… ▽ More Real-time object pose estimation and tracking is challenging but essential for emerging augmented reality (AR) applications. In general, state-of-the-art methods address this problem using deep neural networks which indeed yield satisfactory results. Nevertheless, the high computational cost of these methods makes them unsuitable for mobile devices where real-world applications usually take place. In addition, head-mounted displays such as AR glasses require at least 90~FPS to avoid motion sickness, which further complicates the problem. We propose a flexible-frame-rate object pose estimation and tracking system for mobile devices. It is a monocular visual-inertial-based system with a client-server architecture. Inertial measurement unit (IMU) pose propagation is performed on the client side for high speed tracking, and RGB image-based 3D pose estimation is performed on the server side to obtain accurate poses, after which the pose is sent to the client side for visual-inertial fusion, where we propose a bias self-correction mechanism to reduce drift. We also propose a pose inspection algorithm to detect tracking failures and incorrect pose estimation. Connected by high-speed networking, our system supports flexible frame rates up to 120 FPS and guarantees high precision and real-time tracking on low-end devices. Both simulations and real world experiments show that our method achieves accurate and robust object tracking. △ Less

Submitted 22 October, 2022; originally announced October 2022.

arXiv:2210.04400 [pdf]

Focus Plus: Detect Learner's Distraction by Web Camera in Distance Teaching

Authors: Eason Chen, Yuen Hsien Tseng, Kuo-** Lo

Abstract: Distance teaching has become popular these years because of the COVID-19 epidemic. However, both students and teachers face several challenges in distance teaching, like being easy to distract. We proposed Focus+, a system designed to detect learners' status with the latest AI technology from their web camera to solve such challenges. By doing so, teachers can know students' status, and students c… ▽ More Distance teaching has become popular these years because of the COVID-19 epidemic. However, both students and teachers face several challenges in distance teaching, like being easy to distract. We proposed Focus+, a system designed to detect learners' status with the latest AI technology from their web camera to solve such challenges. By doing so, teachers can know students' status, and students can regulate their learning experience. In this research, we will discuss the expected model's design for training and evaluating the AI detection model of Focus+. △ Less

Submitted 9 October, 2022; originally announced October 2022.

Comments: 5 Pages, 4 Figures, 2021 National Chair Professorship Academic Series: Teaching and Learning in Pandemic Era

arXiv:2208.12675 [pdf, other]

Adaptively-Realistic Image Generation from Stroke and Sketch with Diffusion Model

Authors: Shin-I Cheng, Yu-Jie Chen, Wei-Chen Chiu, Hung-Yu Tseng, Hsin-Ying Lee

Abstract: Generating images from hand-drawings is a crucial and fundamental task in content creation. The translation is difficult as there exist infinite possibilities and the different users usually expect different outcomes. Therefore, we propose a unified framework supporting a three-dimensional control over the image synthesis from sketches and strokes based on diffusion models. Users can not only deci… ▽ More Generating images from hand-drawings is a crucial and fundamental task in content creation. The translation is difficult as there exist infinite possibilities and the different users usually expect different outcomes. Therefore, we propose a unified framework supporting a three-dimensional control over the image synthesis from sketches and strokes based on diffusion models. Users can not only decide the level of faithfulness to the input strokes and sketches, but also the degree of realism, as the user inputs are usually not consistent with the real images. Qualitative and quantitative experiments demonstrate that our framework achieves state-of-the-art performance while providing flexibility in generating customized images with control over shape, color, and realism. Moreover, our method unleashes applications such as editing on real images, generation with partial sketches and strokes, and multi-domain multi-modal synthesis. △ Less

Submitted 1 September, 2022; v1 submitted 26 August, 2022; originally announced August 2022.

arXiv:2208.07256 [pdf, other]

Multi-modal Transformer Path Prediction for Autonomous Vehicle

Authors: Chia Hong Tseng, Jie Zhang, Min-Te Sun, Kazuya Sakai, Wei-Shinn Ku

Abstract: Reasoning about vehicle path prediction is an essential and challenging problem for the safe operation of autonomous driving systems. There exist many research works for path prediction. However, most of them do not use lane information and are not based on the Transformer architecture. By utilizing different types of data collected from sensors equipped on the self-driving vehicles, we propose a… ▽ More Reasoning about vehicle path prediction is an essential and challenging problem for the safe operation of autonomous driving systems. There exist many research works for path prediction. However, most of them do not use lane information and are not based on the Transformer architecture. By utilizing different types of data collected from sensors equipped on the self-driving vehicles, we propose a path prediction system named Multi-modal Transformer Path Prediction (MTPP) that aims to predict long-term future trajectory of target agents. To achieve more accurate path prediction, the Transformer architecture is adopted in our model. To better utilize the lane information, the lanes which are in opposite direction to target agent are not likely to be taken by the target agent and are consequently filtered out. In addition, consecutive lane chunks are combined to ensure the lane input to be long enough for path prediction. An extensive evaluation is conducted to show the efficacy of the proposed system using nuScene, a real-world trajectory forecasting dataset. △ Less

Submitted 15 August, 2022; originally announced August 2022.

Comments: 9 pages, 12 figures, and 5 tables

arXiv:2208.03529 [pdf, other]

Collision Avoidance for Dynamic Obstacles with Uncertain Predictions using Model Predictive Control

Authors: Siddharth H. Nair, Eric H. Tseng, Francesco Borrelli

Abstract: We propose a Model Predictive Control (MPC) for collision avoidance between an autonomous agent and dynamic obstacles with uncertain predictions. The collision avoidance constraints are imposed by enforcing positive distance between convex sets representing the agent and the obstacles, and tractably reformulating them using Lagrange duality. This approach allows for smooth collision avoidance cons… ▽ More We propose a Model Predictive Control (MPC) for collision avoidance between an autonomous agent and dynamic obstacles with uncertain predictions. The collision avoidance constraints are imposed by enforcing positive distance between convex sets representing the agent and the obstacles, and tractably reformulating them using Lagrange duality. This approach allows for smooth collision avoidance constraints even for polytopes, which otherwise require mixed-integer or non-smooth constraints. We consider three widely used descriptions of the uncertain obstacle position: 1) Arbitrary distribution with polytopic support, 2) Gaussian distributions and 3) Arbitrary distribution with first two moments known. For each case we obtain deterministic reformulations of the collision avoidance constraints. The proposed MPC formulation optimizes over feedback policies to reduce conservatism in satisfying the collision avoidance constraints. The proposed approach is validated using simulations of traffic intersections in CARLA. △ Less

Submitted 6 August, 2022; originally announced August 2022.

Comments: Accepted to CDC'22

arXiv:2207.13286 [pdf, other]

Vector Quantized Image-to-Image Translation

Authors: Yu-Jie Chen, Shin-I Cheng, Wei-Chen Chiu, Hung-Yu Tseng, Hsin-Ying Lee

Abstract: Current image-to-image translation methods formulate the task with conditional generation models, leading to learning only the recolorization or regional changes as being constrained by the rich structural information provided by the conditional contexts. In this work, we propose introducing the vector quantization technique into the image-to-image translation framework. The vector quantized conte… ▽ More Current image-to-image translation methods formulate the task with conditional generation models, leading to learning only the recolorization or regional changes as being constrained by the rich structural information provided by the conditional contexts. In this work, we propose introducing the vector quantization technique into the image-to-image translation framework. The vector quantized content representation can facilitate not only the translation, but also the unconditional distribution shared among different domains. Meanwhile, along with the disentangled style representation, the proposed method further enables the capability of image extension with flexibility in both intra- and inter-domains. Qualitative and quantitative experiments demonstrate that our framework achieves comparable performance to the state-of-the-art image-to-image translation and image extension methods. Compared to methods for individual tasks, the proposed method, as a unified framework, unleashes applications combining image-to-image translation, unconditional generation, and image extension altogether. For example, it provides style variability for image generation and extension, and equips image-to-image translation with further extension capabilities. △ Less

Submitted 27 July, 2022; originally announced July 2022.

arXiv:2207.08240 [pdf, other]

Robust Action Governor for Uncertain Piecewise Affine Systems with Non-convex Constraints and Safe Reinforcement Learning

Authors: Yutong Li, Nan Li, H. Eric Tseng, Anouck Girard, Dimitar Filev, Ilya Kolmanovsky

Abstract: The action governor is an add-on scheme to a nominal control loop that monitors and adjusts the control actions to enforce safety specifications expressed as pointwise-in-time state and control constraints. In this paper, we introduce the Robust Action Governor (RAG) for systems the dynamics of which can be represented using discrete-time Piecewise Affine (PWA) models with both parametric and addi… ▽ More The action governor is an add-on scheme to a nominal control loop that monitors and adjusts the control actions to enforce safety specifications expressed as pointwise-in-time state and control constraints. In this paper, we introduce the Robust Action Governor (RAG) for systems the dynamics of which can be represented using discrete-time Piecewise Affine (PWA) models with both parametric and additive uncertainties and subject to non-convex constraints. We develop the theoretical properties and computational approaches for the RAG. After that, we introduce the use of the RAG for realizing safe Reinforcement Learning (RL), i.e., ensuring all-time constraint satisfaction during online RL exploration-and-exploitation process. This development enables safe real-time evolution of the control policy and adaptation to changes in the operating environment and system parameters (due to aging, damage, etc.). We illustrate the effectiveness of the RAG in constraint enforcement and safe RL using the RAG by considering their applications to a soft-landing problem of a mass-spring-damper system. △ Less

Submitted 17 July, 2022; originally announced July 2022.

arXiv:2206.01202 [pdf, other]

Unveiling The Mask of Position-Information Pattern Through the Mist of Image Features

Authors: Chieh Hubert Lin, Hsin-Ying Lee, Hung-Yu Tseng, Maneesh Singh, Ming-Hsuan Yang

Abstract: Recent studies show that paddings in convolutional neural networks encode absolute position information which can negatively affect the model performance for certain tasks. However, existing metrics for quantifying the strength of positional information remain unreliable and frequently lead to erroneous results. To address this issue, we propose novel metrics for measuring (and visualizing) the en… ▽ More Recent studies show that paddings in convolutional neural networks encode absolute position information which can negatively affect the model performance for certain tasks. However, existing metrics for quantifying the strength of positional information remain unreliable and frequently lead to erroneous results. To address this issue, we propose novel metrics for measuring (and visualizing) the encoded positional information. We formally define the encoded information as PPP (Position-information Pattern from Padding) and conduct a series of experiments to study its properties as well as its formation. The proposed metrics measure the presence of positional information more reliably than the existing metrics based on PosENet and a test in F-Conv. We also demonstrate that for any extant (and proposed) padding schemes, PPP is primarily a learning artifact and is less dependent on the characteristics of the underlying padding schemes. △ Less

Submitted 2 June, 2022; originally announced June 2022.

arXiv:2205.01252 [pdf, other]

SIMD$^2$: A Generalized Matrix Instruction Set for Accelerating Tensor Computation beyond GEMM

Authors: Yunan Zhang, Po-An Tsai, Hung-Wei Tseng

Abstract: Matrix-multiplication units (MXUs) are now prevalent in every computing platform. The key attribute that makes MXUs so successful is the semiring structure, which allows tiling for both parallelism and data reuse. Nonetheless, matrix-multiplication is not the only algorithm with such attributes. We find that many algorithms share the same structure and differ in only the core operation; for exampl… ▽ More Matrix-multiplication units (MXUs) are now prevalent in every computing platform. The key attribute that makes MXUs so successful is the semiring structure, which allows tiling for both parallelism and data reuse. Nonetheless, matrix-multiplication is not the only algorithm with such attributes. We find that many algorithms share the same structure and differ in only the core operation; for example, using add-minimum instead of multiply-add. Algorithms with a semiring-like structure therefore have potential to be accelerated by a general-purpose matrix operation architecture, instead of common MXUs. In this paper, we propose SIMD$^2$, a new programming paradigm to support generalized matrix operations with a semiring-like structure. SIMD$^2$ instructions accelerate eight more types of matrix operations, in addition to matrix multiplications. Since SIMD$^2$ instructions resemble a matrix-multiplication instruction, we are able to build SIMD$^2$ architecture on top of any MXU architecture with minimal modifications. We developed a framework that emulates and validates SIMD$^2$ using NVIDIA GPUs with Tensor Cores. Across 8 applications, SIMD2 provides up to 38.59$\times$ speedup and more than 10.63$\times$ on average over optimized CUDA programs, with only 5% of full-chip area overhead. △ Less

Submitted 31 August, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

Comments: To Appear in the 49th International Symposium on Computer Architecture (ISCA'22), June 18--22, 2022, New York, NY, USA

arXiv:2112.07624 [pdf, other]

Interaction-Aware Trajectory Prediction and Planning for Autonomous Vehicles in Forced Merge Scenarios

Authors: Kaiwen Liu, Nan Li, H. Eric Tseng, Ilya Kolmanovsky, Anouck Girard

Abstract: Merging is, in general, a challenging task for both human drivers and autonomous vehicles, especially in dense traffic, because the merging vehicle typically needs to interact with other vehicles to identify or create a gap and safely merge into. In this paper, we consider the problem of autonomous vehicle control for forced merge scenarios. We propose a novel game-theoretic controller, called the… ▽ More Merging is, in general, a challenging task for both human drivers and autonomous vehicles, especially in dense traffic, because the merging vehicle typically needs to interact with other vehicles to identify or create a gap and safely merge into. In this paper, we consider the problem of autonomous vehicle control for forced merge scenarios. We propose a novel game-theoretic controller, called the Leader-Follower Game Controller (LFGC), in which the interactions between the autonomous ego vehicle and other vehicles with a priori uncertain driving intentions is modeled as a partially observable leader-follower game. The LFGC estimates the other vehicles' intentions online based on observed trajectories, and then predicts their future trajectories and plans the ego vehicle's own trajectory using Model Predictive Control (MPC) to simultaneously achieve probabilistically guaranteed safety and merging objectives. To verify the performance of LFGC, we test it in simulations and with the NGSIM data, where the LFGC demonstrates a high success rate of 97.5% in merging. △ Less

Submitted 14 December, 2021; originally announced December 2021.

Comments: 15 pages, 12 figures

arXiv:2112.07552 [pdf, other]

TCUDB: Accelerating Database with Tensor Processors

Authors: Yu-Ching Hu, Yuliang Li, Hung-Wei Tseng

Abstract: The emergence of novel hardware accelerators has powered the tremendous growth of machine learning in recent years. These accelerators deliver incomparable performance gains in processing high-volume matrix operators, particularly matrix multiplication, a core component of neural network training and inference. In this work, we explored opportunities of accelerating database systems using NVIDIA's… ▽ More The emergence of novel hardware accelerators has powered the tremendous growth of machine learning in recent years. These accelerators deliver incomparable performance gains in processing high-volume matrix operators, particularly matrix multiplication, a core component of neural network training and inference. In this work, we explored opportunities of accelerating database systems using NVIDIA's Tensor Core Units (TCUs). We present TCUDB, a TCU-accelerated query engine processing a set of query operators including natural joins and group-by aggregates as matrix operators within TCUs. Matrix multiplication was considered inefficient in the past; however, this strategy has remained largely unexplored in conventional GPU-based databases, which primarily rely on vector or scalar processing. We demonstrate the significant performance gain of TCUDB in a range of real-world applications including entity matching, graph query processing, and matrix-based data analytics. TCUDB achieves up to 288x speedup compared to a baseline GPU-based query engine. △ Less

Submitted 14 December, 2021; originally announced December 2021.

Comments: 16 pages, 14 figures, to appear in the 2022 ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD 2022)

arXiv:2112.02538 [pdf, ps, other]

Toward Real-World Voice Disorder Classification

Authors: Heng-Cheng Kuo, Yu-Peng Hsieh, Huan-Hsin Tseng, Chi-Te Wang, Shih-Hau Fang, Yu Tsao

Abstract: Objective: Voice disorders significantly compromise individuals' ability to speak in their daily lives. Without early diagnosis and treatment, these disorders may deteriorate drastically. Thus, automatic classification systems at home are desirable for people who are inaccessible to clinical disease assessments. However, the performance of such systems may be weakened due to the constrained resour… ▽ More Objective: Voice disorders significantly compromise individuals' ability to speak in their daily lives. Without early diagnosis and treatment, these disorders may deteriorate drastically. Thus, automatic classification systems at home are desirable for people who are inaccessible to clinical disease assessments. However, the performance of such systems may be weakened due to the constrained resources and domain mismatch between the clinical data and noisy real-world data. Methods: This study develops a compact and domain-robust voice disorder classification system to identify the utterances of health, neoplasm, and benign structural diseases. Our proposed system utilizes a feature extractor model composed of factorized convolutional neural networks and subsequently deploys domain adversarial training to reconcile the domain mismatch by extracting domain invariant features. Results: The results show that the unweighted average recall in the noisy real-world domain improved by 13% and remained at 80% in the clinic domain with only slight degradation. The domain mismatch was effectively eliminated. Moreover, the proposed system reduced the usage of both memory and computation by over 73.9%. Conclusion: By deploying factorized convolutional neural networks and domain adversarial training, domain-invariant features can be derived for voice disorder classification with limited resources. The promising results confirm that the proposed system can significantly reduce resource consumption and improve classification accuracy by considering the domain mismatch. Significance: To the best of our knowledge, this is the first study that jointly considers real-world model compression and noise-robustness issues in voice disorder classification. The proposed system is intended for application to embedded systems with limited resources. △ Less

Submitted 26 April, 2023; v1 submitted 5 December, 2021; originally announced December 2021.

Comments: Accepted by IEEE TBME (under an IEEE Open Access publishing Agreement)

arXiv:2111.06316 [pdf, other]

Unsupervised Noise Adaptive Speech Enhancement by Discriminator-Constrained Optimal Transport

Authors: Hsin-Yi Lin, Huan-Hsin Tseng, Xugang Lu, Yu Tsao

Abstract: This paper presents a novel discriminator-constrained optimal transport network (DOTN) that performs unsupervised domain adaptation for speech enhancement (SE), which is an essential regression task in speech processing. The DOTN aims to estimate clean references of noisy speech in a target domain, by exploiting the knowledge available from the source domain. The domain shift between training and… ▽ More This paper presents a novel discriminator-constrained optimal transport network (DOTN) that performs unsupervised domain adaptation for speech enhancement (SE), which is an essential regression task in speech processing. The DOTN aims to estimate clean references of noisy speech in a target domain, by exploiting the knowledge available from the source domain. The domain shift between training and testing data has been reported to be an obstacle to learning problems in diverse fields. Although rich literature exists on unsupervised domain adaptation for classification, the methods proposed, especially in regressions, remain scarce and often depend on additional information regarding the input data. The proposed DOTN approach tactically fuses the optimal transport (OT) theory from mathematical analysis with generative adversarial frameworks, to help evaluate continuous labels in the target domain. The experimental results on two SE tasks demonstrate that by extending the classical OT formulation, our proposed DOTN outperforms previous adversarial domain adaptation frameworks in a purely unsupervised manner. △ Less

Submitted 11 November, 2021; originally announced November 2021.

Comments: Accepted at NeurIPS 2021

arXiv:2109.09792 [pdf, other]

Stochastic MPC with Multi-modal Predictions for Traffic Intersections

Authors: Siddharth H. Nair, Vijay Govindarajan, Theresa Lin, Chris Meissen, H. Eric Tseng, Francesco Borrelli

Abstract: We propose a Stochastic MPC (SMPC) formulation for autonomous driving at traffic intersections which incorporates multi-modal predictions of surrounding vehicles for collision avoidance constraints. The multi-modal predictions are obtained with Gaussian Mixture Models (GMM) and constraints are formulated as chance-constraints. Our main theoretical contribution is a SMPC formulation that optimizes… ▽ More We propose a Stochastic MPC (SMPC) formulation for autonomous driving at traffic intersections which incorporates multi-modal predictions of surrounding vehicles for collision avoidance constraints. The multi-modal predictions are obtained with Gaussian Mixture Models (GMM) and constraints are formulated as chance-constraints. Our main theoretical contribution is a SMPC formulation that optimizes over a novel feedback policy class designed to exploit additional structure in the GMM predictions, and that is amenable to convex programming. The use of feedback policies for prediction is motivated by the need for reduced conservatism in handling multi-modal predictions of the surrounding vehicles, especially prevalent in traffic intersection scenarios. We evaluate our algorithm along axes of mobility, comfort, conservatism and computational efficiency at a simulated intersection in CARLA. Our simulations use a kinematic bicycle model and multimodal predictions trained on a subset of the Lyft Level 5 prediction dataset. To demonstrate the impact of optimizing over feedback policies, we compare our algorithm with two SMPC baselines that handle multi-modal collision avoidance chance constraints by optimizing over open-loop sequences. △ Less

Submitted 25 February, 2022; v1 submitted 20 September, 2021; originally announced September 2021.

Comments: Extended version of ITSC 2022 submission

arXiv:2108.08448 [pdf, other]

doi 10.1109/IROS47612.2022.9981621

Improved Robustness and Safety for Pre-Adaptation of Meta Reinforcement Learning with Prior Regularization

Authors: Lu Wen, Songan Zhang, H. Eric Tseng, Baljeet Singh, Dimitar Filev, Huei Peng

Abstract: Meta Reinforcement Learning (Meta-RL) has seen substantial advancements recently. In particular, off-policy methods were developed to improve the data efficiency of Meta-RL techniques. \textit{Probabilistic embeddings for actor-critic RL} (PEARL) is a leading approach for multi-MDP adaptation problems. A major drawback of many existing Meta-RL methods, including PEARL, is that they do not explicit… ▽ More Meta Reinforcement Learning (Meta-RL) has seen substantial advancements recently. In particular, off-policy methods were developed to improve the data efficiency of Meta-RL techniques. \textit{Probabilistic embeddings for actor-critic RL} (PEARL) is a leading approach for multi-MDP adaptation problems. A major drawback of many existing Meta-RL methods, including PEARL, is that they do not explicitly consider the safety of the prior policy when it is exposed to a new task for the first time. Safety is essential for many real-world applications, including field robots and Autonomous Vehicles (AVs). In this paper, we develop the PEARL PLUS (PEARL$^+$) algorithm, which optimizes the policy for both prior (pre-adaptation) safety and posterior (after-adaptation) performance. Building on top of PEARL, our proposed PEARL$^+$ algorithm introduces a prior regularization term in the reward function and a new Q-network for recovering the state-action value under prior context assumptions, to improve the robustness to task distribution shift and safety of the trained network exposed to a new task for the first time. The performance of PEARL$^+$ is validated by solving three safety-critical problems related to robots and AVs, including two MuJoCo benchmark problems. From the simulation experiments, we show that safety of the prior policy is significantly improved and more robust to task distribution shift compared to PEARL. △ Less

Submitted 9 February, 2023; v1 submitted 18 August, 2021; originally announced August 2021.

arXiv:2107.05473 [pdf, other]

GPTPU: Accelerating Applications using Edge Tensor Processing Units

Authors: Kuan-Chieh Hsu, Hung-Wei Tseng

Abstract: Neural network (NN) accelerators have been integrated into a wide-spectrum of computer systems to accommodate the rapidly growing demands for artificial intelligence (AI) and machine learning (ML) applications. NN accelerators share the idea of providing native hardware support for operations on multidimensional tensor data. Therefore, NN accelerators are theoretically tensor processors that can i… ▽ More Neural network (NN) accelerators have been integrated into a wide-spectrum of computer systems to accommodate the rapidly growing demands for artificial intelligence (AI) and machine learning (ML) applications. NN accelerators share the idea of providing native hardware support for operations on multidimensional tensor data. Therefore, NN accelerators are theoretically tensor processors that can improve system performance for any problem that uses tensors as inputs/outputs. Unfortunately, commercially available NN accelerators only expose computation capabilities through AI/ML-specific interfaces. Furthermore, NN accelerators reveal very few hardware design details, so applications cannot easily leverage the tensor operations NN accelerators provide. This paper introduces General-Purpose Computing on Edge Tensor Processing Units (GPTPU), an open-source, open-architecture framework that allows the developer and research communities to discover opportunities that NN accelerators enable for applications. GPTPU includes a powerful programming interface with efficient runtime system-level support -- similar to that of CUDA/OpenCL in GPGPU computing -- to bridge the gap between application demands and mismatched hardware/software interfaces. We built GPTPU machine uses Edge Tensor Processing Units (Edge TPUs), which are widely available and representative of many commercial NN accelerators. We identified several novel use cases and revisited the algorithms. By leveraging the underlying Edge TPUs to perform tensor-algorithm-based compute kernels, our results reveal that GPTPU can achieve a 2.46x speedup over high-end CPUs and reduce energy consumption by 40%. △ Less

Submitted 13 July, 2021; v1 submitted 22 June, 2021; originally announced July 2021.

Comments: This paper is a pre-print of a paper in the 2021 SC, the International Conference for High Performance Computing, Networking, Storage and Analysis

arXiv:2106.06191 [pdf, other]

Structural evolution and on-demand growth of artificial synapses via field-directed polymerization

Authors: Matteo Cucchi, Hans Kleemann, Hsin Tseng, Alexander Lee, Karl Leo

Abstract: Interconnectivity, fault tolerance, and dynamic evolution of the circuitry are long sought-after objectives of bio-inspired engineering. Here, we propose dendritic transistors composed of organic semiconductors as building blocks for neuromorphic computing. These devices, owning to their voltage-triggered growth and resemblance to neural structures, respond to action potentials to achieve complex… ▽ More Interconnectivity, fault tolerance, and dynamic evolution of the circuitry are long sought-after objectives of bio-inspired engineering. Here, we propose dendritic transistors composed of organic semiconductors as building blocks for neuromorphic computing. These devices, owning to their voltage-triggered growth and resemblance to neural structures, respond to action potentials to achieve complex brain-like features, such as Pavlovian learning, pattern recognition, and spike-timing-dependent plasticity. The dynamic formation of the connections is reminiscent of a biological learning mechanism known as synaptogenesis, and it is carried out by an electrochemical reaction that we name field-directed polymerization. We employ it to dendritic connections and, by modulating the growth parameters, control material properties such as the resistance and the time constants relevant for plasticity. We believe these results will inspire further research towards the complex integration of polymerized synapses for brain-inspired computing. △ Less

Submitted 11 June, 2021; originally announced June 2021.

arXiv:2106.03719 [pdf, other]

Incremental False Negative Detection for Contrastive Learning

Authors: Tsai-Shien Chen, Wei-Chih Hung, Hung-Yu Tseng, Shao-Yi Chien, Ming-Hsuan Yang

Abstract: Self-supervised learning has recently shown great potential in vision tasks through contrastive learning, which aims to discriminate each image, or instance, in the dataset. However, such instance-level learning ignores the semantic relationship among instances and sometimes undesirably repels the anchor from the semantically similar samples, termed as "false negatives". In this work, we show that… ▽ More Self-supervised learning has recently shown great potential in vision tasks through contrastive learning, which aims to discriminate each image, or instance, in the dataset. However, such instance-level learning ignores the semantic relationship among instances and sometimes undesirably repels the anchor from the semantically similar samples, termed as "false negatives". In this work, we show that the unfavorable effect from false negatives is more significant for the large-scale datasets with more semantic concepts. To address the issue, we propose a novel self-supervised contrastive learning framework that incrementally detects and explicitly removes the false negative samples. Specifically, following the training process, our method dynamically detects increasing high-quality false negatives considering that the encoder gradually improves and the embedding space becomes more semantically structural. Next, we discuss two strategies to explicitly remove the detected false negatives during contrastive learning. Extensive experiments show that our framework outperforms other self-supervised contrastive learning methods on multiple benchmarks in a limited resource setup. △ Less

Submitted 16 March, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: ICLR 2022

arXiv:2105.13509 [pdf, other]

Learning to Stylize Novel Views

Authors: Hsin-** Huang, Hung-Yu Tseng, Saurabh Saini, Maneesh Singh, Ming-Hsuan Yang

Abstract: We tackle a 3D scene stylization problem - generating stylized images of a scene from arbitrary novel views given a set of images of the same scene and a reference image of the desired style as inputs. Direct solution of combining novel view synthesis and stylization approaches lead to results that are blurry or not consistent across different views. We propose a point cloud-based method for consi… ▽ More We tackle a 3D scene stylization problem - generating stylized images of a scene from arbitrary novel views given a set of images of the same scene and a reference image of the desired style as inputs. Direct solution of combining novel view synthesis and stylization approaches lead to results that are blurry or not consistent across different views. We propose a point cloud-based method for consistent 3D scene stylization. First, we construct the point cloud by back-projecting the image features to the 3D space. Second, we develop point cloud aggregation modules to gather the style information of the 3D scene, and then modulate the features in the point cloud with a linear transformation matrix. Finally, we project the transformed features to 2D space to obtain the novel views. Experimental results on two diverse datasets of real-world scenes validate that our method generates consistent stylized novel view synthesis results against other alternative approaches. △ Less

Submitted 15 September, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

Comments: Project page: https://hhsin**.github.io/3d_scene_stylization/ Code: https://github.com/hhsin**/stylescene

arXiv:2105.13016 [pdf, other]

Stylizing 3D Scene via Implicit Representation and HyperNetwork

Authors: Pei-Ze Chiang, Meng-Shiun Tsai, Hung-Yu Tseng, Wei-sheng Lai, Wei-Chen Chiu

Abstract: In this work, we aim to address the 3D scene stylization problem - generating stylized images of the scene at arbitrary novel view angles. A straightforward solution is to combine existing novel view synthesis and image/video style transfer approaches, which often leads to blurry results or inconsistent appearance. Inspired by the high-quality results of the neural radiance fields (NeRF) method, w… ▽ More In this work, we aim to address the 3D scene stylization problem - generating stylized images of the scene at arbitrary novel view angles. A straightforward solution is to combine existing novel view synthesis and image/video style transfer approaches, which often leads to blurry results or inconsistent appearance. Inspired by the high-quality results of the neural radiance fields (NeRF) method, we propose a joint framework to directly render novel views with the desired style. Our framework consists of two components: an implicit representation of the 3D scene with the neural radiance fields model, and a hypernetwork to transfer the style information into the scene representation. In particular, our implicit representation model disentangles the scene into the geometry and appearance branches, and the hypernetwork learns to predict the parameters of the appearance branch from the reference style image. To alleviate the training difficulties and memory burden, we propose a two-stage training procedure and a patch sub-sampling approach to optimize the style and content losses with the neural radiance fields model. After optimization, our model is able to render consistent novel views at arbitrary view angles with arbitrary style. Both quantitative evaluation and human subject study have demonstrated that the proposed method generates faithful stylization results with consistent appearance across different views. △ Less

Submitted 16 January, 2022; v1 submitted 27 May, 2021; originally announced May 2021.

Comments: Accepted to WACV2022; Project page: https://ztex08010518.github.io/3dstyletransfer/

Showing 1–50 of 77 results for author: Tseng, H