-
Prediction of Cellular Identities from Trajectory and Cell Fate Information
Authors:
Baiyang Dai,
Jiamin Yang,
Hari Shroff,
Patrick La Riviere
Abstract:
Determining cell identities in imaging sequences is an important yet challenging task. The conventional method for cell identification is via cell tracking, which is complex and can be time-consuming. In this study, we propose an innovative approach to cell identification during early $\textit{C. elegans}$ embryogenesis using machine learning. Cell identification during $\textit{C. elegans}$ embryogenesis would provide insights into neural development with implications for higher organisms including humans. We employed random forest, MLP, and LSTM models, and tested cell classification accuracy on 3D time-lapse confocal datasets spanning the first 4 hours of embryogenesis. By leveraging a small number of spatial-temporal features of individual cells, including cell trajectory and cell fate information, our models achieve an accuracy of over 91%, even with limited data. We also determine the most important feature contributions and can interpret these features in the context of biological knowledge. Our research demonstrates the success of predicting cell identities in time-lapse imaging sequences directly from simple spatio-temporal features.
Submitted 2 March, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
-
CheapNET: Improving Light-weight speech enhancement network by projected loss function
Authors:
Kaijun Tan,
Benzhe Dai,
Jiakui Li,
Wenyu Mao
Abstract:
Noise suppression and echo cancellation are critical in speech enhancement and essential for smart devices and real-time communication. Deployed in voice processing front-ends and edge devices, these algorithms must ensure efficient real-time inference with low computational demands. Traditional edge-based noise suppression often uses MSE-based amplitude spectrum mask training, but this approach has limitations. We introduce a novel projection loss function, diverging from MSE, to enhance noise suppression. This method uses projection techniques to isolate key audio components from noise, significantly improving model performance. For echo cancellation, the function enables direct predictions on LAEC pre-processed outputs, substantially enhancing performance. Our noise suppression model achieves near state-of-the-art results with only 3.1M parameters and 0.4GFlops/s computational load. Moreover, our echo cancellation model outperforms replicated industry-leading models, introducing a new perspective in speech enhancement.
Submitted 27 November, 2023;
originally announced November 2023.
-
Learning a Better Control Barrier Function Under Uncertain Dynamics
Authors:
Bolun Dai,
Prashanth Krishnamurthy,
Farshad Khorrami
Abstract:
Using control barrier functions (CBFs) as safety filters provides a computationally inexpensive yet effective method for constructing controllers in safety-critical applications. However, using CBFs requires the construction of a valid CBF, which is well known to be a challenging task, and accurate system dynamics, which are often unavailable. This paper presents a learning-based approach to learn a valid CBF and the system dynamics starting from a conservative handcrafted CBF (HCBF) and the nominal system dynamics. We devise new loss functions that better suit the CBF refinement pipeline and are able to produce well-behaved CBFs with the usage of distance functions. By adopting an episodic learning approach, our proposed method is able to learn the system dynamics while not requiring additional interactions with the environment. Additionally, we provide a theoretical analysis of the quality of the learned system dynamics. We show that our proposed learning approach can effectively learn a valid CBF and an estimation of the actual system dynamics. The effectiveness of our proposed method is empirically demonstrated through simulation studies on three systems, a double integrator, a unicycle, and a two-link arm.
Submitted 7 October, 2023;
originally announced October 2023.
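The safety-filter idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's method: it assumes a one-dimensional single integrator (x' = u), a hypothetical barrier h(x) = x_max - x, and a linear class-K term with gain `alpha`. For this scalar case the usual quadratic program reduces to clipping the nominal control.

```python
def safety_filter(x, u_nom, x_max=1.0, alpha=2.0):
    """Return the control closest to u_nom that satisfies the CBF condition.

    Assumed setup (illustrative only): dynamics x' = u, barrier h(x) = x_max - x.
    The CBF condition h'(x) + alpha * h(x) >= 0 becomes u <= alpha * (x_max - x),
    so the minimally invasive safe control is a simple clip.
    """
    return min(u_nom, alpha * (x_max - x))


def simulate(x0=0.0, u_nom=1.0, dt=0.01, steps=500):
    """Forward-Euler rollout of the filtered system; returns the state trajectory."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x += dt * safety_filter(x, u_nom)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    traj = simulate()
    print(f"final state: {traj[-1]:.5f}, max state: {max(traj):.5f}")
```

The nominal controller pushes the state toward the boundary at constant speed; the filter leaves it untouched far from the boundary and slows it down near x_max, so the state approaches the limit without crossing it (provided `alpha * dt <= 1` in this discrete-time rollout).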
-
Smart filter aided domain adversarial neural network for fault diagnosis in noisy industrial scenarios
Authors:
Baorui Dai,
Gaëtan Frusque,
Tianfu Li,
Qi Li,
Olga Fink
Abstract:
The application of unsupervised domain adaptation (UDA)-based fault diagnosis methods has shown significant efficacy in industrial settings, facilitating the transfer of operational experience and fault signatures between different operating conditions, different units of a fleet or between simulated and real data. However, in real industrial scenarios, unknown levels and types of noise can amplify the difficulty of domain alignment, thus severely affecting the diagnostic performance of deep learning models. To address this issue, we propose a UDA method called Smart Filter-Aided Domain Adversarial Neural Network (SFDANN) for fault diagnosis in noisy industrial scenarios. The proposed methodology comprises two steps. In the first step, we develop a smart filter that dynamically enforces similarity between the source and target domain data in the time-frequency domain. This is achieved by combining a learnable wavelet packet transform network (LWPT) and a traditional wavelet packet transform module. In the second step, we input the data reconstructed by the smart filter into a domain adversarial neural network (DANN). To learn domain-invariant and discriminative features, the learnable modules of SFDANN are trained in a unified manner with three objectives: time-frequency feature proximity, domain alignment, and fault classification. We validate the effectiveness of the proposed SFDANN method based on two fault diagnosis cases: one involving fault diagnosis of bearings in noisy environments and another involving fault diagnosis of slab tracks in a train-track-bridge coupling vibration system, where the transfer task involves transferring from numerical simulations to field measurements. Results show that compared to other representative state-of-the-art UDA methods, SFDANN exhibits superior performance and remarkable stability.
Submitted 28 September, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
State Constrained Stochastic Optimal Control for Continuous and Hybrid Dynamical Systems Using DFBSDE
Authors:
Bolun Dai,
Prashanth Krishnamurthy,
Andrew Papanicolaou,
Farshad Khorrami
Abstract:
We develop a computationally efficient learning-based forward-backward stochastic differential equations (FBSDE) controller for both continuous and hybrid dynamical (HD) systems subject to stochastic noise and state constraints. Solutions to stochastic optimal control (SOC) problems satisfy the Hamilton-Jacobi-Bellman (HJB) equation. Using current FBSDE-based solutions, the optimal control can be obtained from the HJB equations using deep neural networks (e.g., long short-term memory (LSTM) networks). To ensure the learned controller respects the constraint boundaries, we enforce the state constraints using a soft penalty function. In addition to previous works, we adapt the deep FBSDE (DFBSDE) control framework to handle HD systems consisting of continuous dynamics and a deterministic discrete state change. We demonstrate our proposed algorithm in simulation on a continuous nonlinear system (cart-pole) and a hybrid nonlinear system (five-link biped).
Submitted 10 May, 2023;
originally announced May 2023.
-
Data-Efficient Control Barrier Function Refinement
Authors:
Bolun Dai,
Heming Huang,
Prashanth Krishnamurthy,
Farshad Khorrami
Abstract:
Control barrier functions (CBFs) have been widely used for synthesizing controllers in safety-critical applications. When used as a safety filter, a CBF provides a simple and computationally efficient way to obtain safe controls from a possibly unsafe performance controller. Despite this conceptual simplicity, constructing a valid CBF is well known to be challenging, especially for high-relative-degree systems under nonconvex constraints. Recently, work has been done to learn a valid CBF from data based on a handcrafted CBF (HCBF). Even though the HCBF gives a good initialization point, a large amount of data is still required to train the CBF network. In this work, we propose a new method to learn more efficiently from the collected data through a novel prioritized data sampling strategy. A priority score is computed from the loss value of each data point. Then, a probability distribution based on the priority score of the data points is used to sample data and update the learned CBF. Using our proposed approach, we can learn a valid CBF that recovers a larger portion of the true safe set using a smaller amount of data. The effectiveness of our method is demonstrated in simulation on a unicycle and a two-link arm.
Submitted 10 March, 2023;
originally announced March 2023.
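The prioritized sampling step described in the abstract above (priority score from the loss, then sampling from the resulting distribution) might look roughly like the following sketch. The abstract does not give the exact score or normalization, so the choices here are assumptions: the score is the absolute loss plus a small constant `eps`, and probabilities are the scores normalized to sum to one. The function names are hypothetical.

```python
import random


def priority_probabilities(losses, eps=1e-8):
    """Map per-sample loss values to a sampling distribution.

    Assumption: priority = |loss| + eps, so that zero-loss points keep a
    tiny, non-zero chance of being drawn.
    """
    scores = [abs(loss) + eps for loss in losses]
    total = sum(scores)
    return [score / total for score in scores]


def sample_batch(data, losses, batch_size, rng=None):
    """Draw a batch (with replacement) in proportion to each point's priority."""
    rng = rng or random.Random()
    probs = priority_probabilities(losses)
    return rng.choices(data, weights=probs, k=batch_size)


if __name__ == "__main__":
    points = ["p0", "p1", "p2"]
    point_losses = [0.1, 0.1, 2.0]  # hypothetical loss values
    print(priority_probabilities(point_losses))
    print(sample_batch(points, point_losses, batch_size=5, rng=random.Random(0)))
```

Points with a large loss are revisited more often, which is the intuition behind getting more out of a fixed data set; after each update the losses, and hence the probabilities, would be recomputed.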
-
Dynamic Storyboard Generation in an Engine-based Virtual Environment for Video Production
Authors:
Anyi Rao,
Xuekun Jiang,
Yuwei Guo,
Linning Xu,
Lei Yang,
Libiao **,
Dahua Lin,
Bo Dai
Abstract:
Amateurs working on mini-films and short-form videos usually spend lots of time and effort on the multi-round complicated process of setting and adjusting scenes, plots, and cameras to deliver satisfying video shots. We present Virtual Dynamic Storyboard (VDS) to let users storyboard shots in virtual environments, where the filming staff can easily test the settings of shots before the actual filming. VDS runs in a "propose-simulate-discriminate" mode: given a formatted story script and a camera script as input, it generates several character animation and camera movement proposals following predefined story and cinematic rules to allow an off-the-shelf simulation engine to render videos. To pick the top-quality dynamic storyboard from the candidates, we equip it with a shot ranking discriminator based on shot quality criteria learned from professional manually created data. VDS is comprehensively validated via extensive experiments and user studies, demonstrating its efficiency, effectiveness, and great potential in assisting amateur video production.
Submitted 21 July, 2023; v1 submitted 30 January, 2023;
originally announced January 2023.
-
Improving GANs with A Dynamic Discriminator
Authors:
Ceyuan Yang,
Yujun Shen,
Yinghao Xu,
Deli Zhao,
Bo Dai,
Bolei Zhou
Abstract:
Discriminator plays a vital role in training generative adversarial networks (GANs) via distinguishing real and synthesized samples. While the real data distribution remains the same, the synthesis distribution keeps varying because of the evolving generator, and thus effects a corresponding change to the bi-classification task for the discriminator. We argue that a discriminator with an on-the-fly adjustment on its capacity can better accommodate such a time-varying task. A comprehensive empirical study confirms that the proposed training strategy, termed as DynamicD, improves the synthesis performance without incurring any additional computation cost or training objectives. Two capacity adjusting schemes are developed for training GANs under different data regimes: i) given a sufficient amount of training data, the discriminator benefits from a progressively increased learning capacity, and ii) when the training data is limited, gradually decreasing the layer width mitigates the over-fitting issue of the discriminator. Experiments on both 2D and 3D-aware image synthesis tasks conducted on a range of datasets substantiate the generalizability of our DynamicD as well as its substantial improvement over the baselines. Furthermore, DynamicD is synergistic to other discriminator-improving approaches (including data augmentation, regularizers, and pre-training), and brings continuous performance gain when combined for learning GANs.
Submitted 20 September, 2022;
originally announced September 2022.
-
Learning a Better Control Barrier Function
Authors:
Bolun Dai,
Prashanth Krishnamurthy,
Farshad Khorrami
Abstract:
Control barrier functions (CBFs) are widely used in safety-critical controllers. However, constructing a valid CBF is challenging, especially under nonlinear or non-convex constraints and for high relative degree systems. Meanwhile, finding a conservative CBF that only recovers a portion of the true safe set is usually possible. In this work, starting from a "conservative" handcrafted CBF (HCBF), we develop a method to find a CBF that recovers a reasonably larger portion of the safe set. Since the learned CBF controller is not guaranteed to be safe during training iterations, we use a model predictive controller (MPC) to ensure safety during training. Using the collected trajectory data containing safe and unsafe interactions, we train a neural network to estimate the difference between the HCBF and a CBF that recovers a closer solution to the true safe set. With our proposed approach, we can generate safe controllers that are less conservative and computationally more efficient. We validate our approach on two systems: a second-order integrator and a ball-on-beam.
Submitted 11 October, 2022; v1 submitted 11 May, 2022;
originally announced May 2022.
-
Acceleration-guided Acoustic Signal Denoising Framework Based on Learnable Wavelet Transform Applied to Slab Track Condition Monitoring
Authors:
Baorui Dai,
Gaëtan Frusque,
Qi Li,
Olga Fink
Abstract:
Acoustic monitoring has recently shown great potential in the diagnosis of infrastructure condition. However, due to the severe noise interference in acoustic signals, meaningful features tend to be difficult to infer. It creates a considerable obstacle for an extensive application of acoustic monitoring. To tackle this problem, we propose an acceleration-guided acoustic signal denoising framework (AG-ASDF) based on learnable wavelet transform to automatically denoise the acoustic signal and extract the relevant features based on the acceleration signal. This denoising framework requires the acceleration signal only for the training stage. Therefore, only acoustic sensors (non-intrusive) need to be installed during the application phase, which is convenient and crucial for the condition monitoring of safety-critical infrastructure. A comparative study is conducted among the proposed AG-ASDF and other feature learning / extraction methods, by using a multi-class support vector machine to evaluate the detection effectiveness of slab track condition based on acoustic signals. Different healthy and unhealthy states of slab tracks are imitated with three types of slab track supporting conditions in a railway test line. The classification based on the proposed AG-ASDF features outperforms other feature extraction and learning methods with a significant accuracy improvement.
Submitted 23 March, 2023; v1 submitted 11 May, 2022;
originally announced May 2022.
-
A Bitstream Feature Based Model for Video Decoding Energy Estimation
Authors:
Christian Herglotz,
Yongjun Wen,
Bowen Dai,
Matthias Kränzler,
André Kaup
Abstract:
In this paper we show that a small number of bitstream features can be used to accurately estimate the energy consumption of state-of-the-art software and hardware accelerated decoder implementations for four different video codecs. By testing the estimation performance on HEVC, H.264, H.263, and VP9 we show that the proposed model can be used for any hybrid video codec. We test our approach on a large number of different test sequences to prove the general validity. We show that fewer than 20 features are sufficient to obtain mean estimation errors that are smaller than 8%. Finally, an example will show the performance trade-offs in terms of rate, distortion, and decoding energy for all tested codecs.
Submitted 21 April, 2022;
originally announced April 2022.
-
Denoising ECG by Adaptive Filter with Empirical Mode Decomposition
Authors:
Bingze Dai,
Wen Bai
Abstract:
The electrocardiogram (ECG) signal is an important physiological signal which contains cardiac information and is the basis for diagnosing cardiac-related diseases. In this paper, several innovative and efficient methods based on adaptive filter and empirical mode decomposition (EMD) to denoise ECG signals contaminated by various kinds of noise, including baseline wander (BW), power line interference (PLI), electrode motion artifact (EM) and muscle artifact (MA), are proposed. We first present a novel method based on EMD and adaptive filter for the removal of BW and PLI in ECG signals. We then extend the method to the complex scenario where the four most common noises, PLI, BW, EM and MA, are present. The proposed Parallel EMD adaptive filter structure yields the best SNR improvement on the MIT-BIH arrhythmia database, corrupted by the four types of noise.
Submitted 18 August, 2021;
originally announced August 2021.
-
Visually Informed Binaural Audio Generation without Binaural Audios
Authors:
Xudong Xu,
Hang Zhou,
Ziwei Liu,
Bo Dai,
Xiaogang Wang,
Dahua Lin
Abstract:
Stereophonic audio, especially binaural audio, plays an essential role in immersive viewing environments. Recent research has explored generating visually guided stereophonic audios supervised by multi-channel audio collections. However, due to the requirement of professional recording devices, existing datasets are limited in scale and variety, which impedes the generalization of supervised methods in real-world scenarios. In this work, we propose PseudoBinaural, an effective pipeline that is free of binaural recordings. The key insight is to carefully build pseudo visual-stereo pairs with mono data for training. Specifically, we leverage spherical harmonic decomposition and head-related impulse response (HRIR) to identify the relationship between spatial locations and received binaural audios. Then in the visual modality, corresponding visual cues of the mono data are manually placed at sound source positions to form the pairs. Compared to fully-supervised paradigms, our binaural-recording-free pipeline shows great stability in cross-dataset evaluation and achieves comparable performance under subjective preference. Moreover, combined with binaural recordings, our method is able to further boost the performance of binaural audio generation under supervised settings.
Submitted 13 April, 2021;
originally announced April 2021.
-
State Constrained Stochastic Optimal Control Using LSTMs
Authors:
Bolun Dai,
Prashanth Krishnamurthy,
Andrew Papanicolaou,
Farshad Khorrami
Abstract:
In this paper, we propose a new methodology for state constrained stochastic optimal control (SOC) problems. The solution is based on past work in solving SOC problems using forward-backward stochastic differential equations (FBSDE). Our approach in solving the FBSDE utilizes a deep neural network (DNN), specifically Long Short-Term Memory (LSTM) networks. LSTMs are chosen to solve the FBSDE to address the curse of dimensionality, non-linearities, and long time horizons. In addition, the state constraints are incorporated using a hard penalty function, resulting in a controller that respects the constraint boundaries. Numerical instability that would be introduced by the penalty function is dealt with through an adaptive update scheme. The control design methodology is applicable to a large class of control problems. The performance and scalability of our proposed algorithm are demonstrated by numerical simulations.
Submitted 5 April, 2021;
originally announced April 2021.
-
Focal Frequency Loss for Image Reconstruction and Synthesis
Authors:
Liming Jiang,
Bo Dai,
Wayne Wu,
Chen Change Loy
Abstract:
Image reconstruction and synthesis have witnessed remarkable progress thanks to the development of generative models. Nonetheless, gaps could still exist between the real and generated images, especially in the frequency domain. In this study, we show that narrowing gaps in the frequency domain can ameliorate image reconstruction and synthesis quality further. We propose a novel focal frequency loss, which allows a model to adaptively focus on frequency components that are hard to synthesize by down-weighting the easy ones. This objective function is complementary to existing spatial losses, offering great impedance against the loss of important frequency information due to the inherent bias of neural networks. We demonstrate the versatility and effectiveness of focal frequency loss to improve popular models, such as VAE, pix2pix, and SPADE, in both perceptual quality and quantitative performance. We further show its potential on StyleGAN2.
Submitted 23 August, 2021; v1 submitted 23 December, 2020;
originally announced December 2020.
-
Active Disturbance Rejection Control Design with Suppression of Sensor Noise Effects in Application to DC-DC Buck Power Converter
Authors:
Krzysztof Łakomy,
Rafal Madonski,
Bin Dai,
Jun Yang,
Piotr Kicki,
Maral Ansari,
Shihua Li
Abstract:
The performance of active disturbance rejection control (ADRC) algorithms can be limited in practice by high-frequency measurement noise. In this work, this problem is addressed by transforming the high-gain extended state observer (ESO), which is the inherent element of ADRC, into a new cascade observer structure. A set of experiments, performed on a DC-DC buck power converter system, shows that the new cascade ESO design, compared to the conventional approach, effectively suppresses the detrimental effect of sensor noise over-amplification while increasing the estimation/control performance. The proposed design is also analyzed with a low-pass filter at the converter output, which is a common technique for reducing measurement noise in industrial applications.
Submitted 4 February, 2021; v1 submitted 7 September, 2020;
originally announced September 2020.
-
DCAF: A Dynamic Computation Allocation Framework for Online Serving System
Authors:
Biye Jiang,
Pengye Zhang,
Rihan Chen,
Binding Dai,
Xinchen Luo,
Yin Yang,
Guan Wang,
Guorui Zhou,
Xiaoqiang Zhu,
Kun Gai
Abstract:
Modern large-scale systems such as recommender systems and online advertising systems are built upon computation-intensive infrastructure. The typical objective in these applications is to maximize the total revenue, e.g. GMV (Gross Merchandise Volume), under a limited computation resource. Usually, the online serving system follows a multi-stage cascade architecture, which consists of several stages including retrieval, pre-ranking, ranking, etc. These stages usually allocate resources manually with specific computing power budgets, which requires the serving configuration to adapt accordingly. As a result, the existing system easily falls into suboptimal solutions with respect to maximizing the total revenue. The limitation is due to the fact that, although the value of traffic requests varies greatly, the online serving system still spends equal computing power on each of them.
In this paper, we introduce a novel idea that the online serving system could treat each traffic request differently and allocate a "personalized" computation resource based on its value. We formulate this resource allocation problem as a knapsack problem and propose a Dynamic Computation Allocation Framework (DCAF). Under some general assumptions, DCAF can theoretically guarantee that the system can maximize the total revenue within a given computation budget. DCAF brings significant improvement and has been deployed in the display advertising system of Taobao for serving the main traffic. With DCAF, we are able to maintain the same business performance with a 20% reduction in computation resources.
Submitted 17 June, 2020;
originally announced June 2020.
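The knapsack formulation mentioned in the abstract above can be illustrated with a small sketch. This is not the DCAF algorithm itself: it is a generic Lagrangian relaxation of a multiple-choice knapsack, under the assumptions that each request picks exactly one action, that `values[i][j]` is a hypothetical expected value of serving request `i` with action `j`, that `costs[j]` is the computation cost of action `j`, and that a zero-cost action exists so the budget can always be met. All function names are illustrative.

```python
def choose_actions(values, costs, lam):
    """Per request, pick the action maximizing value - lam * cost."""
    plan = []
    for row in values:
        best = max(range(len(costs)), key=lambda j: row[j] - lam * costs[j])
        plan.append(best)
    return plan


def total_cost(plan, costs):
    """Total computation consumed by a plan (one action index per request)."""
    return sum(costs[j] for j in plan)


def fit_multiplier(values, costs, budget, hi=1e6, iters=60):
    """Bisect on the multiplier until the greedy plan fits the budget.

    Assumes the plan at the initial upper bound `hi` is within budget,
    which holds when a zero-cost action is available.
    """
    lo = 0.0
    if total_cost(choose_actions(values, costs, lo), costs) <= budget:
        return lo
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total_cost(choose_actions(values, costs, mid), costs) > budget:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    values = [[0.0, 1.0, 1.2], [0.0, 5.0, 9.0]]  # hypothetical request values
    costs = [0, 1, 4]                            # hypothetical action costs
    lam = fit_multiplier(values, costs, budget=5)
    plan = choose_actions(values, costs, lam)
    print(plan, total_cost(plan, costs))
```

In this toy instance the low-value request is downgraded to the cheap action while the high-value request keeps the expensive one, which is the "personalized" allocation the abstract argues for. The multiplier acts as a price on computation: raising it makes every request more frugal.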
-
Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation
Authors:
Xingang Pan,
Xiaohang Zhan,
Bo Dai,
Dahua Lin,
Chen Change Loy,
** Luo
Abstract:
Learning a good image prior is a long-term goal for image restoration and manipulation. While existing methods like deep image prior (DIP) capture low-level image statistics, there are still gaps toward an image prior that captures rich image semantics including color, spatial coherence, textures, and high-level concepts. This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images. As shown in Fig. 1, the deep generative prior (DGP) provides compelling results to restore missing semantics, e.g., color, patch, resolution, of various degraded images. It also enables diverse image manipulation including random jittering, image morphing, and category transfer. Such highly flexible restoration and manipulation are made possible through relaxing the assumption of existing GAN-inversion methods, which tend to fix the generator. Notably, we allow the generator to be fine-tuned on-the-fly in a progressive manner regularized by feature distance obtained by the discriminator in GAN. We show that these easy-to-implement and practical changes help keep the reconstruction in the manifold of natural images, and thus lead to more precise and faithful reconstruction for real images. Code is available at https://github.com/XingangPan/deep-generative-prior.
Submitted 20 July, 2020; v1 submitted 30 March, 2020;
originally announced March 2020.
-
Real or Not Real, that is the Question
Authors:
Yuanbo Xiangli,
Yubin Deng,
Bo Dai,
Chen Change Loy,
Dahua Lin
Abstract:
While generative adversarial networks (GAN) have been widely adopted in various topics, in this paper we generalize the standard GAN to a new perspective by treating realness as a random variable that can be estimated from multiple angles. In this generalized framework, referred to as RealnessGAN, the discriminator outputs a distribution as the measure of realness. While RealnessGAN shares similar theoretical guarantees with the standard GAN, it provides more insights into adversarial learning. Compared to multiple baselines, RealnessGAN provides stronger guidance for the generator, achieving improvements on both synthetic and real-world datasets. Moreover, it enables the basic DCGAN architecture to generate realistic images at 1024×1024 resolution when trained from scratch.
Submitted 12 February, 2020;
originally announced February 2020.
-
Recursive Visual Sound Separation Using Minus-Plus Net
Authors:
Xudong Xu,
Bo Dai,
Dahua Lin
Abstract:
Sounds provide rich semantics, complementary to visual data, for many tasks. However, in practice, sounds from multiple sources are often mixed together. In this paper we propose a novel framework, referred to as MinusPlus Network (MP-Net), for the task of visual sound separation. MP-Net separates sounds recursively in the order of average energy, removing the separated sound from the mixture at the end of each prediction, until the mixture becomes empty or contains only noise. In this way, MP-Net can be applied to sound mixtures with arbitrary numbers and types of sounds. Moreover, as MP-Net keeps removing sounds with large energy from the mixture, sounds with small energy can emerge and become clearer, so that the separation is more accurate. Compared to previous methods, MP-Net obtains state-of-the-art results on two large-scale datasets, across mixtures with different types and numbers of sounds.
Submitted 23 October, 2019; v1 submitted 30 August, 2019;
originally announced August 2019.
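The recursive removal loop described in the MP-Net abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `recursive_separate`, `separate_dominant`, `noise_floor`, and `max_steps` are hypothetical, signals are plain lists of floats rather than spectrograms, and the toy separator simply hands back known sources in order of decreasing average energy in place of a learned audio-visual network.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


def average_energy(signal: Sequence[float]) -> float:
    """Mean squared amplitude of a signal (0.0 for an empty signal)."""
    return sum(x * x for x in signal) / len(signal) if signal else 0.0


def recursive_separate(
    mixture: Sequence[float],
    separate_dominant: Callable[[Signal], Signal],
    noise_floor: float = 1e-6,   # assumed stopping threshold
    max_steps: int = 10,         # assumed safety cap on recursion depth
) -> Tuple[List[Signal], Signal]:
    """Repeatedly extract one source and subtract it from the residual mixture."""
    residual = list(mixture)
    separated: List[Signal] = []
    for _ in range(max_steps):
        # Stop once the residual is empty or only noise remains.
        if average_energy(residual) <= noise_floor:
            break
        source = separate_dominant(residual)
        separated.append(source)
        # Remove the separated sound from the mixture before the next step.
        residual = [r - s for r, s in zip(residual, source)]
    return separated, residual


def make_toy_separator(sources: List[Signal]) -> Callable[[Signal], Signal]:
    """Hypothetical oracle: returns known sources, highest average energy first."""
    remaining = sorted(sources, key=average_energy, reverse=True)

    def separate_dominant(residual: Signal) -> Signal:
        return remaining.pop(0)

    return separate_dominant


loud = [3.0, 0.0, 3.0, 0.0]
quiet = [0.0, 1.0, 0.0, 1.0]
mixture = [a + b for a, b in zip(loud, quiet)]
separated, residual = recursive_separate(mixture, make_toy_separator([quiet, loud]))
```

In this toy run the louder source is extracted first and the quieter one second, after which the residual has zero energy and the loop stops. Because the stopping rule depends only on the residual, the same loop handles any number of sources up to `max_steps`.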
-
Lidar-based Object Classification with Explicit Occlusion Modeling
Authors:
Xiaoxiang Zhang,
Hao Fu,
Bin Dai
Abstract:
Lidar is one of the most important sensors for Unmanned Ground Vehicles (UGV). Object detection and classification based on lidar point clouds is a key technology for UGV. In object detection and classification, the mutual occlusion between neighboring objects is an important factor affecting accuracy. In this paper, we consider occlusion as an intrinsic property of the point cloud data. We propose a novel approach that explicitly models the occlusion. The occlusion property is then taken into account in the subsequent classification step. We perform experiments on the KITTI dataset. Experimental results indicate that by utilizing the modeled occlusion property, the classifier obtains much better performance.
Submitted 9 July, 2019; v1 submitted 9 July, 2019;
originally announced July 2019.