NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions were submitted and evaluated on the collected dataset KVQ from a popular short-form video platform, i.e., the Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants, and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performance for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

arXiv:2403.08247 [pdf, other]

A Dual-domain Regularization Method for Ring Artifact Removal of X-ray CT

Authors: Hongyang Zhu, Xin Lu, Yanwei Qin, Xinran Yu, Tianjiao Sun, Yunsong Zhao

Abstract: Ring artifacts in computed tomography images, arising from the undesirable responses of detector units, significantly degrade image quality and diagnostic reliability. To address this challenge, we propose a dual-domain regularization model to effectively remove ring artifacts, while maintaining the integrity of the original CT image. The proposed model corrects the vertical stripe artifacts on the sinogram by innovatively updating the response inconsistency compensation coefficients of detector units, which is achieved by employing the group sparse constraint and the projection-view direction sparse constraint on the stripe artifacts. Simultaneously, we apply the sparse constraint on the reconstructed image to further rectify ring artifacts in the image domain. The key advantage of the proposed method lies in considering the relationship between the response inconsistency compensation coefficients of the detector units and the projection views, which enables a more accurate correction of the response of the detector units. An alternating minimization method is designed to solve the model. Comparative experiments on real photon counting detector data demonstrate that the proposed method not only surpasses existing methods in removing ring artifacts but also excels in preserving structural details and image fidelity.

Submitted 14 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2312.13752 [pdf]

Hunting imaging biomarkers in pulmonary fibrosis: Benchmarks of the AIIB23 challenge

Authors: Yang Nan, Xiaodan Xing, Shiyi Wang, Zeyu Tang, Federico N Felder, Sheng Zhang, Roberta Eufrasia Ledda, Xiaoliu Ding, Ruiqi Yu, Wei** Liu, Feng Shi, Tianyang Sun, Zehong Cao, Minghui Zhang, Yun Gu, Hanxiao Zhang, Jian Gao, **yu Wang, Wen Tang, Pengxin Yu, Han Kang, Junqiang Chen, Xing Lu, Boyu Zhang, Michail Mamalakis , et al. (16 additional authors not shown)

Abstract: Airway-related quantitative imaging biomarkers are crucial for examination, diagnosis, and prognosis in pulmonary diseases. However, the manual delineation of airway trees remains prohibitively time-consuming. While significant efforts have been made towards enhancing airway modelling, current publicly available datasets concentrate on lung diseases with moderate morphological variations. The intricate honeycombing patterns present in the lung tissues of fibrotic lung disease patients exacerbate the challenges, often leading to various prediction errors. To address this issue, the 'Airway-Informed Quantitative CT Imaging Biomarker for Fibrotic Lung Disease 2023' (AIIB23) competition was organized in conjunction with the official 2023 International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). The airway structures were meticulously annotated by three experienced radiologists. Competitors were encouraged to develop automatic airway segmentation models with high robustness and generalization abilities, followed by exploring the most correlated QIB of mortality prediction. A training set of 120 high-resolution computerised tomography (HRCT) scans was publicly released with expert annotations and mortality status. The online validation set incorporated 52 HRCT scans from patients with fibrotic lung disease and the offline test set included 140 cases from fibrosis and COVID-19 patients. The results have shown that the capacity of extracting airway trees from patients with fibrotic lung disease could be enhanced by introducing voxel-wise weighted general union loss and continuity loss. In addition to the competitive image biomarkers for prognosis, a strong airway-derived biomarker (Hazard ratio>1.5, p<0.0001) was revealed for survival prognostication compared with existing clinical measurements, clinician assessment and AI-based biomarkers.

Submitted 16 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: 19 pages

arXiv:2306.09164 [pdf]

Network Architecture Design toward Convergence of Mobile Applications and Networks

Authors: Shuangfeng Han, Zhiming Liu, Tao Sun, Xiaoyun Wang

Abstract: With the quick proliferation of extended reality (XR) services, mobile communications networks face enormous challenges in meeting diversified and demanding service requirements. A tight coordination or even convergence of applications and mobile networks is highly motivated. In this paper, a multi-domain (e.g. application layer, transport layer, the core network, radio access network, user equipment) coordination scheme is first proposed, which facilitates a tight coordination between applications and networks based on the current 5G networks. Toward the convergence of applications and networks, a network architecture with cross-domain joint processing capability is further proposed for 6G mobile communications and beyond. Both designs are able to provide more accurate information of the quality of experience (QoE) and quality of service (QoS), thus paving the path for the joint optimization of applications and networks. The benefits of the QoE assisted scheduling are further investigated via simulations. A new QoE-oriented fairness metric is further proposed, which is capable of ensuring better fairness when different services are scheduled. Future research directions and their standardization impacts are also identified. Toward optimized end-to-end service provision, the paradigm shift from loosely coupled to converged design of applications and wireless communication networks is indispensable.

Submitted 15 June, 2023; originally announced June 2023.

Comments: 7 pages, 5 figures, IEEE communications magazine, under review

arXiv:2210.06368 [pdf, other]

Individualized Conditioning and Negative Distances for Speaker Separation

Authors: Tao Sun, Nidal Abuhajar, Shuyu Gong, Zhewei Wang, Charles D. Smith, Xianhui Wang, Li Xu, Jundong Liu

Abstract: Speaker separation aims to extract multiple voices from a mixed signal. In this paper, we propose two speaker-aware designs to improve the existing speaker separation solutions. The first model is a speaker conditioning network that integrates speech samples to generate individualized speaker conditions, which then provide informed guidance for a separation module to produce well-separated outputs. The second design aims to reduce non-target voices in the separated speech. To this end, we propose negative distances to penalize the appearance of any non-target voice in the channel outputs, and positive distances to drive the separated voices closer to the clean targets. We explore two different setups, weighted-sum and triplet-like, to integrate these two distances to form a combined auxiliary loss for the separation networks. Experiments conducted on LibriMix demonstrate the effectiveness of our proposed models.

Submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted to ICMLA 2022

arXiv:2204.14057 [pdf, other]

Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast

Authors: Boqing Zhu, Kele Xu, Changjian Wang, Zheng Qin, Tao Sun, Huaimin Wang, Yuxing Peng

Abstract: We present an approach to learn voice-face representations from the talking face videos, without any identity labels. Previous works employ cross-modal instance discrimination tasks to establish the correlation of voice and face. These methods neglect the semantic content of different videos, introducing false-negative pairs as training noise. Furthermore, the positive pairs are constructed based on the natural correlation between audio clips and visual frames. However, this correlation might be weak or inaccurate in a large amount of real-world data, which leads to deviating positives into the contrastive paradigm. To address these issues, we propose the cross-modal prototype contrastive learning (CMPC), which takes advantage of contrastive methods and resists adverse effects of false negatives and deviate positives. On one hand, CMPC could learn the intra-class invariance by constructing semantic-wise positives via unsupervised clustering in different modalities. On the other hand, by comparing the similarities of cross-modal instances from that of cross-modal prototypes, we dynamically recalibrate the unlearnable instances' contribution to overall loss. Experiments show that the proposed approach outperforms state-of-the-art unsupervised methods on various voice-face association evaluation protocols. Additionally, in the low-shot supervision setting, our method also has a significant improvement compared to previous instance-wise contrastive learning.

Submitted 26 May, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

Comments: 8 pages, 4 figures. Accepted by IJCAI-2022

arXiv:2109.13521 [pdf, other]

A multi-stage semi-supervised improved deep embedded clustering method for bearing fault diagnosis under the situation of insufficient labeled samples

Authors: Tongda Sun, Gang Yu

Abstract: Although data-driven fault diagnosis methods have been widely applied, massive labeled data are required for model training. However, the difficulty of obtaining such data in real industries hinders the application of these methods. Hence, an effective diagnostic approach that can work well in such a situation is urgently needed. In this study, a multi-stage semi-supervised improved deep embedded clustering (MS-SSIDEC) method, which combines semi-supervised learning with improved deep embedded clustering (IDEC), is proposed to jointly explore scarce labeled data and massive unlabeled data. In the first stage, a skip-connection-based convolutional auto-encoder (SCCAE) that can automatically map the unlabeled data into a low-dimensional feature space is proposed and pre-trained to be a fault feature extractor. In the second stage, a semi-supervised improved deep embedded clustering (SSIDEC) network is proposed for clustering. It is first initialized with available labeled data and then used to simultaneously optimize the clustering label assignment and make the feature space more clustering-friendly. To tackle the phenomenon of overfitting, virtual adversarial training (VAT) is introduced as a regularization term in this stage. In the third stage, pseudo labels are obtained from the high-quality results of SSIDEC. The labeled dataset can be augmented by these pseudo-labeled data and then leveraged to train a bearing fault diagnosis model. Two public datasets of vibration data from rolling bearings are used to evaluate the performance of the proposed method. Experimental results indicate that the proposed method achieves a promising performance in both semi-supervised and unsupervised fault diagnosis tasks. This method provides a new approach for fault diagnosis under the situation of limited labeled samples by effectively exploring unsupervised data.

Submitted 23 November, 2021; v1 submitted 28 September, 2021; originally announced September 2021.

Comments: 24 pages, 15 figures and 59 references

arXiv:2107.01762 [pdf]

Energy Management Strategy for Unmanned Tracked Vehicles Based on Local Speed Planning

Authors: Tianxing Sun, Shaohang Xu, Zirui Li, Yingqi Tan, Huiyan Chen

Abstract: The hybrid electric system has good potential for unmanned tracked vehicles due to its excellent power and economy. Because unmanned tracked vehicles have no traditional driving devices and their driving cycle is uncertain, they bring new challenges to conventional energy management strategies. This paper proposes a novel energy management strategy for unmanned tracked vehicles based on local speed planning. The contributions are threefold. Firstly, a local speed planning algorithm is adopted for the input of driving cycle prediction to avoid the dependence of traditional vehicles on the driver's operation. Secondly, a prediction model based on Convolutional Neural Networks and Long Short-Term Memory (CNN-LSTM) is proposed, which is used to process both the planned and the historical velocity series to improve the prediction accuracy. Finally, based on the prediction results, the model predictive control algorithm is used to realize the real-time optimization of energy management. The validity of the method is verified by simulation using collected data from actual field experiments of our unmanned tracked vehicle. Compared with multi-step neural networks, the prediction model based on CNN-LSTM improves the prediction accuracy by 20%. Compared with the traditional regular energy management strategy, the energy management strategy based on model predictive control reduces fuel consumption by 7%.

Submitted 4 July, 2021; originally announced July 2021.

arXiv:2106.15410 [pdf]

doi 10.1016/j.optcom.2021.127474

Improvements in Micro-CT Method for Characterizing X-ray Monocapillary Optics

Authors: Zhao Wang, Kai Pan, Shuang Zhang, Zhuxuan Duo, Zhiguo Liu, Tianxi Sun

Abstract: Accurate characterization of the inner surface of X-ray monocapillary optics (XMCO) is of great significance in X-ray optics research. Compared with other characterization methods, the micro computed tomography (micro-CT) method has its unique advantages but also has some disadvantages, such as a long scanning time, long image reconstruction time, and inconvenient scanning process. In this paper, sparse sampling was proposed to shorten the scanning time, GPU acceleration technology was used to improve the speed of image reconstruction, and a simple geometric calibration algorithm was proposed to avoid the calibration phantom and simplify the scanning process. These methodologies will popularize the use of the micro-CT method in XMCO characterization.

Submitted 15 September, 2021; v1 submitted 28 June, 2021; originally announced June 2021.

arXiv:2008.10239 [pdf, other]

Managing connected and automated vehicles with flexible routing at "lane-allocation-free" intersections

Authors: Wanjing Ma, Ruochen Hao, Chunhui Yu, Tuo Sun, Bart van Arem

Abstract: Trajectory planning and coordination for connected and automated vehicles (CAVs) have been studied at isolated "signal-free" intersections and in "signal-free" corridors under the fully CAV environment in the literature. Most of the existing studies are based on the definition of approaching and exit lanes. The route a vehicle takes to pass through an intersection is determined from its movement. That is, only the origin and destination arms are included. This study proposes a mixed-integer linear programming (MILP) model to optimize vehicle trajectories at an isolated "signal-free" intersection without lane allocation, which is denoted as "lane-allocation-free" (LAF) control. Each lane can be used as both approaching and exit lanes for all vehicle movements including left-turn, through, and right-turn. A vehicle can take a flexible route by way of multiple arms to pass through the intersection. In this way, the spatial-temporal resources are expected to be fully utilized. The interactions between vehicle trajectories are modeled explicitly at the microscopic level. Vehicle routes and trajectories (i.e., car-following and lane-changing behaviors) at the intersection are optimized in one unified framework for system optimality in terms of total vehicle delay. Considering varying traffic conditions, the planning horizon is adaptively adjusted in the implementation procedure of the proposed model to strike a balance between solution feasibility and computational burden. Numerical studies validate the advantages of the proposed LAF control in terms of both vehicle delay and throughput with different demand structures and temporal safety gaps.

Submitted 24 August, 2020; originally announced August 2020.

Comments: 31 pages, 5 figures, for simulation video, see https://magic.tongji.edu.cn/en/index.php?catid=41

arXiv:2008.06988 [pdf]

Power and the Pandemic: Exploring Global Changes in Electricity Demand During COVID-19

Authors: Elizabeth Buechler, Siobhan Powell, Tao Sun, Chad Zanocco, Nicolas Astier, Jose Bolorinos, June Flora, Hilary Boudet, Ram Rajagopal

Abstract: Understanding how efforts to limit exposure to COVID-19 have altered electricity demand provides insights not only into how dramatic restrictions shape electricity demand but also about future electricity use in a post-COVID-19 world. We develop a unified modeling framework to quantify and compare electricity usage changes in 58 countries and regions around the world from January-May 2020. We find that daily electricity demand declined as much as 10% in April 2020 compared to modelled demand, controlling for weather, seasonal and temporal effects, but with significant variation. Clustering techniques show that four impact groups capture systematic differences in timing and depth of electricity usage changes, ranging from a mild decline of 2% to an extreme decline of 26%. These groupings do not align with geography, with almost every continent having at least one country or region that experienced a dramatic reduction in demand and one that did not. Instead, we find that such changes relate to government restrictions and mobility. Government restrictions have a non-linear effect on demand that generally saturates at its most restrictive levels and sustains even as restrictions ease. Mobility offers a sharper focus on electricity demand change with workplace and residential mobility strongly linked to demand changes at the daily level. Steep declines in electricity usage are associated with workday hourly load patterns that resemble pre-COVID weekend usage. Quantifying these impacts is a crucial first step in understanding the impacts of crises like the pandemic and the associated societal response on electricity demand.

Submitted 16 August, 2020; originally announced August 2020.

arXiv:2001.00605 [pdf, other]

Zero-Shot Reinforcement Learning with Deep Attention Convolutional Neural Networks

Authors: Sahika Genc, Sunil Mallya, Sravan Bodapati, Tao Sun, Yunzhe Tao

Abstract: Simulation-to-simulation and simulation-to-real world transfer of neural network models has been a difficult problem. To close the reality gap, prior methods for simulation-to-real world transfer focused on domain adaptation, decoupling perception and dynamics and solving each problem separately, and randomization of agent parameters and environment conditions to expose the learning agent to a variety of conditions. While these methods provide acceptable performance, the computational complexity required to capture a large variation of parameters for comprehensive scenarios on a given task such as autonomous driving or robotic manipulation is high. Our key contribution is to theoretically prove and empirically demonstrate that a deep attention convolutional neural network (DACNN) with specific visual sensor configuration performs as well as training on a dataset with high domain and parameter variation at lower computational complexity. Specifically, the attention network weights are learned through policy optimization to focus on local dependencies that lead to optimal actions, and do not require tuning in the real world for generalization. Our new architecture adapts perception with respect to the control objective, resulting in zero-shot learning without pre-training a perception network. To measure the impact of our new deep network architecture on domain adaptation, we consider autonomous driving as a use case. We perform an extensive set of experiments in simulation-to-simulation and simulation-to-real scenarios to compare our approach to several baselines including the current state-of-the-art models.

Submitted 2 January, 2020; originally announced January 2020.

arXiv:1910.05253 [pdf, other]

Adversarial Colorization Of Icons Based On Structure And Color Conditions

Authors: Tsai-Ho Sun, Chien-Hsun Lai, Sai-Keung Wong, Yu-Shuen Wang

Abstract: We present a system to help designers create icons that are widely used in banners, signboards, billboards, homepages, and mobile apps. Designers are tasked with drawing contours, whereas our system colorizes contours in different styles. This goal is achieved by training a dual conditional generative adversarial network (GAN) on our collected icon dataset. One condition requires the generated image and the drawn contour to possess a similar contour, while the other anticipates the image and the referenced icon to be similar in color style. Accordingly, the generator takes a contour image and a man-made icon image to colorize the contour, and then the discriminators determine whether the result fulfills the two conditions. The trained network is able to colorize icons demanded by designers and greatly reduces their workload. For the evaluation, we compared our dual conditional GAN to several state-of-the-art techniques. Experimental results demonstrate that our network outperforms the previous networks. Finally, we will provide the source code, icon dataset, and trained network for public use.

Submitted 3 October, 2019; originally announced October 2019.

arXiv:1907.12945 [pdf, other]

doi 10.1109/TIP.2019.2924339

Inertial nonconvex alternating minimizations for the image deblurring

Authors: Tao Sun, Roberto Barrio, Marcos Rodriguez, Hao Jiang

Abstract: In image processing, Total Variation (TV) regularization models are commonly used to recover blurred images. One of the most efficient and popular methods to solve the convex TV problem is the Alternating Direction Method of Multipliers (ADMM) algorithm, recently extended using the inertial proximal point method. Although all the classical studies focus on only a convex formulation, recent articles are paying increasing attention to the nonconvex methodology due to its good numerical performance and properties. In this paper, we propose to extend the classical formulation with a novel nonconvex Alternating Direction Method of Multipliers with the Inertial technique (IADMM). Under certain assumptions on the parameters, we prove the convergence of the algorithm with the help of the Kurdyka-Łojasiewicz property. We also present numerical simulations on classical TV image reconstruction problems to illustrate the efficiency of the new algorithm and its behavior compared with the well established ADMM method.

Submitted 26 July, 2019; originally announced July 2019.

Comments: Transactions on Image Processing

arXiv:1907.11956 [pdf, other]

Dilated FCN: Listening Longer to Hear Better

Authors: Shuyu Gong, Zhewei Wang, Tao Sun, Yuanhang Zhang, Charles D. Smith, Li Xu, Jundong Liu

Abstract: Deep neural network solutions have emerged as a new and powerful paradigm for speech enhancement (SE). The capabilities to capture long context and extract multi-scale patterns are crucial to design effective SE networks. Such capabilities, however, are often in conflict with the goal of maintaining compact networks to ensure good system generalization. In this paper, we explore dilation operations and apply them to fully convolutional networks (FCNs) to address this issue. Dilations equip the networks with greatly expanded receptive fields, without increasing the number of parameters. Different strategies to fuse multi-scale dilations, as well as to install the dilation modules are explored in this work. Using Noisy VCTK and AzBio sentences datasets, we demonstrate that the proposed dilation models significantly improve over the baseline FCN and outperform the state-of-the-art SE solutions.

Submitted 27 July, 2019; originally announced July 2019.

Comments: 5 pages; will appear in WASPAA conference

arXiv:1907.04536 [pdf]

Multi-layer Attention Mechanism for Speech Keyword Recognition

Authors: Ruisen Luo, Tianran Sun, Chen Wang, Miao Du, Zuodong Tang, Kai Zhou, Xiaofeng Gong, Xiaomei Yang

Abstract: As an important part of speech recognition technology, automatic speech keyword recognition has been intensively studied in recent years. Such technology becomes especially pivotal under situations with limited infrastructures and computational resources, such as voice command recognition in vehicles and robot interaction. At present, the mainstream methods in automatic speech keyword recognition are based on long short-term memory (LSTM) networks with attention mechanism. However, due to inevitable information losses for the LSTM layer caused during feature extraction, the calculated attention weights are biased. In this paper, a novel approach, namely Multi-layer Attention Mechanism, is proposed to handle the inaccurate attention weights problem. The key idea is that, in addition to the conventional attention mechanism, information of layers prior to feature extraction and LSTM are introduced into attention weights calculations. Therefore, the attention weights are more accurate because the overall model can have more precise and focused areas. We conduct a comprehensive comparison and analysis on the keyword spotting performances on convolution neural network, bi-directional LSTM cyclic neural network, and cyclic neural network with the proposed attention mechanism on the Google Speech Commands V2 dataset. Experimental results indicate favorable results for the proposed method and demonstrate the validity of the proposed method. The proposed multi-layer attention methods can be useful for other research related to object spotting.

Submitted 10 July, 2019; originally announced July 2019.

arXiv:1906.00732 [pdf, other]

Cloud Storage for Multi-Service Battery Operation (Extended Version)

Authors: Mohammad Rasouli, Tao Sun, Camille Pache, Patrick Panciatici, Jean Maeght, Ramesh Johari, Ram Rajagopal

Abstract: We study a cloud storage operator who provides shared storage service for electricity end-users using the residual part of a multi-service grid-scale battery primarily used for high priority grid services. We design an optimal product offering, pricing and customer portfolio. Assessing and operating such multi-service battery operations with stochastic services and different priority levels is an open problem, for which a framework and solution approach is proposed. The methodology consists in modelling the problem as a two-stage stochastic optimization between high priority stochastic grid services and low priority cloud storage for stochastic end users. We also propose the operational metrics of multiplexing gain and probability of blocking to assess the operation of multi-service multi-user battery. To address the computational challenge of solving the stochastic optimization with a large number of end-users, we propose effective capacity as a convex approximation that allows an analytical solution. We then provide an empirical analysis based on real grid congestion data from RTE France, and a large dataset of end-users' electricity consumption in California.
Our empirical analysis shows (i) our proposed effective capacity is a close approximation, (ii) battery operation and profit are sensitive to the cost of external resources, number of end-users, and RTE's leasing price of the battery, and (iii) with only a slight discount of the leasing price, the profit of the third party from a stochastic residual battery can be the same as that of a deterministic one. Cloud storage as a low priority service can profitably exist alongside other high priority battery services, making integration of more storage in the grid economically viable, and allowing larger intermittent renewables, a key path towards reduced carbon emissions.

Submitted 13 August, 2021; v1 submitted 17 May, 2019; originally announced June 2019.

arXiv:1905.00824 [pdf, other]

doi 10.1145/3306346.3323008

Single Image Portrait Relighting

Authors: Tiancheng Sun, Jonathan T. Barron, Yun-Ta Tsai, Zexiang Xu, Xueming Yu, Graham Fyffe, Christoph Rhemann, Jay Busch, Paul Debevec, Ravi Ramamoorthi

Abstract: Lighting plays a central role in conveying the essence and depth of the subject in a portrait photograph. Professional photographers will carefully control the lighting in their studio to manipulate the appearance of their subject, while consumer photographers are usually constrained to the illumination of their environment. Though prior works have explored techniques for relighting an image, their utility is usually limited due to requirements of specialized hardware, multiple images of the subject under controlled or known illuminations, or accurate models of geometry and reflectance. To this end, we present a system for portrait relighting: a neural network that takes as input a single RGB image of a portrait taken with a standard cellphone camera in an unconstrained environment, and from that image produces a relit image of that subject as though it were illuminated according to any provided environment map. Our method is trained on a small database of 18 individuals captured under different directional light sources in a controlled light stage setup consisting of a densely sampled sphere of lights. Our proposed technique produces quantitatively superior results on our dataset's validation set compared to prior works, and produces convincing qualitative relighting results on a dataset of hundreds of real-world cellphone portraits. Because our technique can produce a 640 $\times$ 640 image in only 160 milliseconds, it may enable interactive user-facing photographic applications in the future.

Submitted 2 May, 2019; originally announced May 2019.

Comments: SIGGRAPH 2019 Technical Paper accepted

Journal ref: ACM Transactions on Graphics (SIGGRAPH 2019) 38 (4)

arXiv:1902.04062 [pdf, other]

Iteratively reweighted penalty alternating minimization methods with continuation for image deblurring

Authors: Tao Sun, Dongsheng Li, Hao Jiang, Zhe Quan

Abstract: In this paper, we consider a class of nonconvex problems with linear constraints appearing frequently in the area of image processing. We solve this problem by the penalty method and propose the iteratively reweighted alternating minimization algorithm. To speed up the algorithm, we also apply the continuation strategy to the penalty parameter. A convergence result is proved for the algorithm. Compared with the nonconvex ADMM, the proposed algorithm enjoys both theoretical and computational advantages like weaker convergence requirements and faster speed. Numerical results demonstrate the efficiency of the proposed algorithm.

Submitted 9 February, 2019; originally announced February 2019.

Showing 1–20 of 20 results for author: Sun, T