Search | arXiv e-print repository

arXiv:2406.19608 [pdf, other]

Multi-service collaboration and composition of cloud manufacturing customized production based on problem decomposition

Authors: Hao Yue, Yingtao Wu, Min Wang, Hesuan Hu, Weimin Wu, Jihui Zhang

Abstract: A cloud manufacturing system is service-oriented and knowledge-based, and can provide solutions for large-scale customized production. Service resource allocation is the primary factor that restricts production time and cost in cloud manufacturing customized production (CMCP). To improve efficiency and reduce cost in CMCP, we propose a new framework that considers the collaboration among services with the same functionality. A mathematical evaluation formulation for the service composition and service usage scheme is constructed with the following critical indexes: completion time, cost, and number of selected services. Subsequently, a problem-decomposition-based genetic algorithm is designed to obtain the optimal service compositions with service usage schemes. A smart clothing customization case illustrates the effectiveness and efficiency of the proposed method. Finally, the results of simulation experiments and comparisons show that the solutions obtained by our method have the minimum time, a lower cost, and fewer selected services.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 12 pages, 8 figures

ACM Class: J.0

arXiv:2406.08771 [pdf, other]

MFF-EINV2: Multi-scale Feature Fusion across Spectral-Spatial-Temporal Domains for Sound Event Localization and Detection

Authors: Da Mu, Zhicheng Zhang, Haobo Yue

Abstract: Sound Event Localization and Detection (SELD) involves detecting and localizing sound events using multichannel sound recordings. The previously proposed Event-Independent Network V2 (EINV2) has achieved outstanding performance on SELD. However, it still faces challenges in effectively extracting features across spectral, spatial, and temporal domains. This paper proposes a three-stage network structure named the Multi-scale Feature Fusion (MFF) module to fully extract multi-scale features across spectral, spatial, and temporal domains. The MFF module utilizes a parallel subnetwork architecture to generate multi-scale spectral and spatial features. The TF-Convolution Module is employed to provide multi-scale temporal features. We incorporate MFF into EINV2 and term the proposed method MFF-EINV2. Experimental results on the 2022 and 2023 DCASE challenge Task 3 datasets show the effectiveness of our MFF-EINV2, which achieves state-of-the-art (SOTA) performance compared to published methods.

Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2401.04976 [pdf, other]

Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection

Authors: Haobo Yue, Zhicheng Zhang, Da Mu, Yonghao Dang, Jianqin Yin, Jin Tang

Abstract: Recently, 2D convolution has been found unqualified for sound event detection (SED). It enforces translation equivariance on sound events along the frequency axis, which is not a shift-invariant dimension. To address this issue, dynamic convolution is used to model the frequency dependency of sound events. In this paper, we propose the first full-dynamic method, named full-frequency dynamic convolution (FFDConv). FFDConv generates frequency kernels for every frequency band, which is designed directly in the structure for frequency-dependent modeling. It physically furnishes 2D convolution with the capability of frequency-dependent modeling. FFDConv not only outperforms the baseline by 6.6% on the DESED real validation dataset in terms of PSDS1, but also outperforms the other full-dynamic methods. In addition, by visualizing features of sound events, we observe that FFDConv can effectively extract coherent features in specific frequency bands, consistent with the vocal continuity of sound events. This proves that FFDConv has great frequency-dependent perception ability.

Submitted 10 January, 2024; originally announced January 2024.

Comments: 6 pages, 4 figures, submitted to ICME2024

arXiv:2307.09248 [pdf, other]

Application of BERT in Wind Power Forecasting-Teletraan's Solution in Baidu KDD Cup 2022

Authors: Longxing Tan, Hongying Yue

Abstract: Nowadays, wind energy has drawn increasing attention for its important role in carbon neutrality and sustainable development. When wind power is integrated into the power grid, precise forecasting is necessary for the sustainability and security of the system. However, its unpredictable nature and the long prediction sequence make forecasting especially challenging. In this technical report, we introduce the BERT model applied for Baidu KDD Cup 2022; daily fluctuation is added by post-processing to bring the predicted results in line with daily periodicity. Our solution achieves 3rd place among 2490 teams. The code is released at https://github.com/LongxingTan/KDD2022-Baidu

Submitted 18 July, 2023; originally announced July 2023.

arXiv:2306.10311 [pdf, other]

Efficient HDR Reconstruction from Real-World Raw Images

Authors: Qirui Yang, Yihao Liu, Qihua Chen, Huanjing Yue, Kun Li, Jingyu Yang

Abstract: The widespread usage of high-definition screens on edge devices stimulates a strong demand for efficient high dynamic range (HDR) algorithms. However, many existing HDR methods either deliver unsatisfactory results or consume too much computational and memory resources, hindering their application to high-resolution images (usually with more than 12 megapixels) in practice. In addition, existing HDR dataset collection methods are often labor-intensive. In this work, from a new perspective, we discover an excellent opportunity for reconstructing HDR directly from raw images and investigate novel neural network structures that benefit deployment on mobile devices. Our key insights are threefold: (1) we develop a lightweight and efficient HDR model, RepUNet, using the structural re-parameterization technique to achieve fast and robust HDR; (2) we design a new computational raw HDR data formation pipeline and construct a real-world raw HDR dataset, RealRaw-HDR; (3) we propose a plug-and-play motion alignment loss to mitigate motion ghosting under limited bandwidth conditions. Our model contains fewer than 830K parameters and takes less than 3 ms to process an image of 4K resolution using one RTX 3090 GPU. While being highly efficient, our model also outperforms the state-of-the-art HDR methods in terms of PSNR, SSIM, and a color difference metric.

Submitted 5 June, 2024; v1 submitted 17 June, 2023; originally announced June 2023.

arXiv:2304.04773 [pdf, other]

HDR Video Reconstruction with a Large Dynamic Dataset in Raw and sRGB Domains

Authors: Huanjing Yue, Yubo Peng, Biting Yu, Xuanwu Yin, Zhenyu Zhou, Jingyu Yang

Abstract: High dynamic range (HDR) video reconstruction is attracting more and more attention due to its superior visual quality compared with that of low dynamic range (LDR) videos. The availability of LDR-HDR training pairs is essential for the HDR reconstruction quality. However, there are still no real LDR-HDR pairs for dynamic scenes due to the difficulty in capturing LDR-HDR frames simultaneously. In this work, we propose to utilize a staggered sensor to capture two alternate exposure images simultaneously, which are then fused into an HDR frame in both raw and sRGB domains. In this way, we build a large-scale LDR-HDR video dataset with 85 scenes, each containing 60 frames. Based on this dataset, we further propose a Raw-HDRNet, which utilizes the raw LDR frames as inputs. We propose a pyramid flow-guided deformation convolution to align neighboring frames. Experimental results demonstrate that 1) the proposed dataset can improve the HDR reconstruction performance on real scenes for three benchmark networks; 2) compared with sRGB inputs, utilizing raw inputs can further improve the reconstruction quality, and our proposed Raw-HDRNet is a strong baseline for raw HDR reconstruction. Our dataset and code will be released after the acceptance of this paper.

Submitted 12 April, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

arXiv:2303.07327 [pdf, other]

Unsupervised HDR Image and Video Tone Mapping via Contrastive Learning

Authors: Cong Cao, Huanjing Yue, Xin Liu, Jingyu Yang

Abstract: Capturing high dynamic range (HDR) images (videos) is attractive because it can reveal the details in both dark and bright regions. Since mainstream screens only support low dynamic range (LDR) content, a tone mapping algorithm is required to compress the dynamic range of HDR images (videos). Although image tone mapping has been widely explored, video tone mapping is lagging behind, especially for deep-learning-based methods, due to the lack of HDR-LDR video pairs. In this work, we propose a unified framework (IVTMNet) for unsupervised image and video tone mapping. To improve unsupervised training, we propose a domain- and instance-based contrastive learning loss. Instead of using a universal feature extractor, such as VGG, to extract the features for similarity measurement, we propose a novel latent code, which is an aggregation of the brightness and contrast of extracted features, to measure the similarity of different pairs. In total, we construct two negative pairs and three positive pairs to constrain the latent codes of tone-mapped results. For the network structure, we propose a spatial-feature-enhanced (SFE) module to enable information exchange and transformation of nonlocal regions. For video tone mapping, we propose a temporal-feature-replaced (TFR) module to efficiently utilize the temporal correlation and improve the temporal consistency of video tone-mapped results. We construct a large-scale unpaired HDR-LDR video dataset to facilitate the unsupervised training process for video tone mapping. Experimental results demonstrate that our method outperforms state-of-the-art image and video tone mapping methods. Our code and dataset are available at https://github.com/cao-cong/UnCLTMO.

Submitted 26 June, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

arXiv:2209.12475 [pdf, other]

Real-RawVSR: Real-World Raw Video Super-Resolution with a Benchmark Dataset

Authors: Huanjing Yue, Zhiming Zhang, Jingyu Yang

Abstract: In recent years, real image super-resolution (SR) has achieved promising results due to the development of SR datasets and corresponding real SR methods. In contrast, the field of real video SR is lagging behind, especially for real raw videos. Considering the superiority of raw image SR over sRGB image SR, we construct a real-world raw video SR (Real-RawVSR) dataset and propose a corresponding SR method. We utilize two DSLR cameras and a beam-splitter to simultaneously capture low-resolution (LR) and high-resolution (HR) raw videos with 2x, 3x, and 4x magnifications. There are 450 video pairs in our dataset, with scenes varying from indoor to outdoor, and motions including camera and object movements. To our knowledge, this is the first real-world raw VSR dataset. Since the raw video is characterized by the Bayer pattern, we propose a two-branch network, which deals with both the packed RGGB sequence and the original Bayer pattern sequence, and the two branches are complementary to each other. After going through the proposed co-alignment, interaction, fusion, and reconstruction modules, we generate the corresponding HR sRGB sequence. Experimental results demonstrate that the proposed method outperforms benchmark real and synthetic video SR methods with either raw or sRGB inputs. Our code and dataset are available at https://github.com/zmzhang1998/Real-RawVSR.

Submitted 26 September, 2022; originally announced September 2022.

Comments: Accepted by ECCV2022

arXiv:2005.04026 [pdf]

Improved mathematical models of structured-light modulation analysis technique for contaminant and defect detection

Authors: Yiyang Huang, Huimin Yue, Yuyao Fang, Yiping Song, Yong Liu

Abstract: Surface quality inspection of optical components is critical in the optical and electronic industries. Structured-Light Modulation Analysis Technique (SMAT) is a novel method recently proposed for the contaminant and defect detection of specular surfaces and transparent objects, and this approach was verified to be effective in eliminating ambient light. The mechanisms and mathematical models of SMAT were analyzed and established based on the theory of photometry and the optical characteristics of contaminants and defects. However, some phenomena in the actual detection process remain conundrums that cannot be well explained. In order to better analyze the phenomena in practical circumstances, improved mathematical models of SMAT are constructed in this paper based on the surface topography of contaminants and defects. These mathematical models can be used as tools for analyzing various contaminants and defects in different systems, and provide effective guidance for subsequent work. Simulations and experiments on the modulation and the luminous flux of fringe patterns have been implemented to verify the validity of these mathematical models. In addition, by using fringe patterns with mutually perpendicular sinusoidal directions, the two obtained modulation images can be merged to solve the incomplete information acquisition issue caused by the differentiated response of modulation.

Submitted 7 May, 2020; originally announced May 2020.

arXiv:2003.14013 [pdf, other]

Supervised Raw Video Denoising with a Benchmark Dataset on Dynamic Scenes

Authors: Huanjing Yue, Cong Cao, Lei Liao, Ronghe Chu, Jingyu Yang

Abstract: In recent years, the supervised learning strategy for real noisy image denoising has been emerging and has achieved promising results. In contrast, realistic noise removal for raw noisy videos is rarely studied due to the lack of noisy-clean pairs for dynamic scenes. Clean video frames for dynamic scenes cannot be captured with a long-exposure shutter or by averaging multiple shots, as was done for static images. In this paper, we solve this problem by creating motions for controllable objects, such as toys, and capturing each static moment multiple times to generate clean video frames. In this way, we construct a dataset with 55 groups of noisy-clean videos with ISO values ranging from 1600 to 25600. To our knowledge, this is the first dynamic video dataset with noisy-clean pairs. Correspondingly, we propose a raw video denoising network (RViDeNet) by exploring the temporal, spatial, and channel correlations of video frames. Since the raw video has Bayer patterns, we pack it into four sub-sequences, i.e., RGBG sequences, which are denoised by the proposed RViDeNet separately and finally fused into a clean video. In addition, our network outputs not only a raw denoising result but also the sRGB result by going through an image signal processing (ISP) module, which enables users to generate the sRGB result with their favourite ISPs. Experimental results demonstrate that our method outperforms state-of-the-art video and raw image denoising algorithms on both indoor and outdoor videos.

Submitted 31 March, 2020; originally announced March 2020.

Comments: CVPR2020 accepted paper

arXiv:2003.11824 [pdf, other]

doi 10.1016/j.ymssp.2020.106985

Payload-agnostic Decoupling and Hybrid Vibration Isolation Control for a Maglev Platform with Redundant Actuation

Authors: Zhaopei Gong, Liang Ding, Shaozhen Li, Honghao Yue, Haibo Gao, Zongquan Deng

Abstract: Payload-specific vibration control may be suitable for a particular task but lacks the generality and transferability required for adapting to various payloads. Self-decoupling and robust vibration control are the crucial problems in achieving payload-agnostic vibration control. However, some problems remain unsolved. In this article, we present a maglev vibration isolation platform (MVIP), which aims to attenuate vibration in the payload-agnostic task under a dynamic environment. Since efforts to suppress disturbance will encounter inevitable coupling problems, we analyze the reasons for the coupling and propose unique and effective solutions. To achieve payload-agnostic vibration control, we propose a new control strategy, which is the main contribution of this article. It consists of a self-construct radial basis function neural network inversion (SRBFNNI) decoupling scheme and hybrid adaptive feed-forward internal model control (HAFIMC). The former enables the MVIP to create a self-inverse model with little prior knowledge and achieve self-decoupling. For the unique structure of the MVIP, the vibration control problem is stated and addressed by the proposed HAFIMC, which utilizes the adaptive part to deal with the periodical disturbance and the internal model part to deal with the stability.

Submitted 31 March, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

Comments: This is a preprint which has been submitted to Mechanical Systems and Signal Processing

arXiv:1911.12503 [pdf]

doi 10.1177/1077546319836892

System Integration and Control Design of a Maglev Platform for Space Vibration Isolation

Authors: Zhaopei Gong, Liang Ding, Honghao Yue, Haibo Gao, Rongqiang Liu, Zongquan Deng, Yifan Lu

Abstract: Micro-vibration has been a dominant factor impairing the performance of scientific experiments which are expected to be deployed in a micro-gravity environment such as Spacelab. The micro-vibration has a serious impact on scientific experiments requiring a quasi-static environment. Therefore, we proposed a maglev vibration isolation platform (MVIP) operating in six degrees of freedom (DOF) to fulfill the environmental requirements. In view of the non-contact and large-stroke requirements for micro-vibration isolation, an optimization method was utilized to design the actuator. Mathematical models of the actuator's remarkable nonlinearity were established so that its output can be compensated according to the floater's varying position and the system's performance requirements can be satisfied. Furthermore, to adapt to an energy-limited environment such as Spacelab, an optimum allocation scheme was put forward. Considering the actuator's nonlinearity, accuracy and minimum energy consumption can be obtained simultaneously. In view of operating in six DOF, methods for nonlinear compensation and system decoupling were discussed, and the necessary controller was also presented. Simulation and experiments validate the system's performance. With a movement range of 10x10x8 mm and rotations of 200 mrad, a decay ratio of -40 dB/Dec between 1-10 Hz was obtained under closed-loop control.

Submitted 25 March, 2020; v1 submitted 27 November, 2019; originally announced November 2019.

Comments: A preprint version

Journal ref: Journal of Vibration and Control 25.11 (2019): 1720-1736

Showing 1–12 of 12 results for author: Yue, H