Search | arXiv e-print repository

Frieren: Efficient Video-to-Audio Generation with Rectified Flow Matching

Authors: Yongqi Wang, Wenxiang Guo, Rongjie Huang, Jiawei Huang, Zehan Wang, Fuming You, Ruiqi Li, Zhou Zhao

Abstract: Video-to-audio (V2A) generation aims to synthesize content-matching audio from silent video, and it remains challenging to build V2A models with high generation quality, efficiency, and visual-audio temporal synchrony. We propose Frieren, a V2A model based on rectified flow matching. Frieren regresses the conditional transport vector field from noise to spectrogram latent with straight paths and conducts sampling by solving ODE, outperforming autoregressive and score-based models in terms of audio quality. By employing a non-autoregressive vector field estimator based on a feed-forward transformer and channel-level cross-modal feature fusion with strong temporal alignment, our model generates audio that is highly synchronized with the input video. Furthermore, through reflow and one-step distillation with guided vector field, our model can generate decent audio in a few, or even only one sampling step. Experiments indicate that Frieren achieves state-of-the-art performance in both generation quality and temporal alignment on VGGSound, with alignment accuracy reaching 97.22%, and 6.2% improvement in inception score over the strong diffusion-based baseline. Audio samples are available at http://frieren-v2a.github.io .

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2404.09313 [pdf, other]

Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment

Authors: Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, Ruiqi Li, Fuming You, Zhou Zhao, Zhimeng Zhang

Abstract: A song is a combination of singing voice and accompaniment. However, existing works focus on singing voice synthesis and music generation independently. Little attention has been paid to exploring song synthesis. In this work, we propose a novel task called text-to-song synthesis, which incorporates both vocal and accompaniment generation. We develop Melodist, a two-stage text-to-song method that consists of singing voice synthesis (SVS) and vocal-to-accompaniment (V2A) synthesis. Melodist leverages tri-tower contrastive pretraining to learn more effective text representation for controllable V2A synthesis. A Chinese song dataset mined from a music website is built up to alleviate data scarcity for our research. The evaluation results on our dataset demonstrate that Melodist can synthesize songs with comparable quality and style consistency. Audio samples can be found at https://text2songMelodist.github.io/Sample/.

Submitted 20 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: ACL 2024 Main

arXiv:2403.11780 [pdf, other]

Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt

Authors: Yongqi Wang, Ruofan Hu, Rongjie Huang, Zhiqing Hong, Ruiqi Li, Wenrui Liu, Fuming You, Tao **, Zhou Zhao

Abstract: Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and naturalness, yet they lack the capability to control the style attributes of the synthesized singing explicitly. We propose Prompt-Singer, the first SVS method that enables attribute controlling on singer gender, vocal range and volume with natural language. We adopt a model architecture based on a decoder-only transformer with a multi-scale hierarchy, and design a range-melody decoupled pitch representation that enables text-conditioned vocal range control while keeping melodic accuracy. Furthermore, we explore various experiment settings, including different types of text representations, text encoder fine-tuning, and introducing speech data to alleviate data scarcity, aiming to facilitate further research. Experiments show that our model achieves favorable controlling ability and audio quality. Audio samples are available at http://prompt-singer.github.io .

Submitted 18 March, 2024; originally announced March 2024.

Comments: Accepted by NAACL 2024 (main conference)

arXiv:2402.10977 [pdf]

doi 10.1016/j.compchemeng.2024.108723

Generative AI and Process Systems Engineering: The Next Frontier

Authors: Benjamin Decardi-Nelson, Abdulelah S. Alshehri, Akshay Ajagekar, Fengqi You

Abstract: This article explores how emerging generative artificial intelligence (GenAI) models, such as large language models (LLMs), can enhance solution methodologies within process systems engineering (PSE). These cutting-edge GenAI models, particularly foundation models (FMs), which are pre-trained on extensive, general-purpose datasets, offer versatile adaptability for a broad range of tasks, including responding to queries, image generation, and complex decision-making. Given the close relationship between advancements in PSE and developments in computing and systems technologies, exploring the synergy between GenAI and PSE is essential. We begin our discussion with a compact overview of both classic and emerging GenAI models, including FMs, and then dive into their applications within key PSE domains: synthesis and design, optimization and integration, and process monitoring and control. In each domain, we explore how GenAI models could potentially advance PSE methodologies, providing insights and prospects for each area. Furthermore, the article identifies and discusses potential challenges in fully leveraging GenAI within PSE, including multiscale modeling, data requirements, evaluation metrics and benchmarks, and trust and safety, thereby deepening the discourse on effective GenAI integration into systems analysis, design, optimization, operations, monitoring, and control. This paper provides a guide for future research focused on the applications of emerging GenAI in PSE.

Submitted 6 May, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Journal ref: Computers & Chemical Engineering, Volume 187, August 2024, 108723

arXiv:2003.00264 [pdf]

doi 10.1016/j.compchemeng.2020.107119

Quantum Computing Assisted Deep Learning for Fault Detection and Diagnosis in Industrial Process Systems

Authors: Akshay Ajagekar, Fengqi You

Abstract: Quantum computing (QC) and deep learning techniques have attracted widespread attention in recent years. This paper proposes QC-based deep learning methods for fault diagnosis that exploit their unique capabilities to overcome the computational challenges faced by conventional data-driven approaches performed on classical computers. Deep belief networks are integrated into the proposed fault diagnosis model and are used to extract features at different levels for normal and faulty process operations. The QC-based fault diagnosis model uses a quantum computing assisted generative training process followed by discriminative training to address the shortcomings of classical algorithms. To demonstrate its applicability and efficiency, the proposed fault diagnosis method is applied to process monitoring of a continuous stirred tank reactor (CSTR) and the Tennessee Eastman (TE) process. The proposed QC-based deep learning approach enjoys superior fault detection and diagnosis performance with obtained average fault detection rates of 79.2% and 99.39% for the CSTR and TE process, respectively.

Submitted 1 October, 2020; v1 submitted 29 February, 2020; originally announced March 2020.

Journal ref: Comp. Chem. Eng., 143 (2020), pp. 107119

arXiv:1912.12666 [pdf]

Efficient Greenhouse Temperature Control with Data-Driven Robust Model Predictive Control

Authors: Wei-Han Chen, Fengqi You

Abstract: Appropriate greenhouse temperature should be maintained to ensure crop production while minimizing energy consumption. Even though weather forecasts could provide a certain amount of information to improve control performance, they are not perfect, and forecast error may cause the temperature to deviate from the acceptable range. To address the inherent uncertainty in weather that affects control accuracy, this paper develops a data-driven robust model predictive control (MPC) approach for greenhouse temperature control. The dynamic model is obtained from thermal resistance-capacitance modeling derived by the Building Resistance-Capacitance Modeling (BRCM) toolbox. Uncertainty sets of ambient temperature and solar radiation are captured by a support vector clustering technique, and they are further tuned for better quality by a training-calibration procedure. A case study that implements the carefully chosen uncertainty sets on robust model predictive control shows that the data-driven robust MPC has better control performance compared to rule-based control, certainty equivalent MPC, and robust MPC.

Submitted 31 December, 2019; v1 submitted 29 December, 2019; originally announced December 2019.

arXiv:1903.11734 [pdf, other]

doi 10.1109/TAC.2020.3024273

A Posteriori Probabilistic Bounds of Convex Scenario Programs with Validation Tests

Authors: Chao Shang, Fengqi You

Abstract: Scenario programs have established themselves as efficient tools towards decision-making under uncertainty. To assess the quality of scenario-based solutions a posteriori, validation tests based on Bernoulli trials have been widely adopted in practice. However, to reach a theoretically reliable judgement of risk, one typically needs to collect massive validation samples. In this work, we propose new a posteriori bounds for convex scenario programs with validation tests, which are dependent on both realizations of support constraints and performance on out-of-sample validation data. The proposed bounds enjoy wide generality in that many existing theoretical results can be incorporated as particular cases. To facilitate practical use, a systematic approach for parameterizing a posteriori probability bounds is also developed, which is shown to possess a variety of desirable properties allowing for easy implementations and clear interpretations. By synthesizing comprehensive information about support constraints and validation tests, improved risk evaluation can be achieved for randomized solutions in comparison with existing a posteriori bounds. Case studies on controller design of aircraft lateral motion are presented to validate the effectiveness of the proposed a posteriori bounds.

Submitted 13 September, 2020; v1 submitted 27 March, 2019; originally announced March 2019.

Journal ref: IEEE Transactions on Automatic Control, Sept. 2021, Volume 66, Issue 9, Pages 4015 - 4028

arXiv:1810.05947 [pdf, other]

doi 10.1109/TCST.2019.2916753

Robust Model Predictive Control of Irrigation Systems with Active Uncertainty Learning and Data Analytics

Authors: Chao Shang, Wei-Han Chen, Abraham Duncan Stroock, Fengqi You

Abstract: We develop a novel data-driven robust model predictive control (DDRMPC) approach for automatic control of irrigation systems. The fundamental idea is to integrate both mechanistic models, which describe dynamics in soil moisture variations, and data-driven models, which characterize uncertainty in forecast errors of evapotranspiration and precipitation, into a holistic systems control framework. To better capture the support of uncertainty distribution, we take a new learning-based approach by constructing uncertainty sets from historical data. For evapotranspiration forecast error, the support vector clustering-based uncertainty set is adopted, which can be conveniently built from historical data. As for precipitation forecast errors, we analyze the dependence of their distribution on forecast values, and further design a tailored uncertainty set based on the properties of this type of uncertainty. In this way, the overall uncertainty distribution can be elaborately described, which finally contributes to rational and efficient control decisions. To assure the quality of data-driven uncertainty sets, a training-calibration scheme is used to provide theoretical performance guarantees. A generalized affine decision rule is adopted to obtain tractable approximations of optimal control problems, thereby ensuring the practicability of DDRMPC. Case studies using real data show that DDRMPC can reliably maintain soil moisture above the safety level and avoid crop devastation. The proposed DDRMPC approach leads to a 40% reduction of total water consumption compared to the fine-tuned open-loop control strategy. In comparison with the carefully tuned rule-based control and certainty equivalent model predictive control, the proposed DDRMPC approach can significantly reduce the total water consumption and improve the control performance.

Submitted 23 May, 2019; v1 submitted 13 October, 2018; originally announced October 2018.

Journal ref: IEEE Transactions on Control Systems Technology, vol. 28, no. 4, pp. 1493-1504, 2020

arXiv:1810.05931 [pdf]

doi 10.1016/j.automatica.2019.108802

A Transformation-Proximal Bundle Algorithm for Multistage Adaptive Robust Optimization and Application to Constrained Robust Optimal Control

Authors: Chao Ning, Fengqi You

Abstract: This paper presents a novel transformation-proximal bundle algorithm for multistage adaptive robust optimization problems. By partitioning recourse decisions into state and control decisions, the proposed algorithm applies affine control policy only to state decisions and allows control decisions to be fully adaptive, thus transforming the original problem into an equivalent two-stage Adaptive Robust Optimization (ARO) problem. Importantly, this multi-to-two transformation is general enough to be employed with any two-stage ARO solution algorithms, thus opening a new avenue for a variety of multistage ARO algorithms. The proximal bundle method is developed for the resulting two-stage problem along with convergence analysis. In an inventory control application, the affine disturbance-feedback control policy suffers from a severe suboptimality with an average gap of 34.88%, while the proposed algorithm generates an average gap of merely 1.68%.

Submitted 29 December, 2019; v1 submitted 13 October, 2018; originally announced October 2018.

Journal ref: Automatica, Volume 113, March 2020, 108802

arXiv:1807.05146 [pdf, other]

doi 10.1016/j.jprocont.2018.12.013

A data-driven robust optimization approach to scenario-based stochastic model predictive control

Authors: Chao Shang, Fengqi You

Abstract: Stochastic model predictive control (SMPC) has been a promising solution to complex control problems under uncertain disturbances. However, traditional SMPC approaches either require exact knowledge of probabilistic distributions, or rely on massive scenarios that are generated to represent uncertainties. In this paper, a novel scenario-based SMPC approach is proposed by actively learning a data-driven uncertainty set from available data with machine learning techniques. A systematic procedure is then proposed to further calibrate the uncertainty set, which gives an appropriate probabilistic guarantee. The resulting data-driven uncertainty set is more compact than traditional norm-based sets, and can help reduce the conservatism of control actions. Meanwhile, the proposed method requires fewer data samples than traditional scenario-based SMPC approaches, thereby enhancing the practicability of SMPC. Finally, the optimal control problem is cast as a single-stage robust optimization problem, which can be solved efficiently by deriving the robust counterpart problem. The feasibility and stability issue is also discussed in detail. The efficacy of the proposed approach is demonstrated through a two-mass-spring system and a building energy control problem under uncertain disturbances.

Submitted 14 January, 2019; v1 submitted 13 July, 2018; originally announced July 2018.

Journal ref: Journal of Process Control, Volume 75, March 2019, Pages 24-39

arXiv:1707.09198 [pdf]

doi 10.1016/j.compchemeng.2017.12.015

Data-Driven Stochastic Robust Optimization: A General Computational Framework and Algorithm for Optimization under Uncertainty in the Big Data Era

Authors: Chao Ning, Fengqi You

Abstract: A novel data-driven stochastic robust optimization (DDSRO) framework is proposed for optimization under uncertainty leveraging labeled multi-class uncertainty data. Uncertainty data in large datasets are often collected from various conditions, which are encoded by class labels. Machine learning methods including Dirichlet process mixture model and maximum likelihood estimation are employed for uncertainty modeling. A DDSRO framework is further proposed based on the data-driven uncertainty model through a bi-level optimization structure. The outer optimization problem follows a two-stage stochastic programming approach to optimize the expected objective across different data classes; adaptive robust optimization is nested as the inner problem to ensure the robustness of the solution while maintaining computational tractability. A decomposition-based algorithm is further developed to solve the resulting multi-level optimization problem efficiently. Case studies on process network design and planning are presented to demonstrate the applicability of the proposed framework and algorithm.

Submitted 29 December, 2017; v1 submitted 28 July, 2017; originally announced July 2017.

Journal ref: Computers & Chemical Engineering, Volume 111, Pages 115-133, 4 March 2018

Showing 1–11 of 11 results for author: You, F