Showing 1–38 of 38 results for author: Song, M

Searching in archive eess.
  1. arXiv:2405.15831  [pdf, other]

    eess.SY cs.AI cs.LG

    Transmission Interface Power Flow Adjustment: A Deep Reinforcement Learning Approach based on Multi-task Attribution Map

    Authors: Shunyu Liu, Wei Luo, Yanzhen Zhou, Kaixuan Chen, Quan Zhang, Huating Xu, Qinglai Guo, Mingli Song

    Abstract: Transmission interface power flow adjustment is a critical measure to ensure the security and economy operation of power systems. However, conventional model-based adjustment schemes are limited by the increasing variations and uncertainties occur in power systems, where the adjustment problems of different transmission interfaces are often treated as several independent tasks, ignoring their coup…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Transactions on Power Systems

  2. arXiv:2404.01620  [pdf]

    cs.SD cs.AI cs.CY eess.AS

    Voice EHR: Introducing Multimodal Audio Data for Health

    Authors: James Anibal, Hannah Huth, Ming Li, Lindsey Hazen, Yen Minh Lam, Hang Nguyen, Phuc Hong, Michael Kleinman, Shelley Ost, Christopher Jackson, Laura Sprabery, Cheran Elangovan, Balaji Krishnaiah, Lee Akst, Ioan Lina, Iqbal Elyazar, Lenny Ekwati, Stefan Jansen, Richard Nduwayezu, Charisse Garcia, Jeffrey Plum, Jacqueline Brenner, Miranda Song, Emily Ricotta, David Clifton, et al. (3 additional authors not shown)

    Abstract: Large AI models trained on audio data may have the potential to rapidly classify patients, enhancing medical decision-making and potentially improving outcomes through early detection. Existing technologies depend on limited datasets using expensive recording equipment in high-income, English-speaking countries. This challenges deployment in resource-constrained, high-volume settings where audio d…

    Submitted 1 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 19 pages, 2 figures, 7 tables

  3. arXiv:2401.12987  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation

    Authors: Taeyang Yun, Hyunkuk Lim, Jeonghwan Lee, Min Song

    Abstract: Emotion Recognition in Conversation (ERC) plays a crucial role in enabling dialogue systems to effectively respond to user requests. The emotions in a conversation can be identified by the representations from various modalities, such as audio, visual, and text. However, due to the weak contribution of non-verbal modalities to recognize emotions, multimodal ERC has always been considered a challen…

    Submitted 31 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: NAACL 2024 main conference

  4. arXiv:2401.11902  [pdf, other]

    eess.IV cs.CV

    A Training-Free Defense Framework for Robust Learned Image Compression

    Authors: Myungseo Song, Jinyoung Choi, Bohyung Han

    Abstract: We study the robustness of learned image compression models against adversarial attacks and present a training-free defense technique based on simple image transform functions. Recent learned image compression models are vulnerable to adversarial attacks that result in poor compression rate, low reconstruction quality, or weird artifacts. To address the limitations, we propose a simple but effecti…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 10 pages and 14 figures

  5. arXiv:2401.02771  [pdf, other]

    cs.LG eess.SY

    Powerformer: A Section-adaptive Transformer for Power Flow Adjustment

    Authors: Kaixuan Chen, Wei Luo, Shunyu Liu, Yaoquan Wei, Yihe Zhou, Yunpeng Qing, Quan Zhang, Jie Song, Mingli Song

    Abstract: In this paper, we present a novel transformer architecture tailored for learning robust power system state representations, which strives to optimize power dispatch for the power flow adjustment across different transmission sections. Specifically, our proposed approach, named Powerformer, develops a dedicated section-adaptive attention mechanism, separating itself from the self-attention used in…

    Submitted 30 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: 8 figures

  6. arXiv:2312.03490  [pdf, other]

    eess.IV cs.CV

    PneumoLLM: Harnessing the Power of Large Language Model for Pneumoconiosis Diagnosis

    Authors: Meiyue Song, Zhihua Yu, Jiaxin Wang, Jiarui Wang, Yuting Lu, Baicun Li, Xiaoxu Wang, Qinghua Huang, Zhijun Li, Nikolaos I. Kanellakis, Jiangfeng Liu, Jing Wang, Binglu Wang, Juntao Yang

    Abstract: The conventional pretraining-and-finetuning paradigm, while effective for common diseases with ample data, faces challenges in diagnosing data-scarce occupational diseases like pneumoconiosis. Recently, large language models (LLMs) have exhibits unprecedented ability when conducting multiple tasks in dialogue, bringing opportunities to diagnosis. A common strategy might involve using adapter layer…

    Submitted 28 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Medical Image Analysis

  7. arXiv:2311.14295  [pdf, ps, other]

    cs.IT eess.SP

    Exploiting Active RIS in NOMA Networks with Hardware Impairments

    Authors: Xinwei Yue, Meiqi Song, Chongjun Ouyang, Yuanwei Liu, Tian Li, Tianwei Hou

    Abstract: Active reconfigurable intelligent surface (ARIS) is a promising way to compensate for multiplicative fading attenuation by amplifying and reflecting event signals to selected users. This paper investigates the performance of ARIS assisted non-orthogonal multiple access (NOMA) networks over cascaded Nakagami-m fading channels. The effects of hardware impairments (HIS) and reflection coefficients on…

    Submitted 12 January, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

  8. arXiv:2311.10463  [pdf, other]

    eess.IV cs.CV

    Correlation-Distance Graph Learning for Treatment Response Prediction from rs-fMRI

    Authors: Xiatian Zhang, Sisi Zheng, Hubert P. H. Shum, Haozheng Zhang, Nan Song, Mingkang Song, Hongxiao Jia

    Abstract: Resting-state fMRI (rs-fMRI) functional connectivity (FC) analysis provides valuable insights into the relationships between different brain regions and their potential implications for neurological or psychiatric disorders. However, specific design efforts to predict treatment response from rs-fMRI remain limited due to difficulties in understanding the current brain state and the underlying mech…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: Proceedings of the 2023 International Conference on Neural Information Processing (ICONIP)

  9. arXiv:2308.06285  [pdf, other]

    cs.HC eess.IV

    An Integrated Visual Analytics System for Studying Clinical Carotid Artery Plaques

    Authors: Chaoqing Xu, Zhentao Zheng, Yiting Fu, Baofeng Chang, Legao Chen, Minghui Wu, Mingli Song, Jinsong Jiang

    Abstract: Carotid artery plaques can cause arterial vascular diseases such as stroke and myocardial infarction, posing a severe threat to human life. However, the current clinical examination mainly relies on a direct assessment by physicians of patients' clinical indicators and medical images, lacking an integrated visualization tool for analyzing the influencing factors and composition of carotid artery p…

    Submitted 8 August, 2023; originally announced August 2023.

  10. arXiv:2306.02913  [pdf, other]

    cs.LG cs.CY cs.DC eess.SY stat.ML

    Decentralized SGD and Average-direction SAM are Asymptotically Equivalent

    Authors: Tongtian Zhu, Fengxiang He, Kaixuan Chen, Mingli Song, Dacheng Tao

    Abstract: Decentralized stochastic gradient descent (D-SGD) allows collaborative learning on massive devices simultaneously without the control of a central server. However, existing theories claim that decentralization invariably undermines generalization. In this paper, we challenge the conventional belief and present a completely new perspective for understanding decentralized learning. We prove that D-S…

    Submitted 9 November, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: 40th International Conference on Machine Learning (ICML 2023)

  11. arXiv:2303.15669  [pdf, other]

    eess.AS cs.AI cs.LG

    Unsupervised Pre-Training For Data-Efficient Text-to-Speech On Low Resource Languages

    Authors: Seongyeon Park, Myungseo Song, Bohyung Kim, Tae-Hyun Oh

    Abstract: Neural text-to-speech (TTS) models can synthesize natural human speech when trained on large amounts of transcribed speech. However, collecting such large-scale transcribed data is expensive. This paper proposes an unsupervised pre-training method for a sequence-to-sequence TTS model by leveraging large untranscribed speech data. With our pre-training, we can remarkably reduce the amount of paired…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: ICASSP 2023

  12. arXiv:2303.09199  [pdf, other]

    cs.CV eess.IV

    A Generative Model for Digital Camera Noise Synthesis

    Authors: Mingyang Song, Yang Zhang, Tunç O. Aydın, Elham Amin Mansour, Christopher Schroers

    Abstract: Noise synthesis is a challenging low-level vision task aiming to generate realistic noise given a clean image along with the camera settings. To this end, we propose an effective generative model which utilizes clean features as guidance followed by noise injections into the network. Specifically, our generator follows a UNet-like structure with skip connections but without downsampling and upsamp…

    Submitted 13 June, 2024; v1 submitted 16 March, 2023; originally announced March 2023.

  13. arXiv:2212.11486  [pdf, other]

    cs.CR eess.SP

    Over-the-Air Federated Learning with Enhanced Privacy

    Authors: Xiaochan Xue, Moh Khalid Hasan, Shucheng Yu, Laxima Niure Kandel, Min Song

    Abstract: Federated learning (FL) has emerged as a promising learning paradigm in which only local model parameters (gradients) are shared. Private user data never leaves the local devices thus preserving data privacy. However, recent research has shown that even when local data is never shared by a user, exchanging model parameters without protection can also leak private information. Moreover, in wireless…

    Submitted 22 December, 2022; originally announced December 2022.

    Comments: 6 pages

  14. arXiv:2209.07384  [pdf, other]

    cs.SD cs.AI eess.AS

    Self-Supervised Attention Networks and Uncertainty Loss Weighting for Multi-Task Emotion Recognition on Vocal Bursts

    Authors: Vincent Karas, Andreas Triantafyllopoulos, Meishu Song, Björn W. Schuller

    Abstract: Vocal bursts play an important role in communicating affect, making them valuable for improving speech emotion recognition. Here, we present our approach for classifying vocal bursts and predicting their emotional significance in the ACII Affective Vocal Burst Workshop & Challenge 2022 (A-VB). We use a large self-supervised audio model as shared feature extractor and compare multiple architectures…

    Submitted 27 September, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

    Comments: 4 pages, 1 figure, accepted at The 2022 ACII Affective Vocal Burst Workshop & Challenge (A-VB)

  15. arXiv:2208.10922  [pdf, other]

    cs.CV cs.LG eess.AS eess.IV

    StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation

    Authors: Dongchan Min, Minyoung Song, Eunji Ko, Sung Ju Hwang

    Abstract: We propose StyleTalker, a novel audio-driven talking head generation model that can synthesize a video of a talking person from a single reference image with accurately audio-synced lip shapes, realistic head poses, and eye blinks. Specifically, by leveraging a pretrained image generator and an image encoder, we estimate the latent codes of the talking head video that faithfully reflects the given…

    Submitted 15 March, 2024; v1 submitted 23 August, 2022; originally announced August 2022.

  16. arXiv:2206.13390  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!

    Authors: Chenglizhao Chen, Mengke Song, Wenfeng Song, Li Guo, Muwei Jian

    Abstract: Video saliency detection (VSD) aims at fast locating the most attractive objects/things/patterns in a given video clip. Existing VSD-related works have mainly relied on the visual system but paid less attention to the audio aspect, while, actually, our audio system is the most vital complementary part to our visual system. Also, audio-visual saliency detection (AVSD), one of the most representativ…

    Submitted 20 June, 2022; originally announced June 2022.

  17. arXiv:2206.11049  [pdf, other]

    cs.SD cs.LG eess.AS

    Dynamic Restrained Uncertainty Weighting Loss for Multitask Learning of Vocal Expression

    Authors: Meishu Song, Zijiang Yang, Andreas Triantafyllopoulos, Xin Jing, Vincent Karas, Xie Jiangjian, Zixing Zhang, Yamamoto Yoshiharu, Bjoern W. Schuller

    Abstract: We propose a novel Dynamic Restrained Uncertainty Weighting Loss to experimentally handle the problem of balancing the contributions of multiple tasks on the ICML ExVo 2022 Challenge. The multitask aims to recognize expressed emotions and demographic traits from vocal bursts jointly. Our strategy combines the advantages of Uncertainty Weight and Dynamic Weight Average, by extending weights with a…

    Submitted 27 June, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

    Comments: 5 pages

  18. arXiv:2206.11045  [pdf, other]

    eess.AS cs.LG cs.SD

    COVYT: Introducing the Coronavirus YouTube and TikTok speech dataset featuring the same speakers with and without infection

    Authors: Andreas Triantafyllopoulos, Anastasia Semertzidou, Meishu Song, Florian B. Pokorny, Björn W. Schuller

    Abstract: More than two years after its outbreak, the COVID-19 pandemic continues to plague medical systems around the world, putting a strain on scarce resources, and claiming human lives. From the very beginning, various AI-based COVID-19 detection and monitoring tools have been pursued in an attempt to stem the tide of infections through timely diagnosis. In particular, computer audition has been suggest…

    Submitted 20 June, 2022; originally announced June 2022.

  19. arXiv:2206.09142  [pdf, other]

    cs.SD eess.AS

    Redundancy Reduction Twins Network: A Training framework for Multi-output Emotion Regression

    Authors: Xin Jing, Meishu Song, Andreas Triantafyllopoulos, Zijiang Yang, Björn W. Schuller

    Abstract: In this paper, we propose the Redundancy Reduction Twins Network (RRTN), a redundancy reduction training framework that minimizes redundancy by measuring the cross-correlation matrix between the outputs of the same network fed with distorted versions of a sample and bringing it as close to the identity matrix as possible. RRTN also applies a new loss function, the Barlow Twins loss function, to he…

    Submitted 28 June, 2022; v1 submitted 18 June, 2022; originally announced June 2022.

    Comments: 5 pages, accepted by ICML Exvo workshop

  20. arXiv:2206.06680  [pdf, other]

    cs.SD cs.LG eess.AS

    Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction

    Authors: Andreas Triantafyllopoulos, Meishu Song, Zijiang Yang, Xin Jing, Björn W. Schuller

    Abstract: In this work, we explore a novel few-shot personalisation architecture for emotional vocalisation prediction. The core contribution is an `enrolment' encoder which utilises two unlabelled samples of the target speaker to adjust the output of the emotion encoder; the adjustment is based on dot-product attention, thus effectively functioning as a form of `soft' feature selection. The emotion and enr…

    Submitted 20 June, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: Proceedings of the ICML Expressive Vocalizations Workshop and Competition held in conjunction with the 39th International Conference on Machine Learning, Copyright 2022 by the author(s)

  21. arXiv:2206.02705  [pdf]

    eess.SP cs.AI

    Human Behavior Recognition Method Based on CEEMD-ES Radar Selection

    Authors: Zhaolin Zhang, Mingqi Song, Wugang Meng, Yuhan Liu, Fengcong Li, Xiang Feng, Yinan Zhao

    Abstract: In recent years, the millimeter-wave radar to identify human behavior has been widely used in medical, security, and other fields. When multiple radars are performing detection tasks, the validity of the features contained in each radar is difficult to guarantee. In addition, processing multiple radar data also requires a lot of time and computational cost. The Complementary Ensemble Empirical Mode…

    Submitted 6 June, 2022; originally announced June 2022.

    Comments: 4 pages, 5 figures

  22. arXiv:2205.06576  [pdf, other]

    eess.SY cs.AI

    Distribution-Aware Graph Representation Learning for Transient Stability Assessment of Power System

    Authors: Kaixuan Chen, Shunyu Liu, Na Yu, Rong Yan, Quan Zhang, Jie Song, Zunlei Feng, Mingli Song

    Abstract: The real-time transient stability assessment (TSA) plays a critical role in the secure operation of the power system. Although the classic numerical integration method, i.e. time-domain simulation (TDS), has been widely used in industry practice, it is inevitably trapped in a high computational complexity due to the high latitude sophistication of the power system. In this work, a data-dr…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: 8 pages, 6 figures, 4 tables

  23. arXiv:2203.17012  [pdf, other]

    cs.SD cs.LG eess.AS

    A Temporal-oriented Broadcast ResNet for COVID-19 Detection

    Authors: Xin Jing, Shuo Liu, Emilia Parada-Cabaleiro, Andreas Triantafyllopoulos, Meishu Song, Zijiang Yang, Björn W. Schuller

    Abstract: Detecting COVID-19 from audio signals, such as breathing and coughing, can be used as a fast and efficient pre-testing method to reduce the virus transmission. Due to the promising results of deep learning networks in modelling time sequences, and since applications to rapidly identify COVID in-the-wild should require low computational effort, we present a temporal-oriented broadcasting residual l…

    Submitted 31 March, 2022; originally announced March 2022.

    Comments: 5 pages, submitted to Interspeech 2022

  24. arXiv:2112.12386  [pdf, other]

    eess.IV cs.CV

    KFWC: A Knowledge-Driven Deep Learning Model for Fine-grained Classification of Wet-AMD

    Authors: Haihong E, Jiawen He, Tianyi Hu, Lifei Wang, Lifei Yuan, Ruru Zhang, Meina Song

    Abstract: Automated diagnosis using deep neural networks can help ophthalmologists detect the blinding eye disease wet Age-related Macular Degeneration (AMD). Wet-AMD has two similar subtypes, Neovascular AMD and Polypoidal Choroidal Vessels (PCV). However, due to the difficulty in data collection and the similarity between images, most studies have only achieved the coarse-grained classification of wet-AMD…

    Submitted 23 December, 2021; originally announced December 2021.

  25. Machine Learning-Based 3D Channel Modeling for U2V mmWave Communications

    Authors: Kai Mao, Qiuming Zhu, Maozhong Song, Hanpeng Li, Benzhe Ning, Boyu Hua, Wei Fan

    Abstract: Unmanned aerial vehicle (UAV) millimeter wave (mmWave) technologies can provide flexible link and high data rate for future communication networks. By considering the new features of three-dimensional (3D) scattering space, 3D velocity, 3D antenna array, and especially 3D rotations, a machine learning (ML) integrated UAV-to-Vehicle (U2V) mmWave channel model is proposed. Meanwhile, a ML-based netw…

    Submitted 5 September, 2021; originally announced September 2021.

    Comments: IEEE Internet of Things Journal, early access, Mar. 2022

    Journal ref: in IEEE Internet of Things Journal, vol. 9, no. 18, pp. 17592-17607, 15 Sept.15, 2022

  26. arXiv:2108.09551  [pdf, other]

    eess.IV cs.CV cs.LG

    Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform

    Authors: Myungseo Song, Jinyoung Choi, Bohyung Han

    Abstract: We propose a versatile deep image compression network based on Spatial Feature Transform (SFT arXiv:1804.02815), which takes a source image and a corresponding quality map as inputs and produce a compressed image with variable rates. Our model covers a wide range of compression rates using a single model, which is controlled by arbitrary pixel-wise quality maps. In addition, the proposed framework…

    Submitted 21 August, 2021; originally announced August 2021.

    Comments: ICCV 2021

  27. Meteorologically Introduced Impacts on Aerial Channels and UAV Communications

    Authors: Mengan Song, Yiming Huo, Tao Lu, Xiaodai Dong, Zhonghua Liang

    Abstract: As 5G wireless systems and networks are now being globally commercialized and deployed, more diversified application scenarios are emerging, quickly reshaping our societies and paving the road to the beyond 5G (6G) era when terahertz (THz) and unmanned aerial vehicle (UAV) communications may play critical roles. In this paper, aerial channel models under multiple meteorological conditions such as…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: 5 pages, 7 figures, accepted by IEEE VTC2020-FALL

  28. Map-based Channel Modeling and Generation for U2V mmWave Communication

    Authors: Qiuming Zhu, Kai Mao, Maozhong Song, Xiaomin Chen, Boyu Hua, Weizhi Zhong, Xijuan Ye

    Abstract: Unmanned aerial vehicle (UAV) aided millimeter wave (mmWave) technologies have a promising prospect in the future communication networks. By considering the factors of three-dimensional (3D) scattering space, 3D trajectory, and 3D antenna array, a non-stationary channel model for UAV-to-vehicle (U2V) mmWave communications is proposed. The computation and generation methods of channel parameters in…

    Submitted 8 April, 2021; originally announced April 2021.

    Journal ref: in IEEE Transactions on Vehicular Technology, vol. 71, no. 8, pp. 8004-8015, Aug. 2022

  29. arXiv:2103.00430  [pdf, other]

    cs.CV cs.LG eess.IV

    Training Generative Adversarial Networks in One Stage

    Authors: Chengchao Shen, Youtan Yin, Xinchao Wang, Xubin Li, Jie Song, Mingli Song

    Abstract: Generative Adversarial Networks (GANs) have demonstrated unprecedented success in various image generation tasks. The encouraging results, however, come at the price of a cumbersome training process, during which the generator and discriminator are alternately updated in two stages. In this paper, we investigate a general training scheme that enables training GANs efficiently in only one stage. Ba…

    Submitted 16 June, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

    Comments: Accepted to CVPR 2021

  30. arXiv:2012.02859  [pdf, other]

    eess.SY

    Idle speed control with low-complexity offset-free explicit model predictive control in presence of system delay

    Authors: Sang Hwan Son, Se-Kyu Oh, Byung Jun Park, Min Jun Song, Jong Min Lee

    Abstract: The requirement for continual improvement of idle speed control (ISC) performance is increasing due to the stringent regulation on emission and fuel economy these days. In this regard, a low-complexity offset-free explicit model predictive control (EMPC) with constraint horizon is designed to regulate the idle speed under unmeasured disturbance in presence of system delay with rigorous formulation…

    Submitted 13 December, 2020; v1 submitted 4 December, 2020; originally announced December 2020.

  31. arXiv:2005.00096  [pdf, other]

    eess.AS cs.CL cs.SD

    An Early Study on Intelligent Analysis of Speech under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety

    Authors: Jing Han, Kun Qian, Meishu Song, Zijiang Yang, Zhao Ren, Shuo Liu, Juan Liu, Huaiyuan Zheng, Wei Ji, Tomoya Koike, Xiao Li, Zixing Zhang, Yoshiharu Yamamoto, Björn W. Schuller

    Abstract: The COVID-19 outbreak was announced as a global pandemic by the World Health Organisation in March 2020 and has affected a growing number of people in the past few weeks. In this context, advanced artificial intelligence techniques are brought to the fore in responding to fight against and reduce the impact of this global health crisis. In this study, we focus on developing some potential use-case…

    Submitted 14 May, 2020; v1 submitted 30 April, 2020; originally announced May 2020.

  32. arXiv:1911.11502  [pdf, other]

    cs.CV cs.LG eess.AS

    Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers

    Authors: Ya Zhao, Rui Xu, Xinchao Wang, Peng Hou, Haihong Tang, Mingli Song

    Abstract: Lip reading has witnessed unparalleled development in recent years thanks to deep learning and the availability of large-scale datasets. Despite the encouraging results achieved, the performance of lip reading, unfortunately, remains inferior to the one of its counterpart speech recognition, due to the ambiguous nature of its actuations that makes it challenging to extract discriminant features fr…

    Submitted 26 November, 2019; originally announced November 2019.

    Comments: AAAI 2020

  33. arXiv:1907.00390  [pdf, other]

    cs.CL cs.AI eess.AS

    A Novel Bi-directional Interrelated Model for Joint Intent Detection and Slot Filling

    Authors: Haihong E, Peiqing Niu, Zhongfu Chen, Meina Song

    Abstract: A spoken language understanding (SLU) system includes two main tasks, slot filling (SF) and intent detection (ID). The joint model for the two tasks is becoming a tendency in SLU. But the bi-directional interrelated connections between the intent and slots are not established in the existing joint models. In this paper, we propose a novel bi-directional interrelated model for joint intent detectio…

    Submitted 30 June, 2019; originally announced July 2019.

    Comments: Accepted paper of ACL 2019 (short paper) with 5 pages

  34. Convexity Analysis of Optimization Framework of Attitude Determination from Vector Observations

    Authors: Jin Wu, Zebo Zhou, Min Song

    Abstract: In the past several years, there have been several representative attitude determination methods developed using derivative-based optimization algorithms. Optimization techniques e.g. gradient-descent algorithm (GDA), Gauss-Newton algorithm (GNA), Levenberg-Marquardt algorithm (LMA) suffer from local optimum in real engineering practices. A brief discussion on the convexity of this problem is prese…

    Submitted 13 July, 2018; originally announced July 2018.

    Journal ref: IEEE CODIT 2019

  35. arXiv:1803.07713  [pdf, ps, other]

    eess.SP

    Robust Beamforming for SWIPT System with Chance Constraints

    Authors: Yinglei Teng, Wanxin Zhao, Mei Yan, Yong Zhang, Mei Song

    Abstract: The robust beamforming problem in multiple-input single-output (MISO) downlink networks of simultaneous wireless information and power transfer (SWIPT) is studied in this paper. Adopting the time switching fashion to perform energy harvesting and information decoding respectively, we aim at maximizing the sum rate under imperfect channel state information (CSI) and the chance constraints of users'…

    Submitted 20 March, 2018; originally announced March 2018.

    Comments: 6 pages, 5 figures, to appear in IEEE ICC 2018, May 20-24

  36. DTER: Schedule Optimal RF Energy Request and Harvest for Internet of Things

    Authors: Yu Luo, Lina Pu, Yanxiao Zhao, Guodong Wang, Min Song

    Abstract: We propose a new energy harvesting strategy that uses a dedicated energy source (ES) to optimally replenish energy for radio frequency (RF) energy harvesting powered Internet of Things. Specifically, we develop a two-step dual tunnel energy requesting (DTER) strategy that minimizes the energy consumption on both the energy harvesting device and the ES. Besides the causality and capacity constraint…

    Submitted 20 February, 2018; originally announced February 2018.

  37. arXiv:1802.07101  [pdf, other]

    cs.CV eess.IV

    Stroke Controllable Fast Style Transfer with Adaptive Receptive Fields

    Authors: Yongcheng Jing, Yang Liu, Yezhou Yang, Zunlei Feng, Yizhou Yu, Dacheng Tao, Mingli Song

    Abstract: The Fast Style Transfer methods have been recently proposed to transfer a photograph to an artistic style in real-time. This task involves controlling the stroke size in the stylized results, which remains an open challenge. In this paper, we present a stroke controllable style transfer network that can achieve continuous and spatial stroke size control. By analyzing the factors that influence the…

    Submitted 18 October, 2018; v1 submitted 20 February, 2018; originally announced February 2018.

    Comments: Accepted by ECCV2018. Supplementary material: https://yongchengjing.com/pdf/strokeControllable_supp.pdf

  38. arXiv:1705.04058  [pdf, other]

    cs.CV cs.NE eess.IV stat.ML

    Neural Style Transfer: A Review

    Authors: Yongcheng Jing, Yezhou Yang, Zunlei Feng, Jingwen Ye, Yizhou Yu, Mingli Song

    Abstract: The seminal work of Gatys et al. demonstrated the power of Convolutional Neural Networks (CNNs) in creating artistic imagery by separating and recombining image content and style. This process of using CNNs to render a content image in different styles is referred to as Neural Style Transfer (NST). Since then, NST has become a trending topic both in academic literature and industrial applications.…

    Submitted 30 October, 2018; v1 submitted 11 May, 2017; originally announced May 2017.

    Comments: Project page: https://github.com/ycjing/Neural-Style-Transfer-Papers