ADD 2023: the Second Audio Deepfake Detection Challenge

Authors: Jiangyan Yi, Jianhua Tao, Ruibo Fu, Xinrui Yan, Chenglong Wang, Tao Wang, Chu Yuan Zhang, Xiaohui Zhang, Yan Zhao, Yong Ren, Le Xu, Junzuo Zhou, Hao Gu, Zhengqi Wen, Shan Liang, Zheng Lian, Shuai Nie, Haizhou Li

Abstract: Audio deepfake detection is an emerging topic in the artificial intelligence community. The second Audio Deepfake Detection Challenge (ADD 2023) aims to spur researchers around the world to build new innovative technologies that can further accelerate and foster research on detecting and analyzing deepfake speech utterances. Different from previous challenges (e.g. ADD 2022), ADD 2023 focuses on surpassing the constraints of binary real/fake classification, and actually localizing the manipulated intervals in a partially fake speech as well as pinpointing the source responsible for generating any fake audio. Furthermore, ADD 2023 includes more rounds of evaluation for the fake audio game sub-challenge. The ADD 2023 challenge includes three subchallenges: audio fake game (FG), manipulation region location (RL) and deepfake algorithm recognition (AR). This paper describes the datasets, evaluation metrics, and protocols. Some findings are also reported in audio deepfake detection tasks.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2202.08433

ADD 2022: the First Audio Deep Synthesis Detection Challenge

Authors: Jiangyan Yi, Ruibo Fu, Jianhua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu

Abstract: Audio deepfake detection is an emerging topic, which was included in the ASVspoof 2021. However, the recent shared tasks have not covered many real-life and challenging scenarios. The first Audio Deep synthesis Detection challenge (ADD) was motivated to fill in the gap. The ADD 2022 includes three tracks: low-quality fake audio detection (LF), partially fake audio detection (PF) and audio fake game (FG). The LF track focuses on dealing with bona fide and fully fake utterances with various real-world noises etc. The PF track aims to distinguish the partially fake audio from the real. The FG track is a rivalry game, which includes two tasks: an audio generation task and an audio fake detection task. In this paper, we describe the datasets, evaluation metrics, and protocols. We also report major findings that reflect the recent advances in audio deepfake detection tasks.

Submitted 26 February, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

Comments: Accepted by ICASSP 2022

arXiv:2112.13187

TeraHertz Band Communication: An Old Problem Revisited and Research Directions for the Next Decade

Authors: Ian F. Akyildiz, Chong Han, Zhifeng Hu, Shuai Nie, Josep M. Jornet

Abstract: Terahertz (THz) band communications are envisioned as a key technology for 6G and Beyond. As a fundamental wireless infrastructure, THz communication can boost abundant promising applications. In 2014, our team published two comprehensive roadmaps for the development and progress of THz communication networks [1], [2], which helped the research community to start research on this subject afterwards. The topic of THz communications became very important and appealing to the research community due to 6G wireless systems design and development in recent years. Many papers are getting published covering different aspects of wireless systems using the THz band. With this paper, our aim is looking back to the last decade and revisiting the old problems and pointing out what has been achieved in the research community so far. Furthermore, in this paper, open challenges and new research directions still to be investigated for the THz band communication systems are presented, by covering diverse topics ranging from devices, channel behavior, communication and networking, to physical testbeds and demonstration systems. The key aspects presented in this paper will enable THz communications as a pillar of 6G and Beyond wireless systems in the next decade.

Submitted 26 April, 2022; v1 submitted 25 December, 2021; originally announced December 2021.

Comments: To appear in IEEE Transactions on Communications, 2022

arXiv:2008.07742

UDC 2020 Challenge on Image Restoration of Under-Display Camera: Methods and Results

Authors: Yuqian Zhou, Michael Kwan, Kyle Tolentino, Neil Emerton, Sehoon Lim, Tim Large, Lijiang Fu, Zhihong Pan, Baopu Li, Qirui Yang, Yihao Liu, Jigang Tang, Tao Ku, Shibin Ma, Bingnan Hu, Jiarong Wang, Densen Puthussery, Hrishikesh P S, Melvin Kuriakose, Jiji C V, Varun Sundar, Sumanth Hegde, Divya Kothandaraman, Kaushik Mitra, Akashdeep Jassal, et al. (20 additional authors not shown)

Abstract: This paper is the report of the first Under-Display Camera (UDC) image restoration challenge in conjunction with the RLQ workshop at ECCV 2020. The challenge is based on a newly-collected database of Under-Display Camera. The challenge tracks correspond to two types of display: a 4k Transparent OLED (T-OLED) and a phone Pentile OLED (P-OLED). Along with about 150 teams registered the challenge, eight and nine teams submitted the results during the testing phase for each track. The results in the paper are state-of-the-art restoration performance of Under-Display Camera Restoration. Datasets and paper are available at https://yzhouas.github.io/projects/UDC/udc.html.

Submitted 18 August, 2020; originally announced August 2020.

Comments: 15 pages

arXiv:2003.10270

Mobility-aware Beam Steering in Metasurface-based Programmable Wireless Environments

Authors: Christos Liaskos, Shuai Nie, Ageliki Tsioliaridou, Andreas Pitsillides, Sotiris Ioannidis, Ian Akyildiz

Abstract: Programmable wireless environments (PWEs) utilize electromagnetic metasurfaces to transform wireless propagation into a software-controlled resource. In this work we study the effects of user device mobility on the efficiency of PWEs. An analytical model is proposed, which describes the potential misalignment between user-emitted waves and the active PWE configuration, and can constitute the basis for studying queuing problems in PWEs. Subsequently, a novel, beam steering approach is proposed which can effectively mitigate the misalignment effects. Ray-tracing-based simulations evaluate the proposed scheme.

Submitted 23 March, 2020; originally announced March 2020.

Comments: In proceedings of IEEE ICASSP 2020. This work was funded by the European Union via the Horizon 2020: Future Emerging Topics call (FETOPEN-RIA), grant EU736876, project VISORSURF (http://visorsurf.eu)

arXiv:1907.00037

3D Channel Modeling and Characterization for Hypersurface Empowered Indoor Environment at 60 GHz Millimeter-Wave Band

Authors: Rashi Mehrotra, Rafay Iqbal Ansari, Alexandros Pitilakis, Shuai Nie, Christos Liaskos, Nikolaos V. Kantartzis, Andreas Pitsillides

Abstract: This paper proposes a three-dimensional (3D) communication channel model for an indoor environment considering the effect of the Hypersurface. The Hypersurface is a software controlled intelligent metasurface, which can be used to manipulate electromagnetic waves, as for example for non-specular reflection and full absorption. Thus it can control the impinging rays from a transmitter towards a receiver location in both LOS and NLOS paths, e.g. to combat distance and improve wireless connectivity. We focus on the 60 GHz mmWave frequency band due to its increasing significance in 5G/6G networks and evaluate the effect of Hypersurface in an indoor environment in terms of attenuation coefficients related to the Hypersurface reflection and absorption functionalities, using CST simulation, a 3D electromagnetic simulator of high frequency components. To highlight the benefits of Hypersurface coated walls versus plain walls, we use the derived Hypersurface 3D channel model and a custom 3D ray-tracing simulator for plain walls considering a typical indoor scenario for different Tx-Rx location and separation distances.

Submitted 28 June, 2019; originally announced July 2019.

Comments: Accepted

arXiv:1904.07958

Intelligent Environments based on Ultra-Massive MIMO Platforms for Wireless Communication in Millimeter Wave and Terahertz Bands

Authors: Shuai Nie, Josep M. Jornet, Ian F. Akyildiz

Abstract: Millimeter-wave (30-300 GHz) and Terahertz-band communications (0.3-10 THz) are envisioned as key wireless technologies to satisfy the demand for Terabit-per-second (Tbps) links in the 5G and beyond eras. The very large available bandwidth in this ultra-broadband frequency range comes at the cost of a very high propagation loss, which combined with the low power of mm-wave and THz-band transceivers limits the communication distance and data-rates. In this paper, the concept of intelligent communication environments enabled by Ultra-Massive MIMO platforms is proposed to increase the communication distance and data-rates at mm-wave and THz-band frequencies. An end-to-end physical model is developed by taking into account the capabilities of novel intelligent plasmonic antenna arrays which can operate in transmission, reception, reflection and waveguiding, as well as the peculiarities of the mm-wave and THz-band multi-path channel. Based on the developed model, extensive quantitative results for different scenarios are provided to illustrate the performance improvements in terms of both achievable distance and data-rate in Ultra-Massive MIMO environments.

Submitted 16 April, 2019; originally announced April 2019.

arXiv:1902.04682

doi 10.1109/MCOM.2018.1700928

Combating the Distance Problem in the Millimeter Wave and Terahertz Frequency Bands

Authors: Ian F. Akyildiz, Chong Han, Shuai Nie

Abstract: In the millimeter wave (30-300 GHz) and Terahertz (0.1-10 THz) frequency bands, high spreading loss and molecular absorption often limit the signal transmission distance and coverage range. In this paper, four directions to tackle the crucial problem of distance limitation are investigated, namely, a physical layer distance-aware design, ultra-massive MIMO communication, reflectarrays, and intelligent surfaces. Additionally, the potential joint design of these technologies is proposed to combine the benefits and possibly further extend the communication distance. Qualitative analyses and quantitative simulations are provided to illustrate the benefits of the proposed techniques and demonstrate the feasibility of mm-wave and THz band communications up to 100 meters in both line-of-sight and non-line-of-sight areas.

Submitted 12 February, 2019; originally announced February 2019.

Journal ref: IEEE Communications Magazine, vol. 56, no. 6, pp. 102-108, June 2018

arXiv:1811.00883

Deep Segment Attentive Embedding for Duration Robust Speaker Verification

Authors: Bin Liu, Shuai Nie, Yaping Zhang, Shan Liang, Wenju Liu

Abstract: LSTM-based speaker verification usually uses a fixed-length local segment randomly truncated from an utterance to learn the utterance-level speaker embedding, while using the average embedding of all segments of a test utterance to verify the speaker, which results in a critical mismatch between testing and training. This mismatch degrades the performance of speaker verification, especially when the durations of training and testing utterances are very different. To alleviate this issue, we propose the deep segment attentive embedding method to learn the unified speaker embeddings for utterances of variable duration. Each utterance is segmented by a sliding window and LSTM is used to extract the embedding of each segment. Instead of only using one local segment, we use the whole utterance to learn the utterance-level embedding by applying an attentive pooling to the embeddings of all segments. Moreover, the similarity loss of segment-level embeddings is introduced to guide the segment attention to focus on the segments with more speaker discriminations, and jointly optimized with the similarity loss of utterance-level embeddings. Systematic experiments on Tongdun and VoxCeleb show that the proposed method significantly improves robustness of duration variant and achieves the relative Equal Error Rate reduction of 50% and 11.54%, respectively.

Submitted 31 October, 2018; originally announced November 2018.

arXiv:1806.01792

A New Wireless Communication Paradigm through Software-controlled Metasurfaces

Authors: Christos Liaskos, Shuai Nie, Ageliki Tsioliaridou, Andreas Pitsillides, Sotiris Ioannidis, Ian Akyildiz

Abstract: Electromagnetic waves undergo multiple uncontrollable alterations as they propagate within a wireless environment. Free space path loss, signal absorption, as well as reflections, refractions and diffractions caused by physical objects within the environment highly affect the performance of wireless communications. Currently, such effects are intractable to account for and are treated as probabilistic factors. The paper proposes a radically different approach, enabling deterministic, programmable control over the behavior of the wireless environments. The key-enabler is the so-called HyperSurface tile, a novel class of planar meta-materials which can interact with impinging electromagnetic waves in a controlled manner. The HyperSurface tiles can effectively re-engineer electromagnetic waves, including steering towards any desired direction, full absorption, polarization manipulation and more. Multiple tiles are employed to coat objects such as walls, furniture, overall, any objects in the indoor and outdoor environments. An external software service calculates and deploys the optimal interaction types per tile, to best fit the needs of communicating devices. Evaluation via simulations highlights the potential of the new concept.

Submitted 4 June, 2018; originally announced June 2018.

Comments: Paper accepted for publication at the IEEE Communications Magazine. This work was funded by the European Union via the Horizon 2020: Future Emerging Topics call (FETOPEN-RIA), grant EU736876, project VISORSURF: HyperSurfaces-A Hardware Platform for Software-driven Functional Metasurfaces (http://www.visorsurf.eu/)

arXiv:1805.06677

Realizing Wireless Communication through Software-defined HyperSurface Environments

Authors: Christos Liaskos, Shuai Nie, Ageliki Tsioliaridou, Andreas Pitsillides, Sotiris Ioannidis, Ian Akyildiz

Abstract: Wireless communication environments are unaware of the ongoing data exchange efforts within them. Moreover, their effect on the communication quality is intractable in all but the simplest cases. The present work proposes a new paradigm, where indoor scattering becomes software-defined and, subsequently, optimizable across wide frequency ranges. Moreover, the controlled scattering can surpass natural behavior, exemplary overriding Snell's law, reflecting waves towards any custom angle (including negative ones). Thus, path loss and multi-path fading effects can be controlled and mitigated. The core technology of this new paradigm are metasurfaces, planar artificial structures whose effect on impinging electromagnetic waves is fully defined by their macro-structure. The present study contributes the software-programmable wireless environment model, consisting of several HyperSurface tiles controlled by a central, environment configuration server. HyperSurfaces are a novel class of metasurfaces whose structure and, hence, electromagnetic behavior can be altered and controlled via a software interface. Multiple networked tiles coat indoor objects, allowing fine-grained, customizable reflection, absorption or polarization overall. A central server calculates and deploys the optimal electromagnetic interaction per tile, to the benefit of communicating devices. Realistic simulations using full 3D ray-tracing demonstrate the groundbreaking potential of the proposed approach in 2.4 GHz and 60 GHz frequencies.

Submitted 17 May, 2018; originally announced May 2018.

Comments: This paper appears at the 19TH IEEE WOWMOM 2018, JUNE 12-15, 2018. (Technical program: http://it.murdoch.edu.au/wowmom2018/technical_program.html) This work was funded by the European Union via the Horizon 2020: Future Emerging Topics call (FETOPEN-RIA), grant EU736876, project VISORSURF (http://www.visorsurf.eu) : HyperSurfaces-A Hardware Platform for Software-driven Functional Metasurfaces

arXiv:1805.01357

Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training

Authors: Bin Liu, Shuai Nie, Yaping Zhang, Dengfeng Ke, Shan Liang, Wenju Liu

Abstract: In realistic environments, speech is usually interfered by various noise and reverberation, which dramatically degrades the performance of automatic speech recognition (ASR) systems. To alleviate this issue, the commonest way is to use a well-designed speech enhancement approach as the front-end of ASR. However, more complex pipelines, more computations and even higher hardware costs (microphone array) are additionally consumed for this kind of methods. In addition, speech enhancement would result in speech distortions and mismatches to training. In this paper, we propose an adversarial training method to directly boost noise robustness of acoustic model. Specifically, a jointly compositional scheme of generative adversarial net (GAN) and neural network-based acoustic model (AM) is used in the training phase. GAN is used to generate clean feature representations from noisy features by the guidance of a discriminator that tries to distinguish between the true clean signals and generated signals. The joint optimization of generator, discriminator and AM concentrates the strengths of both GAN and AM for speech recognition. Systematic experiments on CHiME-4 show that the proposed method significantly improves the noise robustness of AM and achieves the average relative error rate reduction of 23.38% and 11.54% on the development and test set, respectively.

Submitted 2 May, 2018; originally announced May 2018.

Showing 1–12 of 12 results for author: Nie, S