
Conformal Shield: A Novel Adversarial Attack Detection Framework for Automatic Modulation Classification

Authors: Tailai Wen, Da Ke, Xiang Wang, Zhitao Huang

Abstract: Deep learning algorithms have become an essential component in the field of cognitive radio, especially playing a pivotal role in automatic modulation classification. However, deep learning also presents risks and vulnerabilities. Despite their outstanding classification performance, these algorithms exhibit fragility when confronted with meticulously crafted adversarial examples, posing potential risks to the reliability of modulation recognition results. Addressing this issue, this letter pioneers the development of an intelligent modulation classification framework based on conformal theory, named the Conformal Shield, aimed at detecting the presence of adversarial examples in unknown signals and assessing the reliability of recognition results. Utilizing conformal mapping from statistical learning theory, it introduces a custom-designed Inconsistency Soft-solution Set, enabling multiple validity assessments of the recognition outcomes. Experimental results demonstrate that the Conformal Shield maintains robust detection performance against a variety of typical adversarial sample attacks in the received signals under different perturbation-to-signal power ratio conditions.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2312.11814 [pdf]

Study on electromagnetically induced transparency effects in Dirac and VO$_2$ hybrid material structure

Authors: Di Ke, Xie Meng, Xia Hua Rong, Cheng An Yu, Liu Yu, Du Jia Jia

Abstract: In this paper, we present a metamaterial structure of Dirac and vanadium dioxide and investigate its optical properties using the finite-difference time-domain (FDTD) technique. Using the phase transition feature of vanadium dioxide, the design can realize active tuning of the PIT effect at terahertz frequency, thereby converting from a single PIT to a double PIT. When VO$_2$ is in the insulating state, the structure is symmetric and yields a single-band PIT effect; when VO$_2$ is in the metallic state, the structure turns asymmetric and yields a dual-band PIT effect. This design provides a reference direction for the design of actively tunable metamaterials. Additionally, it is discovered that the transparent window's resonant frequency and the Dirac material's Fermi level in this structure have a somewhat linear relationship. In addition, the structure achieves superior refractive index sensitivity in the terahertz band, surpassing 1 THz/RIU. Consequently, the concept exhibits encouraging potential for application in refractive index sensors and optical switches.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.10358 [pdf, other]

CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis

Authors: Yayue Deng, Jinlong Xue, Yukang Jia, Qifei Li, Yichen Han, Fengping Wang, Yingming Gao, Dengfeng Ke, Ya Li

Abstract: Conversational speech synthesis (CSS) incorporates historical dialogue as supplementary information with the aim of generating speech that has dialogue-appropriate prosody. While previous methods have already delved into enhancing context comprehension, context representation still lacks effective representation capabilities and context-sensitive discriminability. In this paper, we introduce a contrastive learning-based CSS framework, CONCSS. Within this framework, we define an innovative pretext task specific to CSS that enables the model to perform self-supervised learning on unlabeled conversational datasets to boost the model's context understanding. Additionally, we introduce a sampling strategy for negative sample augmentation to enhance context vectors' discriminability. This is the first attempt to integrate contrastive learning into CSS. We conduct ablation studies on different contrastive learning strategies and comprehensive experiments in comparison with prior CSS systems. Results demonstrate that the synthesized speech from our proposed method exhibits more contextually appropriate and sensitive prosody.

Submitted 16 December, 2023; originally announced December 2023.

Comments: 5 pages, 2 figures, 3 tables, Accepted by ICASSP 2024

arXiv:2310.05402 [pdf]

Challenges for density functional theory in simulating metal-metal singlet bonding: a case study of dimerized VO2

Authors: Yubo Zhang, Da Ke, Junxiong Wu, Chutong Zhang, Baichen Lin, Zuhuang Chen, John P. Perdew, Jianwei Sun

Abstract: VO2 is renowned for its electric transition from an insulating monoclinic (M1) phase characterized by V-V dimerized structures, to a metallic rutile (R) phase above 340 Kelvin. This transition is accompanied by a magnetic change: the M1 phase exhibits a non-magnetic spin-singlet state, while the R phase exhibits a state with local magnetic moments. Simultaneous simulation of the structural, electric, and magnetic properties of this compound is of fundamental importance, but the M1 phase alone has posed a significant challenge to density functional theory (DFT). In this study, we show none of the commonly used DFT functionals, including those combined with on-site Hubbard U to better treat 3d electrons, can accurately predict the V-V dimer length. The spin-restricted method tends to overestimate the strength of the V-V bonds, resulting in a small V-V bond length. Conversely, the spin-symmetry-breaking method exhibits the opposite trends. Each bond-calculation method underscores one of the two contentious mechanisms, i.e., Peierls or Mott, involved in the metal-insulator transition in VO2. To elucidate the challenges encountered in DFT, we also employ an effective Hamiltonian that integrates one-dimensional magnetic sites, thereby revealing the inherent difficulties linked with the DFT computations.

Submitted 9 October, 2023; originally announced October 2023.

Comments: 14 pages, 6 figures

arXiv:2306.02593 [pdf, other]

doi 10.1109/ISCSLP57327.2022.10037822

Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis

Authors: Dengfeng Ke, Yayue Deng, Yukang Jia, Jinlong Xue, Qi Luo, Ya Li, Jianqing Sun, Jiaen Liang, Binghuai Lin

Abstract: Regressive Text-to-Speech (TTS) systems utilize an attention mechanism to generate the alignment between text and the acoustic feature sequence. Alignment determines synthesis robustness (e.g., the occurrence of skipping, repeating, and collapse) and rhythm via duration control. However, current attention algorithms used in speech synthesis cannot control rhythm using external duration information to generate natural speech while ensuring robustness. In this study, we propose Rhythm-controllable Attention (RC-Attention) based on Tacotron2, which improves robustness and naturalness simultaneously. The proposed attention adopts a trainable scalar learned from four kinds of information to achieve rhythm control, which makes rhythm control more robust and natural, even when the synthesized sentences are much longer than those in the training corpus. We use word error counting and an AB preference test to measure the robustness of the proposed method and the naturalness of the synthesized speech, respectively. Results show that RC-Attention has the lowest word error rate of nearly 0.6%, compared with 11.8% for the baseline system. Moreover, nearly 60% of subjects prefer the speech synthesized with RC-Attention to that with Forward Attention, because the former has a more natural rhythm.

Submitted 5 June, 2023; originally announced June 2023.

Comments: 5 pages, 3 figures, Published in: 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP)

arXiv:2303.15865 [pdf]

Chloride Ion Erosion of Pre-Stressed Concrete Bridges in Cold Regions

Authors: Hongtao Cui, Yi Zhuo, Dongyuan Ke, Zhonglong Li, Shunlong Li

Abstract: The erosion of chloride ions in concrete bridges will accelerate the corrosion of reinforcement, which is an important reason for the decline of bridge durability. The erosion process of chloride ion, especially deicing salt solution in cold regions, is complex and has many influencing factors. It is very important to use accurate and effective methods to analyze the chloride ion erosion process in concrete. In this study, the pre-stressed concrete bridge retired in the cold region was taken as the research object, and the specimens from the whole bridge are obtained by the method of core drilling sampling. The concentration of chloride ion was measured at different depths of the specimens. The process of chloride ion erosion was simulated in two-dimensional space through COMSOL multi-physical field simulation, and compared with the measured results. The simulation method proposed in this paper has good reliability and accuracy.

Submitted 28 March, 2023; originally announced March 2023.

arXiv:2208.05228 [pdf]

Current and perspective sensing methods for monkeypox virus: a reemerging zoonosis in its infancy

Authors: Ijaz Gul, Changyue Liu, Yuan Xi, Zhicheng Du, Shiyao Zhai, Zhengyang Lei, Chen Qun, Muhammad Akmal Raheem, Qian He, Zhang Haihui, Canyang Zhang, Runming Wang, Sanyang Han, Du Ke, Peiwu Qin

Abstract: Objectives: The review is dedicated to evaluating the current monkeypox virus (MPXV) detection methods, discussing their pros and cons, and providing recommended solutions to the problems. Methods: The literature for this review is identified through searches in PubMed, Web of Science, Google Scholar, ResearchGate, and Science Direct advanced search for articles published in English without any start date until June 2022, using the terms "monkeypox virus" or "poxvirus" along with "diagnosis"; "PCR"; "real-time PCR"; "LAMP"; "RPA"; "immunoassay"; "reemergence"; "biothreat"; "endemic", and "multi-country outbreak", and also by tracking citations of the relevant papers. The most relevant articles are included in the review. Results: Our literature review shows that PCR is the gold standard method for MPXV detection. In addition, loop-mediated isothermal amplification (LAMP) and recombinase polymerase amplification (RPA) have been reported as alternatives to PCR. Immunodiagnostics, whole particle detection, and image-based detection are the non-nucleic acid-based MPXV detection modalities. Conclusions: PCR is easy to leverage and adapt for a quick response to an outbreak, but the PCR-based MPXV detection approaches may not be suitable for marginalized settings. Limited progress has been made towards innovations in MPXV diagnostics, providing room for the development of novel detection techniques for this virus.

Submitted 10 August, 2022; originally announced August 2022.

Comments: 36 pages, 5 figures, 1 table

arXiv:2206.07289 [pdf, other]

Text-Aware End-to-end Mispronunciation Detection and Diagnosis

Authors: Linkai Peng, Yingming Gao, Binghuai Lin, Dengfeng Ke, Yanlu Xie, Jinsong Zhang

Abstract: Mispronunciation detection and diagnosis (MDD) technology is a key component of computer-assisted pronunciation training system (CAPT). In the field of assessing the pronunciation quality of constrained speech, the given transcriptions can play the role of a teacher. Conventional methods have fully utilized the prior texts for the model construction or improving the system performance, e.g. forced-alignment and extended recognition networks. Recently, some end-to-end based methods attempt to incorporate the prior texts into model training and preliminarily show the effectiveness. However, previous studies mostly consider applying raw attention mechanism to fuse audio representations with text representations, without taking possible text-pronunciation mismatch into account. In this paper, we present a gating strategy that assigns more importance to the relevant audio features while suppressing irrelevant text information. Moreover, given the transcriptions, we design an extra contrastive loss to reduce the gap between the learning objective of phoneme recognition and MDD. We conducted experiments using two publicly available datasets (TIMIT and L2-Arctic) and our best model improved the F1 score from $57.51\%$ to $61.75\%$ compared to the baselines. Besides, we provide a detailed analysis to shed light on the effectiveness of gating mechanism and contrastive learning on MDD.

Submitted 15 June, 2022; originally announced June 2022.

Comments: Rejected by Interspeech2022

arXiv:2112.10162 [pdf, other]

doi 10.1103/PhysRevE.105.044122

Backbone and shortest-path exponents of the two-dimensional $Q$-state Potts model

Authors: Sheng Fang, Da Ke, Wei Zhong, Youjin Deng

Abstract: We present a Monte Carlo study of the backbone and the shortest-path exponents of the two-dimensional $Q$-state Potts model in the Fortuin-Kasteleyn bond representation. We first use cluster algorithms to simulate the critical Potts model on the square lattice and obtain the backbone exponents $d_{\rm B} = 1.732 \, 0(3)$ and $1.794(2)$ for $Q=2,3$ respectively. However, for large $Q$, the study suffers from serious critical slowing down and slowly converging finite-size corrections. To overcome these difficulties, we consider the O$(n)$ loop model on the honeycomb lattice in the densely packed phase, which is regarded to correspond to the critical Potts model with $Q=n^2$. With a highly efficient cluster algorithm, we determine from domains enclosed by the loops $d_{\rm B} =1.643\,39(5), 1.732\,27(8), 1.793\,8(3), 1.838\,4(5), 1.875\,3(6)$ for $Q = 1, 2, 3, 2 \! + \! \sqrt{3}, 4$, respectively, and $d_{\rm min} = 1.094\,5(2), 1.067\,5(3), 1.047\,5(3), 1.032\,2(4)$ for $Q=2,3, 2+\sqrt{3}, 4$ respectively. Our estimates significantly improve over the existing results for both $d_{\rm B}$ and $d_{\rm min}$. Finally, by studying finite-size corrections in backbone-related quantities, we conjecture an exact formula as a function of $n$ for the leading correction exponent.

Submitted 19 April, 2022; v1 submitted 19 December, 2021; originally announced December 2021.

Comments: 14 pages, 9 figures

Journal ref: Phys. Rev. E 105, 044122(2022)

arXiv:2108.03008 [pdf, other]

An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures

Authors: Dengfeng Ke, Yuxing Lu, Xudong Liu, Yanyan Xu, Jing Sun, Cheng-Hao Cai

Abstract: With the rapid development of neural network architectures and speech processing models, singing voice synthesis with neural networks is becoming the cutting-edge technique of digital music production. In this work, in order to explore how to improve the quality and efficiency of singing voice synthesis, we use encoder-decoder neural models and a number of vocoders to achieve singing voice synthesis. We conduct experiments to demonstrate that the models can be trained using voice data with pitch information, lyrics and beat information, and the trained models can produce smooth, clear and natural singing voice that is close to real human voice. As the models work in the end-to-end manner, they allow users who are not domain experts to directly produce singing voice by arranging pitches, lyrics and beats.

Submitted 6 August, 2021; originally announced August 2021.

Comments: 27 pages, 4 figures, 5 tables

arXiv:2105.02509 [pdf, other]

Speech Enhancement using Separable Polling Attention and Global Layer Normalization followed with PReLU

Authors: Dengfeng Ke, Jinsong Zhang, Yanlu Xie, Yanyan Xu, Binghuai Lin

Abstract: Single channel speech enhancement is a challenging task in the speech community. Recently, various neural-network-based methods have been applied to speech enhancement. Among these models, PHASEN and T-GSA achieve state-of-the-art performances on the publicly opened VoiceBank+DEMAND corpus. Both of the models reach the COVL score of 3.62. PHASEN achieves the highest CSIG score of 4.21 while T-GSA gets the highest PESQ score of 3.06. However, both of these models are very large. The contradiction between the model performance and the model size is hard to reconcile. In this paper, we introduce three kinds of techniques to shrink the PHASEN model and improve the performance. Firstly, separable polling attention is proposed to replace the frequency transformation blocks in PHASEN. Secondly, global layer normalization followed with PReLU is used to replace batch normalization followed with ReLU. Finally, BLSTM in PHASEN is replaced with Conv2d operation and the phase stream is simplified. With all these modifications, the size of the PHASEN model is shrunk from 33M parameters to 5M parameters, while the performance on VoiceBank+DEMAND is improved to the CSIG score of 4.30, the PESQ score of 3.07 and the COVL score of 3.73.

Submitted 6 May, 2021; originally announced May 2021.

arXiv:2104.08428 [pdf, other]

A Full Text-Dependent End to End Mispronunciation Detection and Diagnosis with Easy Data Augmentation Techniques

Authors: Kaiqi Fu, Jones Lin, Dengfeng Ke, Yanlu Xie, Jinsong Zhang, Binghuai Lin

Abstract: Recently, end-to-end mispronunciation detection and diagnosis (MD&D) systems have become a popular alternative to greatly simplify the model-building process of conventional hybrid DNN-HMM systems by representing complicated modules with a single deep network architecture. In this paper, in order to utilize the prior text in the end-to-end structure, we present a novel text-dependent model which differs from sed-mdd: the model achieves a fully end-to-end system by aligning the audio with the phoneme sequences of the prior text inside the model through the attention mechanism. Moreover, using the prior text as input introduces an imbalance between positive and negative samples in the phoneme sequence. To alleviate this problem, we propose three simple data augmentation methods, which effectively improve the ability of the model to capture mispronounced phonemes. We conduct experiments on L2-ARCTIC, and our best performance improved from 49.29% to 56.08% in F-measure metric compared to the CNN-RNN-CTC model.

Submitted 16 April, 2021; originally announced April 2021.

Comments: Submitted to INTERSPEECH2021

arXiv:2009.12475 [pdf, ps, other]

Extending Zeckendorf's Theorem to a Non-constant Recurrence and the Zeckendorf Game on this Non-constant Recurrence Relation

Authors: Elżbieta Bołdyriew, Anna Cusenza, Linglong Dai, Pei Ding, Aidan Dunkelberg, John Haviland, Kate Huffman, Dianhui Ke, Daniel Kleber, Jason Kuretski, John Lentfer, Tianhao Luo, Steven J. Miller, Clayton Mizgerd, Vashisth Tiwari, Jingkai Ye, Yunhao Zhang, Xiaoyan Zheng, Weiduo Zhu

Abstract: Zeckendorf's Theorem states that every positive integer can be uniquely represented as a sum of non-adjacent Fibonacci numbers, indexed from $1, 2, 3, 5,\ldots$. This has been generalized by many authors, in particular to constant coefficient fixed depth linear recurrences with positive (or in some cases non-negative) coefficients. In this work we extend this result to a recurrence with non-constant coefficients, $a_{n+1} = n a_{n} + a_{n-1}$. The decomposition law becomes every $m$ has a unique decomposition as $\sum s_i a_i$ with $s_i \le i$, where if $s_i = i$ then $s_{i-1} = 0$. Similar to Zeckendorf's original proof, we use the greedy algorithm. We show that almost all the gaps between summands, as $n$ approaches infinity, are of length zero, and give a heuristic that the distribution of the number of summands tends to a Gaussian. Furthermore, we build a game based upon this recurrence relation, generalizing a game on the Fibonacci numbers. Given a fixed integer $n$ and an initial decomposition of $n= na_1$, the players alternate by using moves related to the recurrence relation, and whoever moves last wins. We show that the game is finite and ends at the unique decomposition of $n$, and that either player can win in a two-player game. We find the strategy to attain the shortest game possible, and the length of this shortest game. Then we show that in this generalized game when there are more than three players, no player has the winning strategy. Lastly, we demonstrate how one player in the two-player game can force the game to progress to their advantage.

Submitted 25 September, 2020; originally announced September 2020.

Comments: 21 pages, 1 figure, from Zeckendorf Polymath REU and the Eureka Program

arXiv:2009.09510 [pdf, ps, other]

Bounds on Zeckendorf Games

Authors: Anna Cusenza, Aiden Dunkelberg, Kate Huffman, Dianhui Ke, Micah McClatchey, Steven J. Miller, Clayton Mizgerd, Vashisth Tiwari, Jingkai Ye, Xiaoyan Zheng

Abstract: Zeckendorf proved that every positive integer $n$ can be written uniquely as the sum of non-adjacent Fibonacci numbers. We use this decomposition to construct a two-player game. Given a fixed integer $n$ and an initial decomposition of $n=n F_1$, the two players alternate by using moves related to the recurrence relation $F_{n+1}=F_n+F_{n-1}$, and whoever moves last wins. The game always terminates in the Zeckendorf decomposition; depending on the choice of moves the length of the game and the winner can vary, though for $n\ge 2$ there is a non-constructive proof that Player 2 has a winning strategy. Initially the lower bound of the length of a game was order $n$ (and known to be sharp) while the upper bound was of size $n \log n$. Recent work decreased the upper bound to one of size $n$, but with a larger constant than was conjectured. We improve the upper bound and obtain the sharp bound of $\frac{\sqrt{5}+3}{2}\ n - IZ(n) - \frac{1+\sqrt{5}}{2}Z(n)$, which is of order $n$ as $Z(n)$ is the number of terms in the Zeckendorf decomposition of $n$ and $IZ(n)$ is the sum of indices in the Zeckendorf decomposition of $n$ (which are at most of sizes $\log n$ and $\log^2 n$ respectively). We also introduce a greedy algorithm that realizes the upper bound, and show that the longest game on any $n$ is achieved by applying splitting moves whenever possible.

Submitted 20 September, 2020; originally announced September 2020.

Comments: 15 pages, from Zeckendorf Polymath REU

arXiv:2009.03708 [pdf, ps, other]

Winning Strategy for the Multiplayer and Multialliance Zeckendorf Games

Authors: Anna Cusenza, Aidan Dunkelberg, Kate Huffman, Dianhui Ke, Daniel Kleber, Steven J. Miller, Clayton Mizgerd, Vashisth Tiwari, Jingkai Ye, Xiaoyan Zheng

Abstract: Edouard Zeckendorf proved that every positive integer $n$ can be uniquely written as the sum of non-adjacent Fibonacci numbers, known as the Zeckendorf decomposition. Based on Zeckendorf's decomposition, we have the Zeckendorf game for multiple players. We show that when the Zeckendorf game has at least $3$ players, none of the players have a winning strategy for $n\geq 5$. Then we extend the multi-player game to the multi-alliance game, finding some interesting situations in which no alliance has a winning strategy. This includes the two-alliance game, and some cases in which one alliance always has a winning strategy. %We examine what alliances, or combinations of players, can win, and what size they have to be in order to do so. We also find necessary structural constraints on what alliances our method of proof can show to be winning. Furthermore, we find some alliance structures which must have winning strategies. %We also extend the Generalized Zeckendorf game from $2$-players to multiple players. We find that when the game has $3$ players, player $2$ never has a winning strategy for any significantly large $n$. We also find that when the game has at least $4$ players, no player has a winning strategy for any significantly large $n$.

Submitted 20 October, 2020; v1 submitted 8 September, 2020; originally announced September 2020.

Comments: 11 pages, from Zeckendorf Polymath REU; new version addresses minor typos, table of contents removed, inclusion of MSC subject code

arXiv:2006.14563 [pdf, other]

doi 10.1109/ICPR48806.2021.9412749

Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection

Authors: Yongqiang Dou, Haocheng Yang, Maolin Yang, Yanyan Xu, Dengfeng Ke

Abstract: It becomes urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems due to the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that for anti-spoofing, it needs more attention for indistinguishable samples over easily-classified ones in the modeling process, to make correct discrimination a top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, to leverage a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself. Besides, in the experiments, we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods by comparison between our systems and top-performing ones. Systems trained with the balanced focal loss perform significantly better than conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF and an EER of 0.0124 and 0.55% respectively. Furthermore, we present and discuss the evaluation results on real replay data apart from the simulated ASVspoof2019 data, indicating that research for anti-spoofing still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M.

Submitted 17 January, 2023; v1 submitted 25 June, 2020; originally announced June 2020.

Comments: The 25th International Conference on Pattern Recognition (ICPR2020)

arXiv:2006.09045 [pdf]

Effect of Cold Sintering Process (CSP) on the Electro-Chemo-Mechanical Properties of Gd-doped Ceria (GDC)

Authors: Ahsanul Kabir, Daoyao Ke, Salvatore Grasso, Benoit Merle, Vincenzo Esposito

Abstract: In this report, the effect of the cold sintering process (CSP) on the electro-chemo-mechanical properties of 10 mol% Gd-doped ceria (GDC) is investigated. High purity nanoscale GDC powder is sintered via a cold sintering process (CSP) in pure water followed by post-annealing at 1000 °C. The resultant CSP ceramics exhibits high relative density (~92%) with an ultrafine grain size of ~100 nm. This sample illustrates comparable electrochemical properties at intermediate/high temperatures and electromechanical properties at room temperature to the sample prepared via conventional firing, i.e. sintering in the air at 1450 °C. Moreover, a large creep constant as well as a low elastic modulus and hardness are also observed in the CSP sample.

Submitted 16 June, 2020; originally announced June 2020.

arXiv:2005.10803 [pdf, other]

Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism

Authors: Wang Dai, Jinsong Zhang, Yingming Gao, Wei Wei, Dengfeng Ke, Binghuai Lin, Yanlu Xie

Abstract: Formant tracking is one of the most fundamental problems in speech processing. Traditionally, formants are estimated using signal processing methods. Recent studies showed that generic convolutional architectures can outperform recurrent networks on temporal tasks such as speech synthesis and machine translation. In this paper, we explored the use of Temporal Convolutional Network (TCN) for formant tracking. In addition to the conventional implementation, we modified the architecture from three aspects. First, we turned off the "causal" mode of dilated convolution, making the dilated convolution see the future speech frames. Second, each hidden layer reused the output information from all the previous layers through dense connection. Third, we also adopted a gating mechanism to alleviate the problem of gradient disappearance by selectively forgetting unimportant information. The model was validated on the open access formant database VTR. The experiment showed that our proposed model was easy to converge and achieved an overall mean absolute percent error (MAPE) of 8.2% on speech-labeled frames, compared to three competitive baselines of 9.4% (LSTM), 9.1% (Bi-LSTM) and 8.9% (TCN).

Submitted 8 August, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

Comments: Accepted by Interspeech 2020

arXiv:1904.08138 [pdf, other]

Complementary Fusion of Multi-Features and Multi-Modalities in Sentiment Analysis

Authors: Feiyang Chen, Ziqian Luo, Yanyan Xu, Dengfeng Ke

Abstract: Sentiment analysis, mostly based on text, has been rapidly developing in the last decade and has attracted widespread attention in both academia and industry. However, the information in the real world usually comes from multiple modalities, such as audio and text. Therefore, in this paper, based on audio and text, we consider the task of multimodal sentiment analysis and propose a novel fusion strategy including both multi-feature fusion and multi-modality fusion to improve the accuracy of audio-text sentiment analysis. We call it the DFF-ATMF (Deep Feature Fusion - Audio and Text Modality Fusion) model, which consists of two parallel branches, the audio modality based branch and the text modality based branch. Its core mechanisms are the fusion of multiple feature vectors and multiple modality attention. Experiments on the CMU-MOSI dataset and the recently released CMU-MOSEI dataset, both collected from YouTube for sentiment analysis, show the very competitive results of our DFF-ATMF model. Furthermore, by virtue of attention weight distribution heatmaps, we also demonstrate the deep features learned by using DFF-ATMF are complementary to each other and robust. Surprisingly, DFF-ATMF also achieves new state-of-the-art results on the IEMOCAP dataset, indicating that the proposed fusion strategy also has a good generalization ability for multimodal emotion recognition.

Submitted 11 December, 2019; v1 submitted 17 April, 2019; originally announced April 2019.

Comments: Accepted by AAAI2020 Workshop: AffCon2020

arXiv:1811.01244 [pdf, ps, other]

Regularity and stability analysis for a class of semilinear nonlocal differential equations in Hilbert spaces

Authors: Tran Dinh Ke, Nguyen Nhu Thang, Lam Tran Phuong Thuy

Abstract: We deal with a class of semilinear nonlocal differential equations in Hilbert spaces which is a general model for some anomalous diffusion equations. By using the theory of integral equations with completely positive kernel together with local estimates, some existence, regularity and stability results are established. An application to nonlocal partial differential equations is shown to demonstrate our abstract results.

Submitted 6 December, 2018; v1 submitted 3 November, 2018; originally announced November 2018.

arXiv:1805.01357 [pdf, ps, other]

Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training

Authors: Bin Liu, Shuai Nie, Yaping Zhang, Dengfeng Ke, Shan Liang, Wenju Liu

Abstract: In realistic environments, speech is usually interfered by various noise and reverberation, which dramatically degrades the performance of automatic speech recognition (ASR) systems. To alleviate this issue, the commonest way is to use a well-designed speech enhancement approach as the front-end of ASR. However, more complex pipelines, more computations and even higher hardware costs (microphone array) are additionally consumed for this kind of methods. In addition, speech enhancement would result in speech distortions and mismatches to training. In this paper, we propose an adversarial training method to directly boost noise robustness of acoustic model. Specifically, a jointly compositional scheme of generative adversarial net (GAN) and neural network-based acoustic model (AM) is used in the training phase. GAN is used to generate clean feature representations from noisy features by the guidance of a discriminator that tries to distinguish between the true clean signals and generated signals. The joint optimization of generator, discriminator and AM concentrates the strengths of both GAN and AM for speech recognition. Systematic experiments on CHiME-4 show that the proposed method significantly improves the noise robustness of AM and achieves the average relative error rate reduction of 23.38% and 11.54% on the development and test set, respectively.

Submitted 2 May, 2018; originally announced May 2018.

arXiv:1710.10403 [pdf, other]

doi 10.1007/s10489-018-1266-3

Trainable back-propagated functional transfer matrices

Authors: Cheng-Hao Cai, Yanyan Xu, Dengfeng Ke, Kaile Su, Jing Sun

Abstract: Connections between nodes of fully connected neural networks are usually represented by weight matrices. In this article, functional transfer matrices are introduced as alternatives to the weight matrices: Instead of using real weights, a functional transfer matrix uses real functions with trainable parameters to represent connections between nodes. Multiple functional transfer matrices are then stacked together with bias vectors and activations to form deep functional transfer neural networks. These neural networks can be trained within the framework of back-propagation, based on a revision of the delta rules and the error transmission rule for functional connections. In experiments, it is demonstrated that the revised rules can be used to train a range of functional connections: 20 different functions are applied to neural networks with up to 10 hidden layers, and most of them gain high test accuracies on the MNIST database. It is also demonstrated that a functional transfer matrix with a memory function can roughly memorise a non-cyclical sequence of 400 digits.

Submitted 28 October, 2017; originally announced October 2017.

Comments: 39 pages, 4 figures, submitted as a journal article

Journal ref: Appl. Intell. (2018)

arXiv:1708.05878 [pdf, ps, other]

Event-Radar: Real-time Local Event Detection System for Geo-Tagged Tweet Streams

Authors: Sibo Zhang, Yuan Cheng, Deyuan Ke

Abstract: The local event detection is to use posting messages with geotags on social networks to reveal the related ongoing events and their locations. Recent studies have demonstrated that the geo-tagged tweet stream serves as an unprecedentedly valuable source for local event detection. Nevertheless, how to effectively extract local events from large geo-tagged tweet streams in real time remains challenging. A robust and efficient cloud-based real-time local event detection software system would benefit various aspects in the real-life society, from shopping recommendation for customer service providers to disaster alarming for emergency departments. We use the preliminary research GeoBurst as a starting point, which proposed a novel method to detect local events. GeoBurst+ leverages a novel cross-modal authority measure to identify several pivots in the query window. Such pivots reveal different geo-topical activities and naturally attract related tweets to form candidate events. It further summarises the continuous stream and compares the candidates against the historical summaries to pinpoint truly interesting local events. We mainly implement a website demonstration system Event-Radar with an improved algorithm to show the real-time local events online for public interests. Better still, as the query window shifts, our method can update the event list with little time cost, thus achieving continuous monitoring of the stream.

Submitted 5 October, 2017; v1 submitted 19 August, 2017; originally announced August 2017.

Comments: 10 pages

arXiv:1706.09995 [pdf]

Stochastic Dynamic Optimal Power Flow in Distribution Network with Distributed Renewable Energy and Battery Energy Storage

Authors: Chenghui Tang, Jian Xu, Yuanzhang Sun, Siyang Liao, Deping Ke, Xiong Li

Abstract: The penetration of distributed renewable energy (DRE) greatly raises the risk of distribution network operation such as peak shaving and voltage stability. Battery energy storage (BES) has been widely accepted as the most potential application to cope with the challenge of high penetration of DRE. To cope with the uncertainties and variability of DRE, a stochastic day-ahead dynamic optimal power flow (DOPF) and its algorithm are proposed. The overall economy is achieved by fully considering the DRE, BES, electricity purchasing and active power losses. The rainflow algorithm-based cycle counting method of BES is incorporated in the DOPF model to capture the cell degradation, greatly extending the expected BES lifetime and achieving a better economy. DRE scenarios are generated to consider the uncertainties and correlations based on the Copula theory. To solve the DOPF model, we propose a Lagrange relaxation-based algorithm, which has a significantly reduced complexity with respect to the existing techniques. For this reason, the proposed algorithm enables much more scenarios incorporated in the DOPF model and better captures the DRE uncertainties and correlations. Finally, numerical studies for the day-ahead DOPF in the IEEE 123-node test feeder are presented to demonstrate the merits of the proposed method. Results show that the actual BES life expectancy of the proposed model has increased to 4.89 times compared with the traditional ones. The problems caused by DRE are greatly alleviated by fully capturing the uncertainties and correlations with the proposed method.

Submitted 29 June, 2017; originally announced June 2017.

arXiv:1704.07503 [pdf, other]

doi 10.1016/j.bica.2018.07.004

Learning of Human-like Algebraic Reasoning Using Deep Feedforward Neural Networks

Authors: Cheng-Hao Cai, Dengfeng Ke, Yanyan Xu, Kaile Su

Abstract: There is a wide gap between symbolic reasoning and deep learning. In this research, we explore the possibility of using deep learning to improve symbolic reasoning. Briefly, in a reasoning system, a deep feedforward neural network is used to guide rewriting processes after learning from algebraic reasoning examples produced by humans. To enable the neural network to recognise patterns of algebraic expressions with non-deterministic sizes, reduced partial trees are used to represent the expressions. Also, to represent both top-down and bottom-up information of the expressions, a centralisation technique is used to improve the reduced partial trees. Besides, symbolic association vectors and rule application records are used to improve the rewriting processes. Experimental results reveal that the algebraic reasoning examples can be accurately learnt only if the feedforward neural network has enough hidden layers. Also, the centralisation technique, the symbolic association vectors and the rule application records can reduce error rates of reasoning. In particular, the above approaches have led to 4.6% error rate of reasoning on a dataset of linear equations, differentials and integrals.

Submitted 24 April, 2017; originally announced April 2017.

Comments: 8 pages, 7 figures

ACM Class: I.2.0; I.2.3; I.2.4; I.2.6; I.2.8; I.5.0; I.5.1; I.5.2; I.5.4; F.4.1

arXiv:nlin/0603016 [pdf]

Lattice complexity and fine graining of symbolic sequence

Authors: Da-Guan Ke, Hong Zhang, Qin-Ye Tong

Abstract: A new complexity measure named as Lattice Complexity is presented for finite symbolic sequences. This measure is based on the symbolic dynamics of one-dimensional iterative maps and Lempel-Ziv Complexity. To make Lattice Complexity distinguishable from Lempel-Ziv Complexity, an approach called fine-graining process is also proposed. When the control parameter fine-graining order is small enough, the two measures are almost equal. While the order increases, the difference between the two measures becomes more and more significant. Applying Lattice Complexity to logistic map with a proper order, we find that the sequences that are regarded as complex are roughly at the edges of chaotic regions. Further derived properties of the two measures concerning the fine-graining process are also discussed.

Submitted 5 April, 2008; v1 submitted 9 March, 2006; originally announced March 2006.

Comments: 16 pages, 8 figures, a revised English version of an article published in Chinese

Journal ref: D. G. Ke, H. Zhang, Q. Y. Tong, Acta Physica Sinica 2005 54: 534

arXiv:nlin/0505052 [pdf]

Easily Adaptable Complexity Measure for Finite Time Series

Authors: Da-Guan Ke, Qin-Ye Tong

Abstract: We present a complexity measure for any finite time series. This measure has invariance under any monotonic transformation of the time series, has a degree of robustness against noise, and has the adaptability of satisfying almost all the widely accepted but conflicting criteria for complexity measurements. Surprisingly, the measure is developed from Kolmogorov complexity, which is traditionally believed to represent only randomness and to satisfy one criterion to the exclusion of the others. For familiar iterative systems, our treatment may imply a heuristic approach to transforming symbolic dynamics into permutation dynamics and vice versa.

Submitted 25 November, 2008; v1 submitted 23 May, 2005; originally announced May 2005.

Comments: 15 pages, 3 figures, 1 table; modifications make crucial points clearer and improve readability; completely rewritten

Journal ref: Phys. Rev. E 77, 066215 (2008)

Showing 1–27 of 27 results for author: Ke, D