-
Conformal Shield: A Novel Adversarial Attack Detection Framework for Automatic Modulation Classification
Authors:
Tailai Wen,
Da Ke,
Xiang Wang,
Zhitao Huang
Abstract:
Deep learning algorithms have become an essential component in the field of cognitive radio, especially playing a pivotal role in automatic modulation classification. However, deep learning also presents risks and vulnerabilities. Despite their outstanding classification performance, these algorithms exhibit fragility when confronted with meticulously crafted adversarial examples, posing potential risks to the reliability of modulation recognition results. Addressing this issue, this letter pioneers the development of an intelligent modulation classification framework based on conformal theory, named the Conformal Shield, aimed at detecting the presence of adversarial examples in unknown signals and assessing the reliability of recognition results. Utilizing conformal mapping from statistical learning theory, the framework introduces a custom-designed Inconsistency Soft-solution Set, enabling multiple validity assessments of the recognition outcomes. Experimental results demonstrate that the Conformal Shield maintains robust detection performance against a variety of typical adversarial sample attacks in the received signals under different perturbation-to-signal power ratio conditions.
Submitted 27 February, 2024;
originally announced February 2024.
-
Study on electromagnetically induced transparency effects in Dirac and VO$_2$ hybrid material structure
Authors:
Di Ke,
Xie Meng,
Xia Hua Rong,
Cheng An Yu,
Liu Yu,
Du Jia Jia
Abstract:
In this paper, we present a metamaterial structure of Dirac and vanadium dioxide and investigate its optical properties using the finite-difference time-domain (FDTD) technique. Using the phase transition feature of vanadium dioxide, the design can realize active tuning of the PIT effect at terahertz frequency, thereby converting from a single PIT to a double PIT. When VO$_2$ is in the insulating state, the structure is symmetric and yields a single-band PIT effect; when VO$_2$ is in the metallic state, the structure turns asymmetric and yields a dual-band PIT effect. This design provides a reference direction for the design of actively tunable metamaterials. Additionally, the resonant frequency of the transparent window and the Fermi level of the Dirac material in this structure are found to have an approximately linear relationship. The structure also achieves superior refractive index sensitivity in the terahertz band, surpassing 1 THz/RIU. Consequently, the concept exhibits encouraging potential for application in refractive index sensors and optical switches.
Submitted 18 December, 2023;
originally announced December 2023.
-
CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis
Authors:
Yayue Deng,
Jinlong Xue,
Yukang Jia,
Qifei Li,
Yichen Han,
Fengping Wang,
Yingming Gao,
Dengfeng Ke,
Ya Li
Abstract:
Conversational speech synthesis (CSS) incorporates historical dialogue as supplementary information with the aim of generating speech that has dialogue-appropriate prosody. While previous methods have already delved into enhancing context comprehension, context representation still lacks effective representation capabilities and context-sensitive discriminability. In this paper, we introduce a contrastive learning-based CSS framework, CONCSS. Within this framework, we define an innovative pretext task specific to CSS that enables the model to perform self-supervised learning on unlabeled conversational datasets to boost the model's context understanding. Additionally, we introduce a sampling strategy for negative sample augmentation to enhance context vectors' discriminability. This is the first attempt to integrate contrastive learning into CSS. We conduct ablation studies on different contrastive learning strategies and comprehensive experiments in comparison with prior CSS systems. Results demonstrate that the synthesized speech from our proposed method exhibits more contextually appropriate and sensitive prosody.
Submitted 16 December, 2023;
originally announced December 2023.
-
Challenges for density functional theory in simulating metal-metal singlet bonding: a case study of dimerized VO2
Authors:
Yubo Zhang,
Da Ke,
Junxiong Wu,
Chutong Zhang,
Baichen Lin,
Zuhuang Chen,
John P. Perdew,
Jianwei Sun
Abstract:
VO2 is renowned for its electric transition from an insulating monoclinic (M1) phase characterized by V-V dimerized structures, to a metallic rutile (R) phase above 340 Kelvin. This transition is accompanied by a magnetic change: the M1 phase exhibits a non-magnetic spin-singlet state, while the R phase exhibits a state with local magnetic moments. Simultaneous simulation of the structural, electric, and magnetic properties of this compound is of fundamental importance, but the M1 phase alone has posed a significant challenge to density functional theory (DFT). In this study, we show none of the commonly used DFT functionals, including those combined with on-site Hubbard U to better treat 3d electrons, can accurately predict the V-V dimer length. The spin-restricted method tends to overestimate the strength of the V-V bonds, resulting in a small V-V bond length. Conversely, the spin-symmetry-breaking method exhibits the opposite trends. Each bond-calculation method underscores one of the two contentious mechanisms, i.e., Peierls or Mott, involved in the metal-insulator transition in VO2. To elucidate the challenges encountered in DFT, we also employ an effective Hamiltonian that integrates one-dimensional magnetic sites, thereby revealing the inherent difficulties linked with the DFT computations.
Submitted 9 October, 2023;
originally announced October 2023.
-
Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis
Authors:
Dengfeng Ke,
Yayue Deng,
Yukang Jia,
Jinlong Xue,
Qi Luo,
Ya Li,
Jianqing Sun,
Jiaen Liang,
Binghuai Lin
Abstract:
Regressive text-to-speech (TTS) systems utilize an attention mechanism to generate the alignment between text and the acoustic feature sequence. Alignment determines synthesis robustness (e.g., the occurrence of skipping, repeating, and collapse) and rhythm via duration control. However, current attention algorithms used in speech synthesis cannot control rhythm using external duration information to generate natural speech while ensuring robustness. In this study, we propose Rhythm-controllable Attention (RC-Attention) based on Tacotron2, which improves robustness and naturalness simultaneously. The proposed attention adopts a trainable scalar learned from four kinds of information to achieve rhythm control, which makes rhythm control more robust and natural, even when synthesized sentences are much longer than those in the training corpus. We use word error counting and an AB preference test to measure the robustness of the proposed method and the naturalness of the synthesized speech, respectively. Results show that RC-Attention has the lowest word error rate of nearly 0.6%, compared with 11.8% for the baseline system. Moreover, nearly 60% of subjects prefer the speech synthesized with RC-Attention to that with Forward Attention, because the former has a more natural rhythm.
Submitted 5 June, 2023;
originally announced June 2023.
-
Chloride Ion Erosion of Pre-Stressed Concrete Bridges in Cold Regions
Authors:
Hongtao Cui,
Yi Zhuo,
Dongyuan Ke,
Zhonglong Li,
Shunlong Li
Abstract:
Chloride ion erosion in concrete bridges accelerates the corrosion of reinforcement, which is an important cause of declining bridge durability. The erosion process of chloride ions, especially from deicing salt solution in cold regions, is complex and has many influencing factors. It is therefore important to use accurate and effective methods to analyze the chloride ion erosion process in concrete. In this study, a retired pre-stressed concrete bridge in a cold region was taken as the research object, and specimens from the whole bridge were obtained by core drilling. The concentration of chloride ions was measured at different depths of the specimens. The process of chloride ion erosion was simulated in two-dimensional space through COMSOL multi-physical field simulation and compared with the measured results. The simulation method proposed in this paper has good reliability and accuracy.
Submitted 28 March, 2023;
originally announced March 2023.
-
Current and perspective sensing methods for monkeypox virus: a reemerging zoonosis in its infancy
Authors:
Ijaz Gul,
Changyue Liu,
Yuan Xi,
Zhicheng Du,
Shiyao Zhai,
Zhengyang Lei,
Chen Qun,
Muhammad Akmal Raheem,
Qian He,
Zhang Haihui,
Canyang Zhang,
Runming Wang,
Sanyang Han,
Du Ke,
Peiwu Qin
Abstract:
Objectives: The review is dedicated to evaluating the current monkeypox virus (MPXV) detection methods, discussing their pros and cons, and providing recommended solutions to the problems.
Methods: The literature for this review was identified through searches in PubMed, Web of Science, Google Scholar, ResearchGate, and Science Direct advanced search for articles published in English without any start date until June 2022, using the terms "monkeypox virus" or "poxvirus" along with "diagnosis"; "PCR"; "real-time PCR"; "LAMP"; "RPA"; "immunoassay"; "reemergence"; "biothreat"; "endemic"; and "multi-country outbreak", and also by tracking citations of the relevant papers. The most relevant articles are included in the review.
Results: Our literature review shows that PCR is the gold standard method for MPXV detection. In addition, loop-mediated isothermal amplification (LAMP) and recombinase polymerase amplification (RPA) have been reported as alternatives to PCR. Immunodiagnostics, whole particle detection, and image-based detection are the non-nucleic acid-based MPXV detection modalities.
Conclusions: PCR is easy to leverage and adapt for a quick response to an outbreak, but the PCR-based MPXV detection approaches may not be suitable for marginalized settings. Limited progress has been made towards innovations in MPXV diagnostics, providing room for the development of novel detection techniques for this virus.
Submitted 10 August, 2022;
originally announced August 2022.
-
Text-Aware End-to-end Mispronunciation Detection and Diagnosis
Authors:
Linkai Peng,
Yingming Gao,
Binghuai Lin,
Dengfeng Ke,
Yanlu Xie,
Jinsong Zhang
Abstract:
Mispronunciation detection and diagnosis (MDD) technology is a key component of computer-assisted pronunciation training (CAPT) systems. In the field of assessing the pronunciation quality of constrained speech, the given transcriptions can play the role of a teacher. Conventional methods have fully utilized the prior texts for model construction or for improving system performance, e.g. forced-alignment and extended recognition networks. Recently, some end-to-end methods have attempted to incorporate the prior texts into model training and have preliminarily shown their effectiveness. However, previous studies mostly apply a raw attention mechanism to fuse audio representations with text representations, without taking possible text-pronunciation mismatch into account. In this paper, we present a gating strategy that assigns more importance to the relevant audio features while suppressing irrelevant text information. Moreover, given the transcriptions, we design an extra contrastive loss to reduce the gap between the learning objective of phoneme recognition and MDD. We conducted experiments using two publicly available datasets (TIMIT and L2-Arctic) and our best model improved the F1 score from $57.51\%$ to $61.75\%$ compared to the baselines. Besides, we provide a detailed analysis to shed light on the effectiveness of the gating mechanism and contrastive learning on MDD.
Submitted 15 June, 2022;
originally announced June 2022.
-
Backbone and shortest-path exponents of the two-dimensional $Q$-state Potts model
Authors:
Sheng Fang,
Da Ke,
Wei Zhong,
Youjin Deng
Abstract:
We present a Monte Carlo study of the backbone and the shortest-path exponents of the two-dimensional $Q$-state Potts model in the Fortuin-Kasteleyn bond representation. We first use cluster algorithms to simulate the critical Potts model on the square lattice and obtain the backbone exponents $d_{\rm B} = 1.732 \, 0(3)$ and $1.794(2)$ for $Q=2,3$ respectively. However, for large $Q$, the study suffers from serious critical slowing down and slowly converging finite-size corrections. To overcome these difficulties, we consider the O$(n)$ loop model on the honeycomb lattice in the densely packed phase, which is regarded as corresponding to the critical Potts model with $Q=n^2$. With a highly efficient cluster algorithm, we determine from domains enclosed by the loops $d_{\rm B} =1.643\,39(5), 1.732\,27(8), 1.793\,8(3), 1.838\,4(5), 1.875\,3(6)$ for $Q = 1, 2, 3, 2 \! + \! \sqrt{3}, 4$, respectively, and $d_{\rm min} = 1.094\,5(2), 1.067\,5(3), 1.047\,5(3), 1.032\,2(4)$ for $Q=2,3, 2+\sqrt{3}, 4$ respectively. Our estimates significantly improve over the existing results for both $d_{\rm B}$ and $d_{\rm min}$. Finally, by studying finite-size corrections in backbone-related quantities, we conjecture an exact formula as a function of $n$ for the leading correction exponent.
Submitted 19 April, 2022; v1 submitted 19 December, 2021;
originally announced December 2021.
-
An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures
Authors:
Dengfeng Ke,
Yuxing Lu,
Xudong Liu,
Yanyan Xu,
Jing Sun,
Cheng-Hao Cai
Abstract:
With the rapid development of neural network architectures and speech processing models, singing voice synthesis with neural networks is becoming the cutting-edge technique of digital music production. In this work, in order to explore how to improve the quality and efficiency of singing voice synthesis, we use encoder-decoder neural models and a number of vocoders to achieve singing voice synthesis. We conduct experiments to demonstrate that the models can be trained using voice data with pitch information, lyrics and beat information, and that the trained models can produce smooth, clear and natural singing voices that are close to real human voices. As the models work in an end-to-end manner, they allow users who are not domain experts to directly produce singing voices by arranging pitches, lyrics and beats.
Submitted 6 August, 2021;
originally announced August 2021.
-
Speech Enhancement using Separable Polling Attention and Global Layer Normalization followed with PReLU
Authors:
Dengfeng Ke,
Jinsong Zhang,
Yanlu Xie,
Yanyan Xu,
Binghuai Lin
Abstract:
Single-channel speech enhancement is a challenging task in the speech community. Recently, various neural-network-based methods have been applied to speech enhancement. Among these models, PHASEN and T-GSA achieve state-of-the-art performance on the publicly available VoiceBank+DEMAND corpus. Both models reach a COVL score of 3.62. PHASEN achieves the highest CSIG score of 4.21 while T-GSA gets the highest PESQ score of 3.06. However, both models are very large, and the trade-off between model performance and model size is hard to reconcile. In this paper, we introduce three techniques to shrink the PHASEN model and improve its performance. Firstly, separable polling attention is proposed to replace the frequency transformation blocks in PHASEN. Secondly, global layer normalization followed by PReLU is used to replace batch normalization followed by ReLU. Finally, the BLSTM in PHASEN is replaced with a Conv2d operation and the phase stream is simplified. With all these modifications, the size of the PHASEN model is shrunk from 33M parameters to 5M parameters, while the performance on VoiceBank+DEMAND is improved to a CSIG score of 4.30, a PESQ score of 3.07 and a COVL score of 3.73.
Submitted 6 May, 2021;
originally announced May 2021.
-
A Full Text-Dependent End to End Mispronunciation Detection and Diagnosis with Easy Data Augmentation Techniques
Authors:
Kaiqi Fu,
Jones Lin,
Dengfeng Ke,
Yanlu Xie,
Jinsong Zhang,
Binghuai Lin
Abstract:
Recently, end-to-end mispronunciation detection and diagnosis (MD&D) systems have become a popular alternative that greatly simplifies the model-building process of conventional hybrid DNN-HMM systems by representing complicated modules with a single deep network architecture. In this paper, in order to utilize the prior text in the end-to-end structure, we present a novel text-dependent model that differs from sed-mdd: the model achieves a fully end-to-end system by aligning the audio with the phoneme sequences of the prior text inside the model through the attention mechanism. Moreover, using the prior text as input introduces an imbalance between positive and negative samples in the phoneme sequence. To alleviate this problem, we propose three simple data augmentation methods, which effectively improve the ability of the model to capture mispronounced phonemes. We conduct experiments on L2-ARCTIC, and our best performance improved from 49.29% to 56.08% in the F-measure metric compared to the CNN-RNN-CTC model.
Submitted 16 April, 2021;
originally announced April 2021.
-
Extending Zeckendorf's Theorem to a Non-constant Recurrence and the Zeckendorf Game on this Non-constant Recurrence Relation
Authors:
Elżbieta Bołdyriew,
Anna Cusenza,
Linglong Dai,
Pei Ding,
Aidan Dunkelberg,
John Haviland,
Kate Huffman,
Dianhui Ke,
Daniel Kleber,
Jason Kuretski,
John Lentfer,
Tianhao Luo,
Steven J. Miller,
Clayton Mizgerd,
Vashisth Tiwari,
Jingkai Ye,
Yunhao Zhang,
Xiaoyan Zheng,
Weiduo Zhu
Abstract:
Zeckendorf's Theorem states that every positive integer can be uniquely represented as a sum of non-adjacent Fibonacci numbers, indexed from $1, 2, 3, 5,\ldots$. This has been generalized by many authors, in particular to constant coefficient fixed depth linear recurrences with positive (or in some cases non-negative) coefficients. In this work we extend this result to a recurrence with non-constant coefficients, $a_{n+1} = n a_{n} + a_{n-1}$. The decomposition law becomes: every $m$ has a unique decomposition as $\sum s_i a_i$ with $s_i \le i$, where if $s_i = i$ then $s_{i-1} = 0$. Similar to Zeckendorf's original proof, we use the greedy algorithm. We show that almost all the gaps between summands, as $n$ approaches infinity, are of length zero, and give a heuristic that the distribution of the number of summands tends to a Gaussian. Furthermore, we build a game based upon this recurrence relation, generalizing a game on the Fibonacci numbers. Given a fixed integer $n$ and an initial decomposition of $n= na_1$, the players alternate by using moves related to the recurrence relation, and whoever moves last wins. We show that the game is finite and ends at the unique decomposition of $n$, and that either player can win in a two-player game. We find the strategy to attain the shortest game possible, and the length of this shortest game. Then we show that in this generalized game when there are more than three players, no player has a winning strategy. Lastly, we demonstrate how one player in the two-player game can force the game to progress to their advantage.
Submitted 25 September, 2020;
originally announced September 2020.
-
Bounds on Zeckendorf Games
Authors:
Anna Cusenza,
Aiden Dunkelberg,
Kate Huffman,
Dianhui Ke,
Micah McClatchey,
Steven J. Miller,
Clayton Mizgerd,
Vashisth Tiwari,
Jingkai Ye,
Xiaoyan Zheng
Abstract:
Zeckendorf proved that every positive integer $n$ can be written uniquely as the sum of non-adjacent Fibonacci numbers. We use this decomposition to construct a two-player game. Given a fixed integer $n$ and an initial decomposition of $n=n F_1$, the two players alternate by using moves related to the recurrence relation $F_{n+1}=F_n+F_{n-1}$, and whoever moves last wins. The game always terminates in the Zeckendorf decomposition; depending on the choice of moves the length of the game and the winner can vary, though for $n\ge 2$ there is a non-constructive proof that Player 2 has a winning strategy.
Initially the lower bound of the length of a game was order $n$ (and known to be sharp) while the upper bound was of size $n \log n$. Recent work decreased the upper bound to size $n$, but with a larger constant than was conjectured. We improve the upper bound and obtain the sharp bound of $\frac{\sqrt{5}+3}{2}\ n - IZ(n) - \frac{1+\sqrt{5}}{2}Z(n)$, which is of order $n$ as $Z(n)$ is the number of terms in the Zeckendorf decomposition of $n$ and $IZ(n)$ is the sum of indices in the Zeckendorf decomposition of $n$ (which are at most of sizes $\log n$ and $\log^2 n$ respectively). We also introduce a greedy algorithm that realizes the upper bound, and show that the longest game on any $n$ is achieved by applying splitting moves whenever possible.
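As an illustration only, the quantities in the bound above can be computed with a short Python sketch. It assumes the indexing $F_1 = 1$, $F_2 = 2$, $F_3 = 3$, $F_4 = 5$ (the abstract does not state the convention for $IZ(n)$), uses the classical greedy decomposition, and simply evaluates the stated expression; the function names are hypothetical and are not taken from the paper.

```python
import math


def fibs_up_to(n):
    """Fibonacci numbers 1, 2, 3, 5, ... that do not exceed n (assumed F_1 = 1, F_2 = 2)."""
    seq = [1, 2]
    while seq[-1] + seq[-2] <= n:
        seq.append(seq[-1] + seq[-2])
    return [f for f in seq if f <= n]


def zeckendorf(n):
    """Greedy Zeckendorf decomposition of n as (index, value) pairs, largest term first."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    fibs = fibs_up_to(n)
    parts = []
    remaining = n
    # Walk from the largest Fibonacci number down, taking each one that still fits.
    for i in range(len(fibs), 0, -1):
        f = fibs[i - 1]
        if f <= remaining:
            parts.append((i, f))
            remaining -= f
    return parts


def upper_bound_expression(n):
    """Evaluate (sqrt(5)+3)/2 * n - IZ(n) - (1+sqrt(5))/2 * Z(n) under the assumed indexing."""
    parts = zeckendorf(n)
    z = len(parts)                   # Z(n): number of terms in the decomposition
    iz = sum(i for i, _ in parts)    # IZ(n): sum of the indices of those terms
    return (math.sqrt(5) + 3) / 2 * n - iz - (1 + math.sqrt(5)) / 2 * z


if __name__ == "__main__":
    print(zeckendorf(10))
    print(upper_bound_expression(10))
```

A different indexing convention for the Fibonacci numbers would shift $IZ(n)$ and therefore the value of the expression, so the numbers produced by this sketch should not be read as the paper's results.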
Submitted 20 September, 2020;
originally announced September 2020.
-
Winning Strategy for the Multiplayer and Multialliance Zeckendorf Games
Authors:
Anna Cusenza,
Aidan Dunkelberg,
Kate Huffman,
Dianhui Ke,
Daniel Kleber,
Steven J. Miller,
Clayton Mizgerd,
Vashisth Tiwari,
Jingkai Ye,
Xiaoyan Zheng
Abstract:
Edouard Zeckendorf proved that every positive integer $n$ can be uniquely written as the sum of non-adjacent Fibonacci numbers, known as the Zeckendorf decomposition. Based on Zeckendorf's decomposition, we have the Zeckendorf game for multiple players. We show that when the Zeckendorf game has at least $3$ players, none of the players have a winning strategy for $n\geq 5$. Then we extend the multi-player game to the multi-alliance game, finding some interesting situations in which no alliance has a winning strategy. This includes the two-alliance game, and some cases in which one alliance always has a winning strategy.
Submitted 20 October, 2020; v1 submitted 8 September, 2020;
originally announced September 2020.
-
Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection
Authors:
Yongqiang Dou,
Haocheng Yang,
Maolin Yang,
Yanyan Xu,
Dengfeng Ke
Abstract:
Designing effective anti-spoofing algorithms for vulnerable automatic speaker verification systems has become urgent due to the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while the lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that anti-spoofing requires paying more attention to indistinguishable samples than to easily classified ones in the modeling process, to make correct discrimination a top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, which leverages a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself. Besides, in the experiments, we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods by comparison between our systems and top-performing ones. Systems trained with the balanced focal loss perform significantly better than those trained with the conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF and an EER of 0.0124 and 0.55% respectively. Furthermore, we present and discuss the evaluation results on real replay data apart from the simulated ASVspoof2019 data, indicating that research for anti-spoofing still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M.
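For illustration only, the idea of scaling the loss by sample difficulty can be sketched with the widely used class-balanced focal loss, $-\alpha_t (1 - p_t)^{\gamma} \log p_t$. This general form is an assumption made here; the abstract does not give the exact loss used in D3M, and the function name and default values of `alpha` and `gamma` below are hypothetical choices, not values from the paper or its repository.

```python
import math


def balanced_focal_loss(p, y, alpha=0.5, gamma=2.0, eps=1e-12):
    """Binary class-balanced focal loss for one sample (generic form, assumed).

    p: predicted probability of the positive class, in [0, 1]
    y: ground-truth label, 1 (positive) or 0 (negative)
    alpha: weight of the positive class; the negative class gets 1 - alpha
    gamma: focusing parameter; gamma = 0 reduces to weighted cross-entropy
    """
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)           # clamp to avoid log(0)
    # (1 - p_t)^gamma shrinks the loss of confidently correct samples,
    # so hard-to-distinguish samples dominate the training signal.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = balanced_focal_loss(0.95, 1)   # confidently correct sample
    hard = balanced_focal_loss(0.55, 1)   # barely correct sample
    print(easy, hard)
```

With `gamma` greater than zero, a confidently classified sample contributes far less loss than a borderline one, which is the behaviour the abstract describes as dynamically scaling the loss by the traits of the sample.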
Submitted 17 January, 2023; v1 submitted 25 June, 2020;
originally announced June 2020.
-
Effect of Cold Sintering Process (CSP) on the Electro-Chemo-Mechanical Properties of Gd-doped Ceria (GDC)
Authors:
Ahsanul Kabir,
Daoyao Ke,
Salvatore Grasso,
Benoit Merle,
Vincenzo Esposito
Abstract:
In this report, the effect of the cold sintering process (CSP) on the electro-chemo-mechanical properties of 10 mol% Gd-doped ceria (GDC) is investigated. High purity nanoscale GDC powder is sintered via CSP in pure water followed by post-annealing at 1000 °C. The resultant CSP ceramic exhibits high relative density (~92%) with an ultrafine grain size of ~100 nm. This sample shows electrochemical properties at intermediate/high temperatures and electromechanical properties at room temperature comparable to those of the sample prepared via conventional firing, i.e. sintering in air at 1450 °C. Moreover, a large creep constant as well as a low elastic modulus and hardness are also observed in the CSP sample.
Submitted 16 June, 2020;
originally announced June 2020.
-
Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism
Authors:
Wang Dai,
Jinsong Zhang,
Yingming Gao,
Wei Wei,
Dengfeng Ke,
Binghuai Lin,
Yanlu Xie
Abstract:
Formant tracking is one of the most fundamental problems in speech processing. Traditionally, formants are estimated using signal processing methods. Recent studies showed that generic convolutional architectures can outperform recurrent networks on temporal tasks such as speech synthesis and machine translation. In this paper, we explored the use of Temporal Convolutional Network (TCN) for formant tracking. In addition to the conventional implementation, we modified the architecture from three aspects. First, we turned off the "causal" mode of dilated convolution, making the dilated convolution see the future speech frames. Second, each hidden layer reused the output information from all the previous layers through dense connection. Third, we also adopted a gating mechanism to alleviate the problem of gradient disappearance by selectively forgetting unimportant information. The model was validated on the open access formant database VTR. The experiment showed that our proposed model was easy to converge and achieved an overall mean absolute percent error (MAPE) of 8.2% on speech-labeled frames, compared to three competitive baselines of 9.4% (LSTM), 9.1% (Bi-LSTM) and 8.9% (TCN).
Submitted 8 August, 2020; v1 submitted 21 May, 2020;
originally announced May 2020.
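The formant-tracking abstract above reports results as mean absolute percent error (MAPE). A minimal sketch of that metric follows, assuming the standard definition (mean of |predicted - reference| / |reference|, in percent); the function name and the sample frequencies are hypothetical and are not taken from the VTR database.

```python
def mape(reference, predicted):
    """Mean absolute percent error, in percent (standard definition)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the reference value.
    total = sum(abs(p - r) / abs(r) for r, p in zip(reference, predicted))
    return 100.0 * total / len(reference)


# Hypothetical formant frequencies in Hz for two frames.
ref_hz = [500.0, 1500.0]
est_hz = [550.0, 1350.0]
print(mape(ref_hz, est_hz))
```

Because the error is divided by the reference value, a 50 Hz miss on a low formant counts as much as a 150 Hz miss on a formant three times higher.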
-
Complementary Fusion of Multi-Features and Multi-Modalities in Sentiment Analysis
Authors:
Feiyang Chen,
Ziqian Luo,
Yanyan Xu,
Dengfeng Ke
Abstract:
Sentiment analysis, mostly based on text, has been rapidly developing in the last decade and has attracted widespread attention in both academia and industry. However, the information in the real world usually comes from multiple modalities, such as audio and text. Therefore, in this paper, based on audio and text, we consider the task of multimodal sentiment analysis and propose a novel fusion strategy including both multi-feature fusion and multi-modality fusion to improve the accuracy of audio-text sentiment analysis. We call it the DFF-ATMF (Deep Feature Fusion - Audio and Text Modality Fusion) model, which consists of two parallel branches, the audio modality based branch and the text modality based branch. Its core mechanisms are the fusion of multiple feature vectors and multiple modality attention. Experiments on the CMU-MOSI dataset and the recently released CMU-MOSEI dataset, both collected from YouTube for sentiment analysis, show the very competitive results of our DFF-ATMF model. Furthermore, by virtue of attention weight distribution heatmaps, we also demonstrate the deep features learned by using DFF-ATMF are complementary to each other and robust. Surprisingly, DFF-ATMF also achieves new state-of-the-art results on the IEMOCAP dataset, indicating that the proposed fusion strategy also has a good generalization ability for multimodal emotion recognition.
Submitted 11 December, 2019; v1 submitted 17 April, 2019;
originally announced April 2019.
-
Regularity and stability analysis for a class of semilinear nonlocal differential equations in Hilbert spaces
Authors:
Tran Dinh Ke,
Nguyen Nhu Thang,
Lam Tran Phuong Thuy
Abstract:
We deal with a class of semilinear nonlocal differential equations in Hilbert spaces which is a general model for some anomalous diffusion equations. By using the theory of integral equations with completely positive kernel together with local estimates, some existence, regularity and stability results are established. An application to nonlocal partial differential equations is shown to demonstrate our abstract results.
Submitted 6 December, 2018; v1 submitted 3 November, 2018;
originally announced November 2018.
-
Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training
Authors:
Bin Liu,
Shuai Nie,
Yaping Zhang,
Dengfeng Ke,
Shan Liang,
Wenju Liu
Abstract:
In realistic environments, speech is usually interfered by various noise and reverberation, which dramatically degrades the performance of automatic speech recognition (ASR) systems. To alleviate this issue, the commonest way is to use a well-designed speech enhancement approach as the front-end of ASR. However, more complex pipelines, more computations and even higher hardware costs (microphone array) are additionally consumed for this kind of methods. In addition, speech enhancement would result in speech distortions and mismatches to training. In this paper, we propose an adversarial training method to directly boost noise robustness of acoustic model. Specifically, a jointly compositional scheme of generative adversarial net (GAN) and neural network-based acoustic model (AM) is used in the training phase. GAN is used to generate clean feature representations from noisy features by the guidance of a discriminator that tries to distinguish between the true clean signals and generated signals. The joint optimization of generator, discriminator and AM concentrates the strengths of both GAN and AM for speech recognition. Systematic experiments on CHiME-4 show that the proposed method significantly improves the noise robustness of AM and achieves the average relative error rate reduction of 23.38% and 11.54% on the development and test set, respectively.
Submitted 2 May, 2018;
originally announced May 2018.
-
Trainable back-propagated functional transfer matrices
Authors:
Cheng-Hao Cai,
Yanyan Xu,
Dengfeng Ke,
Kaile Su,
Jing Sun
Abstract:
Connections between nodes of fully connected neural networks are usually represented by weight matrices. In this article, functional transfer matrices are introduced as alternatives to the weight matrices: Instead of using real weights, a functional transfer matrix uses real functions with trainable parameters to represent connections between nodes. Multiple functional transfer matrices are then stacked together with bias vectors and activations to form deep functional transfer neural networks. These neural networks can be trained within the framework of back-propagation, based on a revision of the delta rules and the error transmission rule for functional connections. In experiments, it is demonstrated that the revised rules can be used to train a range of functional connections: 20 different functions are applied to neural networks with up to 10 hidden layers, and most of them gain high test accuracies on the MNIST database. It is also demonstrated that a functional transfer matrix with a memory function can roughly memorise a non-cyclical sequence of 400 digits.
Submitted 28 October, 2017;
originally announced October 2017.
-
Event-Radar: Real-time Local Event Detection System for Geo-Tagged Tweet Streams
Authors:
Sibo Zhang,
Yuan Cheng,
Deyuan Ke
Abstract:
The local event detection is to use posting messages with geotags on social networks to reveal the related ongoing events and their locations. Recent studies have demonstrated that the geo-tagged tweet stream serves as an unprecedentedly valuable source for local event detection. Nevertheless, how to effectively extract local events from large geo-tagged tweet streams in real time remains challenging. A robust and efficient cloud-based real-time local event detection software system would benefit various aspects in the real-life society, from shopping recommendation for customer service providers to disaster alarming for emergency departments. We use the preliminary research GeoBurst as a starting point, which proposed a novel method to detect local events. GeoBurst+ leverages a novel cross-modal authority measure to identify several pivots in the query window. Such pivots reveal different geo-topical activities and naturally attract related tweets to form candidate events. It further summarises the continuous stream and compares the candidates against the historical summaries to pinpoint truly interesting local events. We mainly implement a website demonstration system Event-Radar with an improved algorithm to show the real-time local events online for public interests. Better still, as the query window shifts, our method can update the event list with little time cost, thus achieving continuous monitoring of the stream.
Submitted 5 October, 2017; v1 submitted 19 August, 2017;
originally announced August 2017.
-
Stochastic Dynamic Optimal Power Flow in Distribution Network with Distributed Renewable Energy and Battery Energy Storage
Authors:
Chenghui Tang,
Jian Xu,
Yuanzhang Sun,
Siyang Liao,
Deping Ke,
Xiong Li
Abstract:
The penetration of distributed renewable energy (DRE) greatly raises the risk of distribution network operation such as peak shaving and voltage stability. Battery energy storage (BES) has been widely accepted as the most potential application to cope with the challenge of high penetration of DRE. To cope with the uncertainties and variability of DRE, a stochastic day-ahead dynamic optimal power flow (DOPF) and its algorithm are proposed. The overall economy is achieved by fully considering the DRE, BES, electricity purchasing and active power losses. The rainflow algorithm-based cycle counting method of BES is incorporated in the DOPF model to capture the cell degradation, greatly extending the expected BES lifetime and achieving a better economy. DRE scenarios are generated to consider the uncertainties and correlations based on the Copula theory. To solve the DOPF model, we propose a Lagrange relaxation-based algorithm, which has a significantly reduced complexity with respect to the existing techniques. For this reason, the proposed algorithm enables much more scenarios incorporated in the DOPF model and better captures the DRE uncertainties and correlations. Finally, numerical studies for the day-ahead DOPF in the IEEE 123-node test feeder are presented to demonstrate the merits of the proposed method. Results show that the actual BES life expectancy of the proposed model has increased to 4.89 times compared with the traditional ones. The problems caused by DRE are greatly alleviated by fully capturing the uncertainties and correlations with the proposed method.
Submitted 29 June, 2017;
originally announced June 2017.
-
Learning of Human-like Algebraic Reasoning Using Deep Feedforward Neural Networks
Authors:
Cheng-Hao Cai,
Dengfeng Ke,
Yanyan Xu,
Kaile Su
Abstract:
There is a wide gap between symbolic reasoning and deep learning. In this research, we explore the possibility of using deep learning to improve symbolic reasoning. Briefly, in a reasoning system, a deep feedforward neural network is used to guide rewriting processes after learning from algebraic reasoning examples produced by humans. To enable the neural network to recognise patterns of algebraic expressions with non-deterministic sizes, reduced partial trees are used to represent the expressions. Also, to represent both top-down and bottom-up information of the expressions, a centralisation technique is used to improve the reduced partial trees. Besides, symbolic association vectors and rule application records are used to improve the rewriting processes. Experimental results reveal that the algebraic reasoning examples can be accurately learnt only if the feedforward neural network has enough hidden layers. Also, the centralisation technique, the symbolic association vectors and the rule application records can reduce error rates of reasoning. In particular, the above approaches have led to 4.6% error rate of reasoning on a dataset of linear equations, differentials and integrals.
Submitted 24 April, 2017;
originally announced April 2017.
-
Lattice complexity and fine graining of symbolic sequence
Authors:
Da-Guan Ke,
Hong Zhang,
Qin-Ye Tong
Abstract:
A new complexity measure named as Lattice Complexity is presented for finite symbolic sequences. This measure is based on the symbolic dynamics of one-dimensional iterative maps and Lempel-Ziv Complexity. To make Lattice Complexity distinguishable from Lempel-Ziv Complexity, an approach called fine-graining process is also proposed. When the control parameter fine-graining order is small enough, the two measures are almost equal. While the order increases, the difference between the two measures becomes more and more significant. Applying Lattice Complexity to logistic map with a proper order, we find that the sequences that are regarded as complex are roughly at the edges of chaotic regions. Further derived properties of the two measures concerning the fine-graining process are also discussed.
Submitted 5 April, 2008; v1 submitted 9 March, 2006;
originally announced March 2006.
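The Lattice Complexity abstract above builds on Lempel-Ziv Complexity. As background only, here is a minimal sketch of the classic Lempel-Ziv (1976) complexity of a finite symbolic sequence, computed by exhaustive parsing. It is not an implementation of Lattice Complexity or of the fine-graining process, and the function name is an assumption.

```python
def lz76_complexity(s):
    """Number of components in the Lempel-Ziv (1976) exhaustive parsing of s."""
    n = len(s)
    i = 0      # start index of the current component
    count = 0  # number of components found so far
    while i < n:
        length = 1
        # Grow the component while it can still be copied from the text
        # seen so far (the search window excludes the candidate's last symbol).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


# Textbook example: parsed as 0 | 001 | 10 | 100 | 1000 | 101
print(lz76_complexity("0001101001000101"))  # -> 6
```

A constant sequence such as "0000" parses into only two components, while irregular sequences need many, which is the sense in which the count serves as a complexity measure.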
-
Easily Adaptable Complexity Measure for Finite Time Series
Authors:
Da-Guan Ke,
Qin-Ye Tong
Abstract:
We present a complexity measure for any finite time series. This measure has invariance under any monotonic transformation of the time series, has a degree of robustness against noise, and has the adaptability of satisfying almost all the widely accepted but conflicting criteria for complexity measurements. Surprisingly, the measure is developed from Kolmogorov complexity, which is traditionally believed to represent only randomness and to satisfy one criterion to the exclusion of the others. For familiar iterative systems, our treatment may imply a heuristic approach to transforming symbolic dynamics into permutation dynamics and vice versa.
Submitted 25 November, 2008; v1 submitted 23 May, 2005;
originally announced May 2005.