Search | arXiv e-print repository

An Attribute Interpolation Method in Speech Synthesis by Model Merging

Authors: Masato Murata, Koichi Miyazaki, Tomoki Koriyama

Abstract: With the development of speech synthesis, recent research has focused on challenging tasks, such as speaker generation and emotion intensity control. Attribute interpolation is a common approach to these tasks. However, most previous methods for attribute interpolation require specific modules or training methods. We propose an attribute interpolation method in speech synthesis by model merging. Model merging is a method that creates new parameters by only averaging the parameters of base models. The merged model can generate an output with an intermediate feature of the base models. This method is easily applicable without specific modules or training methods, as it uses only existing trained base models. We merged two text-to-speech models to achieve attribute interpolation and evaluated its performance on speaker generation and emotion intensity control tasks. As a result, our proposed method achieved smooth attribute interpolation while keeping the linguistic content in both tasks.

Submitted 30 June, 2024; originally announced July 2024.

Comments: Accepted by INTERSPEECH 2024
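
The abstract above describes model merging as creating new parameters by averaging the parameters of the base models. A minimal sketch of that idea follows, not the authors' implementation. It assumes parameters are stored as plain name-to-list dictionaries, and the interpolation weight `alpha`, the function name `merge_params`, and the toy parameter names are all hypothetical, introduced only for illustration.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def merge_params(a: Params, b: Params, alpha: float = 0.5) -> Params:
    """Return (1 - alpha) * a + alpha * b, computed parameter by parameter.

    alpha = 0.5 is a plain average; other values are an assumed
    generalisation for moving between the two base models.
    """
    if a.keys() != b.keys():
        raise ValueError("base models must share the same parameter names")
    merged: Params = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name!r}")
        merged[name] = [
            (1.0 - alpha) * x + alpha * y for x, y in zip(a[name], b[name])
        ]
    return merged


# Toy "models" standing in for two trained text-to-speech checkpoints.
speaker_a = {"encoder.weight": [0.0, 2.0], "decoder.bias": [1.0]}
speaker_b = {"encoder.weight": [4.0, 6.0], "decoder.bias": [3.0]}

midpoint = merge_params(speaker_a, speaker_b, alpha=0.5)
print(midpoint)
```

Sweeping `alpha` from 0 to 1 in this sketch moves the merged parameters from the first base model to the second, which is one plausible way to read "attribute interpolation"; the paper itself should be consulted for the actual procedure.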

arXiv:2406.16808

Exploring the Capability of Mamba in Speech Applications

Authors: Koichi Miyazaki, Yoshiki Masuyama, Masato Murata

Abstract: This paper explores the capability of Mamba, a recently proposed architecture based on state space models (SSMs), as a competitive alternative to Transformer-based models. In the speech domain, well-designed Transformer-based models, such as the Conformer and E-Branchformer, have become the de facto standards. Extensive evaluations have demonstrated the effectiveness of these Transformer-based models across a wide range of speech tasks. In contrast, the evaluation of SSMs has been limited to a few tasks, such as automatic speech recognition (ASR) and speech synthesis. In this paper, we compared Mamba with state-of-the-art Transformer variants for various speech applications, including ASR, text-to-speech, spoken language understanding, and speech summarization. Experimental evaluations revealed that Mamba achieves comparable or better performance than Transformer-based models, and demonstrated its efficiency in long-form speech processing.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted at Interspeech 2024

arXiv:2405.20037

Linguistic Landscape of Generative AI Perception: A Global Twitter Analysis Across 14 Languages

Authors: Taichi Murayama, Kunihiro Miyazaki, Yasuko Matsubara, Yasushi Sakurai

Abstract: The advent of generative AI tools has had a profound impact on societies globally, transcending geographical boundaries. Understanding these tools' global reception and utilization is crucial for service providers and policymakers in shaping future policies. Therefore, to unravel the perceptions and engagements of individuals within diverse linguistic communities with regard to generative AI tools, we extensively analyzed over 6.8 million tweets in 14 different languages. Our findings reveal a global trend in the perception of generative AI, accompanied by language-specific nuances. While sentiments toward these tools vary significantly across languages, there is a prevalent positive inclination toward Image tools and a negative one toward Chat tools. Notably, the ban of ChatGPT in Italy led to a sentiment decline and initiated discussions across languages. Furthermore, we established a taxonomy for interactions with chatbots, creating a framework for social analysis underscoring variations in generative AI usage among linguistic communities. We find that the Chinese community predominantly employs chatbots as substitutes for search, while the Italian community tends to present more intricate prompts. Our research provides a robust foundation for further explorations of the social dynamics surrounding generative AI tools and offers invaluable insights for decision-makers in policy, technology, and education.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2402.02150

Data-Driven Prediction of Seismic Intensity Distributions Featuring Hybrid Classification-Regression Models

Authors: Koyu Mizutani, Haruki Mitarai, Kakeru Miyazaki, Soichiro Kumano, Toshihiko Yamasaki

Abstract: Earthquakes are among the most immediate and deadly natural disasters that humans face. Accurately forecasting the extent of earthquake damage and assessing potential risks can be instrumental in saving numerous lives. In this study, we developed linear regression models capable of predicting seismic intensity distributions based on earthquake parameters: location, depth, and magnitude. Because it is completely data-driven, it can predict intensity distributions without geographical information. The dataset comprises seismic intensity data from earthquakes that occurred in the vicinity of Japan between 1997 and 2020, specifically containing 1,857 instances of earthquakes with a magnitude of 5.0 or greater, sourced from the Japan Meteorological Agency. We trained both regression and classification models and combined them to take advantage of both to create a hybrid model. The proposed model outperformed commonly used Ground Motion Prediction Equations (GMPEs) in terms of the correlation coefficient, F1 score, and MCC. Furthermore, the proposed model can predict even abnormal seismic intensity distributions, a task at which conventional GMPEs often struggle.

Submitted 3 February, 2024; originally announced February 2024.

arXiv:2305.09537

Public Perception of Generative AI on Twitter: An Empirical Study Based on Occupation and Usage

Authors: Kunihiro Miyazaki, Taichi Murayama, Takayuki Uchiba, Jisun An, Haewoon Kwak

Abstract: The emergence of generative AI has sparked substantial discussions, with the potential to have profound impacts on society in all aspects. As emerging technologies continue to advance, it is imperative to facilitate their proper integration into society, managing expectations and fear. This paper investigates users' perceptions of generative AI using 3M posts on Twitter from January 2019 to March 2023, especially focusing on their occupation and usage. We find that people across various occupations, not just IT-related ones, show a strong interest in generative AI. The sentiment toward generative AI is generally positive, and remarkably, their sentiments are positively correlated with their exposure to AI. Among occupations, illustrators show exceptionally negative sentiment mainly due to concerns about the unethical usage of artworks in constructing AI. People use ChatGPT in diverse ways, and notably the casual usage in which they "play with" ChatGPT tends to be associated with positive sentiments. After the release of ChatGPT, people's interest in AI in general has increased dramatically; however, the topic with the most significant increase and positive sentiment is related to crypto, indicating the hype-worthy characteristics of generative AI. These findings would offer valuable lessons for policymaking on the emergence of new technology and also empirical insights for the considerations of future human-AI symbiosis.

Submitted 16 May, 2023; originally announced May 2023.

Comments: 11 pages, 11 figures

arXiv:2301.07282

The Chance of Winning Election Impacts on Social Media Strategy

Authors: Taichi Murayama, Akira Matsui, Kunihiro Miyazaki, Yasuko Matsubara, Yasushi Sakurai

Abstract: Social media has been a paramount arena for election campaigns for political actors. While many studies have been paying attention to the political campaigns related to partisanship, politicians also can conduct different campaigns according to their chances of winning. Leading candidates, for example, do not behave the same as fringe candidates in their elections, and vice versa. We, however, know little about this difference in social media political campaign strategies according to their odds in elections. We tackle this problem by analyzing candidates' tweets in terms of users, topics, and sentiment of replies. Our study finds that, as their chances of winning increase, candidates narrow the targets they communicate with, from people in general to the electoral districts and specific persons (verified accounts or accounts with many followers). Our study brings new insights into the candidates' campaign strategies through the analysis based on the novel perspective of the candidate's electoral situation.

Submitted 4 April, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

Comments: Accepted by ICWSM 2023

arXiv:2212.02827

Political Honeymoon Effect on Social Media: Characterizing Social Media Reaction to the Changes of Prime Minister in Japan

Authors: Kunihiro Miyazaki, Taichi Murayama, Akira Matsui, Masaru Nishikawa, Takayuki Uchiba, Haewoon Kwak, Jisun An

Abstract: New leaders in democratic countries typically enjoy high approval ratings immediately after taking office. This phenomenon is called the honeymoon effect and is regarded as a significant political phenomenon; however, its mechanism remains underexplored. Therefore, this study examines how social media users respond to changes in political leadership in order to better understand the honeymoon effect in politics. In particular, we constructed a 15-year Twitter dataset on eight change timings of Japanese prime ministers consisting of 6.6M tweets and analyzed them in terms of sentiments, topics, and users. We found that, while not always, social media tend to show a honeymoon effect at the change timings of prime minister. The study also revealed that sentiment about prime ministers differed by topic, indicating that public expectations vary from one prime minister to another. Furthermore, the user base was largely replaced before and after the change in the prime minister, and their sentiment was also significantly different. The implications of this study would be beneficial for administrative management.

Submitted 25 February, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: Accepted at ACM Web Science Conference 2023 (WebSci'23). 12 pages, 6 figures

arXiv:2210.17098

Structured State Space Decoder for Speech Recognition and Synthesis

Authors: Koichi Miyazaki, Masato Murata, Tomoki Koriyama

Abstract: Automatic speech recognition (ASR) systems developed in recent years have shown promising results with self-attention models (e.g., Transformer and Conformer), which are replacing conventional recurrent neural networks. Meanwhile, a structured state space model (S4) has been recently proposed, producing promising results for various long-sequence modeling tasks, including raw speech classification. The S4 model can be trained in parallel, in the same way as the Transformer model. In this study, we applied S4 as a decoder for ASR and text-to-speech (TTS) tasks by comparing it with the Transformer decoder. For the ASR task, our experimental results demonstrate that the proposed model achieves a competitive word error rate (WER) of 1.88%/4.25% on the LibriSpeech test-clean/test-other sets and a character error rate (CER) of 3.80%/2.63%/2.98% on the CSJ eval1/eval2/eval3 sets. Furthermore, the proposed model is more robust than the standard Transformer model, particularly for long-form speech on both datasets. For the TTS task, the proposed method outperforms the Transformer baseline.

Submitted 31 October, 2022; originally announced October 2022.

Comments: Submitted to ICASSP 2023

arXiv:2208.07565

Prediction of Seismic Intensity Distributions Using Neural Networks

Authors: Koyu Mizutani, Haruki Mitarai, Kakeru Miyazaki, Ryugo Shimamura, Soichiro Kumano, Toshihiko Yamasaki

Abstract: The ground motion prediction equation is commonly used to predict the seismic intensity distribution. However, it is not easy to apply this method to seismic distributions affected by underground plate structures, which are commonly known as abnormal seismic distributions. This study proposes a hybrid of regression and classification approaches using neural networks. The proposed model treats the distributions as 2-dimensional data like an image. Our method can accurately predict seismic intensity distributions, even abnormal distributions.

Submitted 16 August, 2022; originally announced August 2022.

Comments: 2 pages, 2 figures, IEEE GCCE2022 accepted

arXiv:2204.00910

Characterizing Spontaneous Ideation Contest on Social Media: Case Study on the Name Change of Facebook to Meta

Authors: Kunihiro Miyazaki, Takayuki Uchiba, Haewoon Kwak, Jisun An

Abstract: Collecting good ideas is vital for organizations, especially companies, to retain their competitiveness. Social media is gathering attention as a place to extract ideas efficiently; however, the characteristics of ideas and the posters of ideas on social media are underexamined. Thus, this study aims to characterize spontaneous ideation contests among social media users by taking an event of Facebook's name change to Meta as a case study. As a dataset, we comprehensively collect tweets containing new acronyms of Big Tech companies, which we treat as an "idea" in this work. In the analysis, we especially focus on the diversity of ideas, which would be the main reason for enlisting social media for idea generation. As the main results, we discovered that social media users offered a wider range of ideas than those in mainstream media. The follow-follower network of the users suggested that the users' position on the network is related to the preferred ideas. Additionally, we discovered a link between the amount of user interaction on social media and the diversity of ideas. This study would promote the use of social media as a part of open innovation and co-creation processes in the industry.

Submitted 14 November, 2022; v1 submitted 2 April, 2022; originally announced April 2022.

Comments: Accepted at IEEE BigData 2022 as a short paper. This arXiv submission is the longer version. 10 pages, 13 figures

arXiv:2203.14242

"This is Fake News": Characterizing the Spontaneous Debunking from Twitter Users to COVID-19 False Information

Authors: Kunihiro Miyazaki, Takayuki Uchiba, Kenji Tanaka, Jisun An, Haewoon Kwak, Kazutoshi Sasahara

Abstract: False information spreads on social media, and fact-checking is a potential countermeasure. However, there is a severe shortage of fact-checkers; an efficient way to scale fact-checking is desperately needed, especially in pandemics like COVID-19. In this study, we focus on spontaneous debunking by social media users, which has been missed in existing research despite its indicated usefulness for fact-checking and countering false information. Specifically, we characterize the tweets with false information, or fake tweets, that tend to be debunked and Twitter users who often debunk fake tweets. For this analysis, we create a comprehensive dataset of responses to fake tweets, annotate a subset of them, and build a classification model for detecting debunking behaviors. We find that most fake tweets are left undebunked, spontaneous debunking is slower than other forms of responses, and spontaneous debunking exhibits partisanship in political topics. These results provide actionable insights into utilizing spontaneous debunking to scale conventional fact-checking, thereby supplementing existing research from a new perspective.

Submitted 10 August, 2022; v1 submitted 27 March, 2022; originally announced March 2022.

Comments: Accepted at ICWSM 2023

arXiv:2202.08470

doi 10.21437/Interspeech.2021-2218

Acoustic Event Detection with Classifier Chains

Authors: Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki, Tomoki Hayashi

Abstract: This paper proposes acoustic event detection (AED) with classifier chains, a new classifier based on the probabilistic chain rule. The proposed AED with classifier chains consists of a gated recurrent unit and performs iterative binary detection of each event one by one. In each iteration, the event's activity is estimated and used to condition the next output based on the probabilistic chain rule to form classifier chains. Therefore, the proposed method can handle the interdependence among events upon classification, whereas conventional AED methods, which use multiple binary classifiers built from a linear layer and a sigmoid function, assume conditional independence. In experiments with a real-recording dataset, the proposed method demonstrates superior AED performance, with a relative 14.80% improvement over a convolutional recurrent neural network baseline system with multiple binary classifiers.

Submitted 17 February, 2022; originally announced February 2022.

Comments: 5 pages, presented at Interspeech 2021
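
The contrast drawn in the abstract above, between conditionally independent binary classifiers and classifier chains built on the probabilistic chain rule, can be written out as follows. The symbols are assumptions introduced here for illustration and are not taken from the paper: $x$ is the acoustic input, $y_k \in \{0, 1\}$ is the activity of event $k$, and $K$ is the number of event classes.

```latex
% Conventional multi-label AED: events assumed conditionally independent given x
p(y_1, \dots, y_K \mid x) = \prod_{k=1}^{K} p(y_k \mid x)

% Classifier chains: exact factorization by the probabilistic chain rule,
% each event conditioned on the events already detected
p(y_1, \dots, y_K \mid x) = \prod_{k=1}^{K} p(y_k \mid x, y_1, \dots, y_{k-1})
```

The second factorization makes no independence assumption, which is why a chain that conditions each detection on earlier ones can capture interdependence among events.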

arXiv:2106.09354

Retrospective Analysis of Controversial Subtopics on COVID-19 in Japan

Authors: Kunihiro Miyazaki, Takayuki Uchiba, Fujio Toriumi, Kenji Tanaka, Takeshi Sakaki

Abstract: For efficient political decision-making in an emergency situation, a thorough recognition and understanding of the polarized topics is crucial. The cost of unmitigated polarization would be extremely high for the society; therefore, it is desirable to identify the polarizing issues before they become serious. With this in mind, we conducted a retrospective analysis of the polarized subtopics of COVID-19 to obtain insights for future policymaking. To this end, we first propose a framework to comprehensively search for controversial subtopics. We then retrospectively analyze subtopics on COVID-19 using the proposed framework, with data obtained via Twitter in Japan. The results show that the proposed framework can effectively detect controversial subtopics that reflect current reality. Controversial subtopics tend to be about the government, medical matters, economy, and education; moreover, the controversy score had a low correlation with the traditional indicators--scale and sentiment of the subtopics--which suggests that the controversy score is a potentially important indicator to be obtained. We also discussed the difference between subtopics that became highly controversial and ones that did not despite their large scale.

Submitted 17 June, 2021; originally announced June 2021.

Comments: 8 pages, 3 figures

arXiv:2105.10319

Characterizing the Anti-Vaxxers' Reply Behavior on Social Media

Authors: Kunihiro Miyazaki, Takayuki Uchiba, Kenji Tanaka, Kazutoshi Sasahara

Abstract: Although the online campaigns of anti-vaccine advocates, or anti-vaxxers, severely threaten efforts for herd immunity, their reply behavior--the form of directed messaging that can be sent beyond follow-follower relationships--remains poorly understood. Here, we examined the characteristics of anti-vaxxers' reply behavior on Twitter to attempt to comprehend their characteristics of spreading their beliefs in terms of interaction frequency, content, and targets. Among the results, anti-vaxxers more frequently conducted reply behavior with other clusters, especially neutral accounts. Anti-vaxxers' replies were significantly more toxic than those from neutral accounts and pro-vaxxers, and their toxicity, in particular, was higher with regard to the rollout of vaccines. Anti-vaxxers' replies were more persuasive than the others in terms of the emotional aspect, rather than linguistical styles. The targets of anti-vaxxers' replies tend to be accounts with larger numbers of followers and posts, including accounts that relate to health care or represent scientists, policy-makers, or media figures or outlets. We discussed how their reply behaviors are effective in spreading their beliefs, as well as possible countermeasures to restrain them. These findings should prove useful for pro-vaxxers and platformers to promote trusted information while reducing the effect of vaccine disinformation.

Submitted 12 November, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

Comments: ABCSS (WI-IAT'21 Companion). 11 pages, 2 figures

arXiv:1908.05841

Recurrent U-net: Deep learning to predict daily summertime ozone in the United States

Authors: Tai-Long He, Dylan B. A. Jones, Binxuan Huang, Yuyang Liu, Kazuyuki Miyazaki, Zhe Jiang, E. Charlie White, Helen M. Worden, John R. Worden

Abstract: We use a hybrid deep learning model to predict June-July-August (JJA) daily maximum 8-h average (MDA8) surface ozone concentrations in the US. A set of meteorological fields from the ERA-Interim reanalysis as well as monthly mean NO$_x$ emissions from the Community Emissions Data System (CEDS) inventory are selected as predictors. Ozone measurements from the US Environmental Protection Agency (EPA) Air Quality System (AQS) from 1980 to 2009 are used to train the model, whereas data from 2010 to 2014 are used to evaluate the performance of the model. The model captures daily, seasonal, and interannual variability in MDA8 ozone across the US well. Feature maps show that the model captures teleconnections between MDA8 ozone and the meteorological fields, which are responsible for driving the ozone dynamics. We used the model to evaluate recent trends in NO$_x$ emissions in the US and found that the trend in the EPA emission inventory produced the largest negative bias in MDA8 ozone between 2010 and 2016. The top-down emission trends from the Tropospheric Chemistry Reanalysis (TCR-2), which is based on satellite observations, produced predictions in best agreement with observations. In urban regions, the trend in AQS NO$_2$ observations provided ozone predictions in agreement with observations, whereas in rural regions the satellite-derived trends produced the best agreement. In both rural and urban regions the EPA trend resulted in the largest negative bias in predicted ozone. Our results suggest that the EPA inventory is overestimating the reductions in NO$_x$ emissions and that the satellite-derived trend reflects the influence of reductions in NO$_x$ emissions as well as changes in background NO$_x$. Our results demonstrate the significantly greater predictive capability that the deep learning model provides over conventional atmospheric chemical transport models for air quality analyses.

Submitted 16 August, 2019; originally announced August 2019.

arXiv:1210.7283

Abstract Data Types in Event-B - An Application of Generic Instantiation

Authors: David Basin, Andreas Fürst, Thai Son Hoang, Kunihiko Miyazaki, Naoto Sato

Abstract: Integrating formal methods into industrial practice is a challenging task. Often, different kinds of expertise are required within the same development. On the one hand, there are domain engineers who have specific knowledge of the system under development. On the other hand, there are formal methods experts who have experience in rigorously specifying and reasoning about formal systems. Coordination between these groups is important for taking advantage of their expertise. In this paper, we describe our approach of using generic instantiation to facilitate this coordination. In particular, generic instantiation enables a separation of concerns between the different parties involved in developing formal systems.

Submitted 26 October, 2012; originally announced October 2012.

Comments: In Proceedings of DS-Event-B 2012: Workshop on the experience of and advances in developing dependable systems in Event-B, in conjunction with ICFEM 2012 - Kyoto, Japan, November 13, 2012

Showing 1–16 of 16 results for author: Miyazaki, K