Search | arXiv e-print repository

ChatEMG: Synthetic Data Generation to Control a Robotic Hand Orthosis for Stroke

Authors: **gxi Xu, Runsheng Wang, Siqi Shang, Ava Chen, Lauren Winterbottom, To-Liang Hsu, Wenxi Chen, Khondoker Ahmed, Pedro Leandro La Rotta, Xinyue Zhu, Dawn M. Nilsen, Joel Stein, Matei Ciocarlie

Abstract: Intent inferral on a hand orthosis for stroke patients is challenging due to the difficulty of data collection from impaired subjects. Additionally, EMG signals exhibit significant variations across different conditions, sessions, and subjects, making it hard for classifiers to generalize. Traditional approaches require a large labeled dataset from the new condition, session, or subject to train i… ▽ More Intent inferral on a hand orthosis for stroke patients is challenging due to the difficulty of data collection from impaired subjects. Additionally, EMG signals exhibit significant variations across different conditions, sessions, and subjects, making it hard for classifiers to generalize. Traditional approaches require a large labeled dataset from the new condition, session, or subject to train intent classifiers; however, this data collection process is burdensome and time-consuming. In this paper, we propose ChatEMG, an autoregressive generative model that can generate synthetic EMG signals conditioned on prompts (i.e., a given sequence of EMG signals). ChatEMG enables us to collect only a small dataset from the new condition, session, or subject and expand it with synthetic samples conditioned on prompts from this new context. ChatEMG leverages a vast repository of previous data via generative training while still remaining context-specific via prompting. Our experiments show that these synthetic samples are classifier-agnostic and can improve intent inferral accuracy for different types of classifiers. We demonstrate that our complete approach can be integrated into a single patient session, including the use of the classifier for functional orthosis-assisted tasks. To the best of our knowledge, this is the first time an intent classifier trained partially on synthetic data has been deployed for functional control of an orthosis by a stroke survivor. Videos and additional information can be found at https://jxu.ai/chatemg. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 8 pages

arXiv:2403.17784 [pdf, other]

doi 10.1145/3613905.3650738

SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings

Authors: Ting-Yao Hsu, Chieh-Yang Huang, Shih-Hong Huang, Ryan Rossi, Sungchul Kim, Tong Yu, C. Lee Giles, Ting-Hao K. Huang

Abstract: Crafting effective captions for figures is important. Readers heavily depend on these captions to grasp the figure's message. However, despite a well-developed set of AI technologies for figures and captions, these have rarely been tested for usefulness in aiding caption writing. This paper introduces SciCapenter, an interactive system that puts together cutting-edge AI technologies for scientific… ▽ More Crafting effective captions for figures is important. Readers heavily depend on these captions to grasp the figure's message. However, despite a well-developed set of AI technologies for figures and captions, these have rarely been tested for usefulness in aiding caption writing. This paper introduces SciCapenter, an interactive system that puts together cutting-edge AI technologies for scientific figure captions to aid caption composition. SciCapenter generates a variety of captions for each figure in a scholarly article, providing scores and a comprehensive checklist to assess caption quality across multiple critical aspects, such as helpfulness, OCR mention, key takeaways, and visual properties reference. Users can directly edit captions in SciCapenter, resubmit for revised evaluations, and iteratively refine them. A user study with Ph.D. students indicates that SciCapenter significantly lowers the cognitive load of caption writing. Participants' feedback further offers valuable design insights for future systems aiming to enhance caption writing. △ Less

Submitted 26 March, 2024; originally announced March 2024.

Comments: CHI EA '24: Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems

arXiv:2403.15363 [pdf, other]

Cascading Blackout Severity Prediction with Statistically-Augmented Graph Neural Networks

Authors: Joe Gorka, Tim Hsu, Wenting Li, Yury Maximov, Line Roald

Abstract: Higher variability in grid conditions, resulting from growing renewable penetration and increased incidence of extreme weather events, has increased the difficulty of screening for scenarios that may lead to catastrophic cascading failures. Traditional power-flow-based tools for assessing cascading blackout risk are too slow to properly explore the space of possible failures and load/generation pa… ▽ More Higher variability in grid conditions, resulting from growing renewable penetration and increased incidence of extreme weather events, has increased the difficulty of screening for scenarios that may lead to catastrophic cascading failures. Traditional power-flow-based tools for assessing cascading blackout risk are too slow to properly explore the space of possible failures and load/generation patterns. We add to the growing literature of faster graph-neural-network (GNN)-based techniques, develo** two novel techniques for the estimation of blackout magnitude from initial grid conditions. First we propose several methods for employing an initial classification step to filter out safe "non blackout" scenarios prior to magnitude estimation. Second, using insights from the statistical properties of cascading blackouts, we propose a method for facilitating non-local message passing in our GNN models. We validate these two approaches on a large simulated dataset, and show the potential of both to increase blackout size estimation performance. △ Less

Submitted 22 March, 2024; originally announced March 2024.

Comments: Accepted to Power Systems Computation Conference (PSCC) 2024

arXiv:2403.03516 [pdf, other]

Unsupervised Multilingual Dense Retrieval via Generative Pseudo Labeling

Authors: Chao-Wei Huang, Chen-An Li, Tsu-Yuan Hsu, Chen-Yu Hsu, Yun-Nung Chen

Abstract: Dense retrieval methods have demonstrated promising performance in multilingual information retrieval, where queries and documents can be in different languages. However, dense retrievers typically require a substantial amount of paired data, which poses even greater challenges in multilingual scenarios. This paper introduces UMR, an Unsupervised Multilingual dense Retriever trained without any pa… ▽ More Dense retrieval methods have demonstrated promising performance in multilingual information retrieval, where queries and documents can be in different languages. However, dense retrievers typically require a substantial amount of paired data, which poses even greater challenges in multilingual scenarios. This paper introduces UMR, an Unsupervised Multilingual dense Retriever trained without any paired data. Our approach leverages the sequence likelihood estimation capabilities of multilingual language models to acquire pseudo labels for training dense retrievers. We propose a two-stage framework which iteratively improves the performance of multilingual dense retrievers. Experimental results on two benchmark datasets show that UMR outperforms supervised baselines, showcasing the potential of training multilingual retrievers without paired data, thereby enhancing their practicality. Our source code, data, and models are publicly available at https://github.com/MiuLab/UMR △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: Accepted to Findings of EACL 2024

arXiv:2312.05472 [pdf, other]

Spectroscopy-Guided Discovery of Three-Dimensional Structures of Disordered Materials with Diffusion Models

Authors: Hyuna Kwon, Tim Hsu, Wenyu Sun, Wonseok Jeong, Fikret Aydin, James Chapman, Xiao Chen, Matthew R. Carbone, Deyu Lu, Fei Zhou, Tuan Anh Pham

Abstract: The ability to rapidly develop materials with desired properties has a transformative impact on a broad range of emerging technologies. In this work, we introduce a new framework based on the diffusion model, a recent generative machine learning method to predict 3D structures of disordered materials from a target property. For demonstration, we apply the model to identify the atomic structures of… ▽ More The ability to rapidly develop materials with desired properties has a transformative impact on a broad range of emerging technologies. In this work, we introduce a new framework based on the diffusion model, a recent generative machine learning method to predict 3D structures of disordered materials from a target property. For demonstration, we apply the model to identify the atomic structures of amorphous carbons ($a$-C) as a representative material system from the target X-ray absorption near edge structure (XANES) spectra--a common experimental technique to probe atomic structures of materials. We show that conditional generation guided by XANES spectra reproduces key features of the target structures. Furthermore, we show that our model can steer the generative process to tailor atomic arrangements for a specific XANES spectrum. Finally, our generative model exhibits a remarkable scale-agnostic property, thereby enabling generation of realistic, large-scale structures through learning from a small-scale dataset (i.e., with small unit cells). Our work represents a significant stride in bridging the gap between materials characterization and atomic structure determination; in addition, it can be leveraged for materials discovery in exploring various material properties as targeted. △ Less

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2310.15405 [pdf, other]

GPT-4 as an Effective Zero-Shot Evaluator for Scientific Figure Captions

Authors: Ting-Yao Hsu, Chieh-Yang Huang, Ryan Rossi, Sungchul Kim, C. Lee Giles, Ting-Hao K. Huang

Abstract: There is growing interest in systems that generate captions for scientific figures. However, assessing these systems output poses a significant challenge. Human evaluation requires academic expertise and is costly, while automatic evaluation depends on often low-quality author-written captions. This paper investigates using large language models (LLMs) as a cost-effective, reference-free method fo… ▽ More There is growing interest in systems that generate captions for scientific figures. However, assessing these systems output poses a significant challenge. Human evaluation requires academic expertise and is costly, while automatic evaluation depends on often low-quality author-written captions. This paper investigates using large language models (LLMs) as a cost-effective, reference-free method for evaluating figure captions. We first constructed SCICAP-EVAL, a human evaluation dataset that contains human judgments for 3,600 scientific figure captions, both original and machine-made, for 600 arXiv figures. We then prompted LLMs like GPT-4 and GPT-3 to score (1-6) each caption based on its potential to aid reader understanding, given relevant context such as figure-mentioning paragraphs. Results show that GPT-4, used as a zero-shot evaluator, outperformed all other models and even surpassed assessments made by Computer Science and Informatics undergraduates, achieving a Kendall correlation score of 0.401 with Ph.D. students rankings △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: To Appear in EMNLP 2023 Findings

arXiv:2310.01678 [pdf, other]

Score dynamics: scaling molecular dynamics with picoseconds timestep via conditional diffusion model

Authors: Tim Hsu, Babak Sadigh, Vasily Bulatov, Fei Zhou

Abstract: We propose score dynamics (SD), a general framework for learning accelerated evolution operators with large timesteps from molecular-dynamics simulations. SD is centered around scores, or derivatives of the transition log-probability with respect to the dynamical degrees of freedom. The latter play the same role as force fields in MD but are used in denoising diffusion probability models to genera… ▽ More We propose score dynamics (SD), a general framework for learning accelerated evolution operators with large timesteps from molecular-dynamics simulations. SD is centered around scores, or derivatives of the transition log-probability with respect to the dynamical degrees of freedom. The latter play the same role as force fields in MD but are used in denoising diffusion probability models to generate discrete transitions of the dynamical variables in an SD timestep, which can be orders of magnitude larger than a typical MD timestep. In this work, we construct graph neural network based score dynamics models of realistic molecular systems that are evolved with 10~ps timesteps. We demonstrate the efficacy of score dynamics with case studies of alanine dipeptide and short alkanes in aqueous solution. Both equilibrium predictions derived from the stationary distributions of the conditional probability and kinetic predictions for the transition rates and transition paths are in good agreement with MD. Our current SD implementation is about two orders of magnitude faster than the MD counterpart for the systems studied in this work. Open challenges and possible future remedies to improve score dynamics are also discussed. △ Less

Submitted 6 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

arXiv:2309.14324 [pdf, other]

Towards General-Purpose Text-Instruction-Guided Voice Conversion

Authors: Chun-Yi Kuan, Chen An Li, Tsu-Yuan Hsu, Tse-Yang Lin, Ho-Lam Chung, Kai-Wei Chang, Shuo-yiin Chang, Hung-yi Lee

Abstract: This paper introduces a novel voice conversion (VC) model, guided by text instructions such as "articulate slowly with a deep tone" or "speak in a cheerful boyish voice". Unlike traditional methods that rely on reference utterances to determine the attributes of the converted speech, our model adds versatility and specificity to voice conversion. The proposed VC model is a neural codec language mo… ▽ More This paper introduces a novel voice conversion (VC) model, guided by text instructions such as "articulate slowly with a deep tone" or "speak in a cheerful boyish voice". Unlike traditional methods that rely on reference utterances to determine the attributes of the converted speech, our model adds versatility and specificity to voice conversion. The proposed VC model is a neural codec language model which processes a sequence of discrete codes, resulting in the code sequence of converted speech. It utilizes text instructions as style prompts to modify the prosody and emotional information of the given speech. In contrast to previous approaches, which often rely on employing separate encoders like prosody and content encoders to handle different aspects of the source speech, our model handles various information of speech in an end-to-end manner. Experiments have demonstrated the impressive capabilities of our model in comprehending instructions and delivering reasonable results. △ Less

Submitted 16 January, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: Accepted to ASRU 2023

arXiv:2309.06748 [pdf, other]

CONVERSER: Few-Shot Conversational Dense Retrieval with Synthetic Data Generation

Authors: Chao-Wei Huang, Chen-Yu Hsu, Tsu-Yuan Hsu, Chen-An Li, Yun-Nung Chen

Abstract: Conversational search provides a natural interface for information retrieval (IR). Recent approaches have demonstrated promising results in applying dense retrieval to conversational IR. However, training dense retrievers requires large amounts of in-domain paired data. This hinders the development of conversational dense retrievers, as abundant in-domain conversations are expensive to collect. In… ▽ More Conversational search provides a natural interface for information retrieval (IR). Recent approaches have demonstrated promising results in applying dense retrieval to conversational IR. However, training dense retrievers requires large amounts of in-domain paired data. This hinders the development of conversational dense retrievers, as abundant in-domain conversations are expensive to collect. In this paper, we propose CONVERSER, a framework for training conversational dense retrievers with at most 6 examples of in-domain dialogues. Specifically, we utilize the in-context learning capability of large language models to generate conversational queries given a passage in the retrieval corpus. Experimental results on conversational retrieval benchmarks OR-QuAC and TREC CAsT 19 show that the proposed CONVERSER achieves comparable performance to fully-supervised models, demonstrating the effectiveness of our proposed framework in few-shot conversational dense retrieval. All source code and generated datasets are available at https://github.com/MiuLab/CONVERSER △ Less

Submitted 13 September, 2023; originally announced September 2023.

Comments: Accepted to SIGDIAL 2023

arXiv:2305.07455 [pdf, other]

Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation

Authors: Yu-Kuan Fu, Liang-Hsuan Tseng, Jiatong Shi, Chen-An Li, Tsu-Yuan Hsu, Shinji Watanabe, Hung-yi Lee

Abstract: Most of the speech translation models heavily rely on parallel data, which is hard to collect especially for low-resource languages. To tackle this issue, we propose to build a cascaded speech translation system without leveraging any kind of paired data. We use fully unpaired data to train our unsupervised systems and evaluate our results on CoVoST 2 and CVSS. The results show that our work is co… ▽ More Most of the speech translation models heavily rely on parallel data, which is hard to collect especially for low-resource languages. To tackle this issue, we propose to build a cascaded speech translation system without leveraging any kind of paired data. We use fully unpaired data to train our unsupervised systems and evaluate our results on CoVoST 2 and CVSS. The results show that our work is comparable with some other early supervised methods in some language pairs. While cascaded systems always suffer from severe error propagation problems, we proposed denoising back-translation (DBT), a novel approach to building robust unsupervised neural machine translation (UNMT). DBT successfully increases the BLEU score by 0.7--0.9 in all three translation directions. Moreover, we simplified the pipeline of our cascaded system to reduce inference latency and conducted a comprehensive analysis of every part of our work. We also demonstrate our unsupervised speech translation results on the established website. △ Less

Submitted 12 May, 2023; originally announced May 2023.

arXiv:2302.12757 [pdf, other]

Ensemble knowledge distillation of self-supervised speech models

Authors: Kuan-Po Huang, Tzu-hsun Feng, Yu-Kuan Fu, Tsu-Yuan Hsu, Po-Chieh Yen, Wei-Cheng Tseng, Kai-Wei Chang, Hung-yi Lee

Abstract: Distilled self-supervised models have shown competitive performance and efficiency in recent years. However, there is a lack of experience in jointly distilling multiple self-supervised speech models. In our work, we performed Ensemble Knowledge Distillation (EKD) on various self-supervised speech models such as HuBERT, RobustHuBERT, and WavLM. We tried two different aggregation techniques, layerw… ▽ More Distilled self-supervised models have shown competitive performance and efficiency in recent years. However, there is a lack of experience in jointly distilling multiple self-supervised speech models. In our work, we performed Ensemble Knowledge Distillation (EKD) on various self-supervised speech models such as HuBERT, RobustHuBERT, and WavLM. We tried two different aggregation techniques, layerwise-average and layerwise-concatenation, to the representations of different teacher models and found that the former was more effective. On top of that, we proposed a multiple prediction head method for student models to predict different layer outputs of multiple teacher models simultaneously. The experimental results show that our method improves the performance of the distilled models on four downstream speech processing tasks, Phoneme Recognition, Speaker Identification, Emotion Recognition, and Automatic Speech Recognition in the hidden-set track of the SUPERB benchmark. △ Less

Submitted 24 February, 2023; originally announced February 2023.

Comments: Accepted by ICASSP 2023

arXiv:2302.12324 [pdf, other]

Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization

Authors: Chieh-Yang Huang, Ting-Yao Hsu, Ryan Rossi, Ani Nenkova, Sungchul Kim, Gromit Yeuk-Yin Chan, Eunyee Koh, Clyde Lee Giles, Ting-Hao 'Kenneth' Huang

Abstract: Good figure captions help paper readers understand complex scientific figures. Unfortunately, even published papers often have poorly written captions. Automatic caption generation could aid paper writers by providing good starting captions that can be refined for better quality. Prior work often treated figure caption generation as a vision-to-language task. In this paper, we show that it can be… ▽ More Good figure captions help paper readers understand complex scientific figures. Unfortunately, even published papers often have poorly written captions. Automatic caption generation could aid paper writers by providing good starting captions that can be refined for better quality. Prior work often treated figure caption generation as a vision-to-language task. In this paper, we show that it can be more effectively tackled as a text summarization task in scientific documents. We fine-tuned PEGASUS, a pre-trained abstractive summarization model, to specifically summarize figure-referencing paragraphs (e.g., "Figure 3 shows...") into figure captions. Experiments on large-scale arXiv figures show that our method outperforms prior vision methods in both automatic and human evaluations. We further conducted an in-depth investigation focused on two key challenges: (i) the common presence of low-quality author-written captions and (ii) the lack of clear standards for good captions. Our code and data are available at: https://github.com/Crowd-AI-Lab/Generating-Figure-Captions-as-a-Text-Summarization-Task. △ Less

Submitted 11 August, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: Accepted by INLG-2023

arXiv:2301.07208 [pdf, other]

Bounded Model Checking for Asynchronous Hyperproperties

Authors: Tzu-Han Hsu, Borzoo Bonakdarpour, Bernd Finkbeiner, César Sánchez

Abstract: Many types of attacks on confidentiality stem from the nondeterministic nature of the environment that computer programs operate in (e.g., schedulers and asynchronous communication channels). In this paper, we focus on verification of confidentiality in nondeterministic environments by reasoning about asynchronous hyperproperties. First, we generalize the temporal logic A-HLTL to allow nested traj… ▽ More Many types of attacks on confidentiality stem from the nondeterministic nature of the environment that computer programs operate in (e.g., schedulers and asynchronous communication channels). In this paper, we focus on verification of confidentiality in nondeterministic environments by reasoning about asynchronous hyperproperties. First, we generalize the temporal logic A-HLTL to allow nested trajectory quantification, where a trajectory determines how different execution traces may advance and stutter. We propose a bounded model checking algorithm for A-HLTL based on QBF-solving for a fragment of the generalized A-HLTL and evaluate it by various case studies on concurrent programs, scheduling attacks, compiler optimization, speculative execution, and cache timing attacks. We also rigorously analyze the complexity of model checking for different fragments of A-HLTL. △ Less

Submitted 25 January, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

Comments: 34 pages

arXiv:2301.06209 [pdf, other]

Efficient Loop Conditions for Bounded Model Checking Hyperproperties

Authors: Tzu-Han Hsu, César Sánchez, Sarai Sheinvald, Borzoo Bonakdarpour

Abstract: Bounded model checking (BMC) is an effective technique for hunting bugs by incrementally exploring the state space of a system. To reason about infinite traces through a finite structure and to ultimately obtain completeness, BMC incorporates loop conditions that revisit previously observed states. This paper focuses on develo** loop conditions for BMC of HyperLTL- a temporal logic for hyperprop… ▽ More Bounded model checking (BMC) is an effective technique for hunting bugs by incrementally exploring the state space of a system. To reason about infinite traces through a finite structure and to ultimately obtain completeness, BMC incorporates loop conditions that revisit previously observed states. This paper focuses on develo** loop conditions for BMC of HyperLTL- a temporal logic for hyperproperties that allows expressing important policies for security and consistency in concurrent systems, etc. Loop conditions for HyperLTL are more complicated than for LTL, as different traces may loop inconsistently in unrelated moments. Existing BMC approaches for HyperLTL only considered linear unrollings without any loo** capability, which precludes both finding small infinite traces and obtaining a complete technique. We investigate loop conditions for HyperLTL BMC, where the HyperLTL formula can contain up to one quantifier alternation. We first present a general complete automata-based technique which is based on bounds of maximum unrollings. Then, we introduce alternative simulation-based algorithms that allow exploiting short loops effectively, generating SAT queries whose satisfiability guarantees the outcome of the original model checking problem. We also report empirical evaluation of the prototype implementation of our BMC techniques using Z3py. △ Less

Submitted 26 January, 2023; v1 submitted 15 January, 2023; originally announced January 2023.

Comments: 20 pages

arXiv:2212.02421 [pdf, other]

Score-based denoising for atomic structure identification

Authors: Tim Hsu, Babak Sadigh, Nicolas Bertin, Cheol Woo Park, James Chapman, Vasily Bulatov, Fei Zhou

Abstract: We propose an effective method for removing thermal vibrations that complicate the task of analyzing complex dynamics in atomistic simulation of condensed matter. Our method iteratively subtracts thermal noises or perturbations in atomic positions using a denoising score function trained on synthetically noised but otherwise perfect crystal lattices. The resulting denoised structures clearly revea… ▽ More We propose an effective method for removing thermal vibrations that complicate the task of analyzing complex dynamics in atomistic simulation of condensed matter. Our method iteratively subtracts thermal noises or perturbations in atomic positions using a denoising score function trained on synthetically noised but otherwise perfect crystal lattices. The resulting denoised structures clearly reveal underlying crystal order while retaining disorder associated with crystal defects. Purely geometric, agnostic to interatomic potentials, and trained without inputs from explicit simulations, our denoiser can be applied to simulation data generated from vastly different interatomic interactions. The denoiser is shown to improve existing classification methods such as common neighbor analysis and polyhedral template matching, reaching perfect classification accuracy on a recent benchmark dataset of thermally perturbed structures up to the melting point. Demonstrated here in a wide variety of atomistic simulation contexts, the denoiser is general, robust, and readily extendable to delineate order from disorder in structurally and chemically complex materials. △ Less

Submitted 3 May, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

arXiv:2211.16044 [pdf, other]

Model Extraction Attack against Self-supervised Speech Models

Authors: Tsu-Yuan Hsu, Chen-An Li, Tung-Yu Wu, Hung-yi Lee

Abstract: Self-supervised learning (SSL) speech models generate meaningful representations of given clips and achieve incredible performance across various downstream tasks. Model extraction attack (MEA) often refers to an adversary stealing the functionality of the victim model with only query access. In this work, we study the MEA problem against SSL speech model with a small number of queries. We propose… ▽ More Self-supervised learning (SSL) speech models generate meaningful representations of given clips and achieve incredible performance across various downstream tasks. Model extraction attack (MEA) often refers to an adversary stealing the functionality of the victim model with only query access. In this work, we study the MEA problem against SSL speech model with a small number of queries. We propose a two-stage framework to extract the model. In the first stage, SSL is conducted on the large-scale unlabeled corpus to pre-train a small speech model. Secondly, we actively sample a small portion of clips from the unlabeled corpus and query the target model with these clips to acquire their representations as labels for the small model's second-stage training. Experiment results show that our sampling methods can effectively extract the target model without knowing any information about its model architecture. △ Less

Submitted 8 October, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.09892 [pdf, other]

Summarizing Community-based Question-Answer Pairs

Authors: Ting-Yao Hsu, Yoshi Suhara, Xiaolan Wang

Abstract: Community-based Question Answering (CQA), which allows users to acquire their desired information, has increasingly become an essential component of online services in various domains such as E-commerce, travel, and dining. However, an overwhelming number of CQA pairs makes it difficult for users without particular intent to find useful information spread over CQA pairs. To help users quickly dige… ▽ More Community-based Question Answering (CQA), which allows users to acquire their desired information, has increasingly become an essential component of online services in various domains such as E-commerce, travel, and dining. However, an overwhelming number of CQA pairs makes it difficult for users without particular intent to find useful information spread over CQA pairs. To help users quickly digest the key information, we propose the novel CQA summarization task that aims to create a concise summary from CQA pairs. To this end, we first design a multi-stage data annotation process and create a benchmark dataset, CoQASUM, based on the Amazon QA corpus. We then compare a collection of extractive and abstractive summarization methods and establish a strong baseline approach DedupLED for the CQA summarization task. Our experiment further confirms two key challenges, sentence-type transfer and deduplication removal, towards the CQA summarization task. Our data and code are publicly available. △ Less

Submitted 17 November, 2022; originally announced November 2022.

Comments: To appear in EMNLP 2022 main conference

arXiv:2210.07978 [pdf, other]

Improving generalizability of distilled self-supervised speech processing models under distorted settings

Authors: Kuan-Po Huang, Yu-Kuan Fu, Tsu-Yuan Hsu, Fabian Ritter Gutierrez, Fan-Lin Wang, Liang-Hsuan Tseng, Yu Zhang, Hung-yi Lee

Abstract: Self-supervised learned (SSL) speech pre-trained models perform well across various speech processing tasks. Distilled versions of SSL models have been developed to match the needs of on-device speech applications. Though having similar performance as original SSL models, distilled counterparts suffer from performance degradation even more than their original versions in distorted environments. Th… ▽ More Self-supervised learned (SSL) speech pre-trained models perform well across various speech processing tasks. Distilled versions of SSL models have been developed to match the needs of on-device speech applications. Though having similar performance as original SSL models, distilled counterparts suffer from performance degradation even more than their original versions in distorted environments. This paper proposes to apply Cross-Distortion Map** and Domain Adversarial Training to SSL models during knowledge distillation to alleviate the performance gap caused by the domain mismatch problem. Results show consistent performance improvements under both in- and out-of-domain distorted setups for different downstream tasks while kee** efficient model size. △ Less

Submitted 20 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: Accepted by IEEE SLT2022

arXiv:2209.12900 [pdf, other]

The Efficacy of Self-Supervised Speech Models for Audio Representations

Authors: Tung-Yu Wu, Chen-An Li, Tzu-Han Lin, Tsu-Yuan Hsu, Hung-Yi Lee

Abstract: Self-supervised learning (SSL) speech models, which can serve as powerful upstream models to extract meaningful speech representations, have achieved unprecedented success in speech representation learning. However, their effectiveness on non-speech datasets is relatively less explored. In this work, we propose an ensemble framework, with a combination of ensemble techniques, to fuse SSL speech mo… ▽ More Self-supervised learning (SSL) speech models, which can serve as powerful upstream models to extract meaningful speech representations, have achieved unprecedented success in speech representation learning. However, their effectiveness on non-speech datasets is relatively less explored. In this work, we propose an ensemble framework, with a combination of ensemble techniques, to fuse SSL speech models' embeddings. Extensive experiments on speech and non-speech audio datasets are conducted to investigate the representation abilities of our ensemble method and its single constituent model. Ablation studies are carried out to evaluate the performances of different ensemble techniques, such as feature averaging and concatenation. All experiments are conducted during NeurIPS 2021 HEAR Challenge as a standard evaluation pipeline provided by competition officials. Results demonstrate SSL speech models' strong abilities on various non-speech tasks, while we also note that they fail to deal with fine-grained music tasks, such as pitch classification and note onset detection. In addition, feature ensemble is shown to have great potential on producing more holistic representations, as our proposed framework generally surpasses state-of-the-art SSL speech/audio models and has superior performance on various datasets compared with other teams in HEAR Challenge. Our code is available at https://github.com/tony10101105/HEAR-2021-NeurIPS-Challenge -- NTU-GURA. △ Less

Submitted 31 January, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

Comments: to appear in Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track

arXiv:2202.13033 [pdf]

A New BAT and PageRank algorithm for Propagation Probability in Social Networks

Authors: WC Yeh, CL Huang, TY Hsu, Z Liu, SY Tan

Abstract: Social networks have increasingly become important and popular in modern times. Moreover, the influence of social networks plays a vital role in various organizations including government organizations, academic research or corporate organizations. Therefore, how to strategize the optimal propagation strategy in social networks has also become more important. By increasing the precision of evaluat… ▽ More Social networks have increasingly become important and popular in modern times. Moreover, the influence of social networks plays a vital role in various organizations including government organizations, academic research or corporate organizations. Therefore, how to strategize the optimal propagation strategy in social networks has also become more important. By increasing the precision of evaluating the propagation probability of social network, it can indirectly influence the investment of cost, manpower and time for information propagation to achieve the best return. This study proposes a new algorithm, which includes a scale-free network, Barabasi-Albert model, Binary-Addition-Tree (BAT) algorithm, PageRank algorithm, personalized PageRank algorithm and a new BAT algorithm, to calculate the propagation probability in social networks. The results obtained after implementing the simulation experiment of social network models show the studied model and the proposed algorithm provide an effective method to increase the efficiency of information propagation in social networks. In this way, the maximum propagation efficiency is achieved with the minimum investment. △ Less

Submitted 25 February, 2022; originally announced February 2022.

arXiv:2201.12666 [pdf, other]

Challenges and approaches to privacy preserving post-click conversion prediction

Authors: Conor O'Brien, Arvind Thiagarajan, Sourav Das, Rafael Barreto, Chetan Verma, Tim Hsu, James Neufield, Jonathan J Hunt

Abstract: Online advertising has typically been more personalized than offline advertising, through the use of machine learning models and real-time auctions for ad targeting. One specific task, predicting the likelihood of conversion (i.e.\ the probability a user will purchase the advertised product), is crucial to the advertising ecosystem for both targeting and pricing ads. Currently, these models are of… ▽ More Online advertising has typically been more personalized than offline advertising, through the use of machine learning models and real-time auctions for ad targeting. One specific task, predicting the likelihood of conversion (i.e.\ the probability a user will purchase the advertised product), is crucial to the advertising ecosystem for both targeting and pricing ads. Currently, these models are often trained by observing individual user behavior, but, increasingly, regulatory and technical constraints are requiring privacy-preserving approaches. For example, major platforms are moving to restrict tracking individual user events across multiple applications, and governments around the world have shown steadily more interest in regulating the use of personal data. Instead of receiving data about individual user behavior, advertisers may receive privacy-preserving feedback, such as the number of installs of an advertised app that resulted from a group of users. In this paper we outline the recent privacy-related changes in the online advertising ecosystem from a machine learning perspective. We provide an overview of the challenges and constraints when learning conversion models in this setting. We introduce a novel approach for training these models that makes use of post-ranking signals. We show using offline experiments on real world data that it outperforms a model relying on opt-in data alone, and significantly reduces model degradation when no individual labels are available. Finally, we discuss future directions for research in this evolving area. △ Less

Submitted 29 January, 2022; originally announced January 2022.

arXiv:2110.11624 [pdf, other]

SciCap: Generating Captions for Scientific Figures

Authors: Ting-Yao Hsu, C. Lee Giles, Ting-Hao 'Kenneth' Huang

Abstract: Researchers use figures to communicate rich, complex information in scientific papers. The captions of these figures are critical to conveying effective messages. However, low-quality figure captions commonly occur in scientific articles and may decrease understanding. In this paper, we propose an end-to-end neural framework to automatically generate informative, high-quality captions for scientif… ▽ More Researchers use figures to communicate rich, complex information in scientific papers. The captions of these figures are critical to conveying effective messages. However, low-quality figure captions commonly occur in scientific articles and may decrease understanding. In this paper, we propose an end-to-end neural framework to automatically generate informative, high-quality captions for scientific figures. To this end, we introduce SCICAP, a large-scale figure-caption dataset based on computer science arXiv papers published between 2010 and 2020. After pre-processing - including figure-type classification, sub-figure identification, text normalization, and caption text selection - SCICAP contained more than two million figures extracted from over 290,000 papers. We then established baseline models that caption graph plots, the dominant (19.2%) figure type. The experimental results showed both opportunities and steep challenges of generating captions for scientific figures. △ Less

Submitted 25 October, 2021; v1 submitted 22 October, 2021; originally announced October 2021.

Comments: To Appear in EMNLP 2021 Findings. The dataset is available at: https://github.com/tingyaohsu/SciCap

arXiv:2110.05664 [pdf]

Accurate and Generalizable Quantitative Scoring of Liver Steatosis from Ultrasound Images via Scalable Deep Learning

Authors: Bowen Li, Dar-In Tai, Ke Yan, Yi-Cheng Chen, Shiu-Feng Huang, Tse-Hwa Hsu, Wan-Ting Yu, **g Xiao, Le Lu, Adam P. Harrison

Abstract: Background & Aims: Hepatic steatosis is a major cause of chronic liver disease. 2D ultrasound is the most widely used non-invasive tool for screening and monitoring, but associated diagnoses are highly subjective. We developed a scalable deep learning (DL) algorithm for quantitative scoring of liver steatosis from 2D ultrasound images. Approach & Results: Using retrospectively collected multi-vi… ▽ More Background & Aims: Hepatic steatosis is a major cause of chronic liver disease. 2D ultrasound is the most widely used non-invasive tool for screening and monitoring, but associated diagnoses are highly subjective. We developed a scalable deep learning (DL) algorithm for quantitative scoring of liver steatosis from 2D ultrasound images. Approach & Results: Using retrospectively collected multi-view ultrasound data from 3,310 patients, 19,513 studies, and 228,075 images, we trained a DL algorithm to diagnose steatosis stages (healthy, mild, moderate, or severe) from ultrasound diagnoses. Performance was validated on two multi-scanner unblinded and blinded (initially to DL developer) histology-proven cohorts (147 and 112 patients) with histopathology fatty cell percentage diagnoses, and a subset with FibroScan diagnoses. We also quantified reliability across scanners and viewpoints. Results were evaluated using Bland-Altman and receiver operating characteristic (ROC) analysis. The DL algorithm demonstrates repeatable measurements with a moderate number of images (3 for each viewpoint) and high agreement across 3 premium ultrasound scanners. High diagnostic performance was observed across all viewpoints: area under the curves of the ROC to classify >=mild, >=moderate, =severe steatosis grades were 0.85, 0.90, and 0.93, respectively. The DL algorithm outperformed or performed at least comparably to FibroScan with statistically significant improvements for all levels on the unblinded histology-proven cohort, and for =severe steatosis on the blinded histology-proven cohort. Conclusions: The DL algorithm provides a reliable quantitative steatosis assessment across view and scanners on two multi-scanner cohorts. Diagnostic performance was high with comparable or better performance than FibroScan. △ Less

Submitted 11 October, 2021; originally announced October 2021.

Comments: Journal paper submission, 45 pages (main body: 28 pages, supplementary material: 17 pages)

arXiv:2109.12989 [pdf, other]

HyperQB: A QBF-Based Bounded Model Checker for Hyperproperties

Authors: Tzu-Han Hsu, Borzoo Bonakdarpour, César Sánchez

Abstract: We present HyperQB, a push-button QBF-based bounded model checker for hyperproperties. HyperQB takes as input a NuSMV model and a formula expressed in the temporal logic HyperLTL. Our QBF-based technique allows HyperQB to seamlessly deal with quantifier alternations. Based on the selection of either bug hunting or synthesis, the instances of counterexamples (for negated formula) or witnesses (for… ▽ More We present HyperQB, a push-button QBF-based bounded model checker for hyperproperties. HyperQB takes as input a NuSMV model and a formula expressed in the temporal logic HyperLTL. Our QBF-based technique allows HyperQB to seamlessly deal with quantifier alternations. Based on the selection of either bug hunting or synthesis, the instances of counterexamples (for negated formula) or witnesses (for synthesis of positive formulas) are returned. We report on successful and effective verification for a rich set of experiments on a variety of case studies, including information-flow security, concurrent data structures, path planning for robots, co-termination, deniability, intransitivity of non-interference, and secrecy-preserving refinement. We also rigorously compare and contrast HyperQB with existing tools for model checking hyperporperties. △ Less

Submitted 24 January, 2024; v1 submitted 21 September, 2021; originally announced September 2021.

arXiv:2109.11576 [pdf, other]

Efficient, Interpretable Graph Neural Network Representation for Angle-dependent Properties and its Application to Optical Spectroscopy

Authors: Tim Hsu, Tuan Anh Pham, Nathan Keilbart, Stephen Weitzner, James Chapman, Penghao Xiao, S. Roger Qiu, Xiao Chen, Brandon C. Wood

Abstract: Graph neural networks are attractive for learning properties of atomic structures thanks to their intuitive graph encoding of atoms and bonds. However, conventional encoding does not include angular information, which is critical for describing atomic arrangements in disordered systems. In this work, we extend the recently proposed ALIGNN encoding, which incorporates bond angles, to also include d… ▽ More Graph neural networks are attractive for learning properties of atomic structures thanks to their intuitive graph encoding of atoms and bonds. However, conventional encoding does not include angular information, which is critical for describing atomic arrangements in disordered systems. In this work, we extend the recently proposed ALIGNN encoding, which incorporates bond angles, to also include dihedral angles (ALIGNN-d). This simple extension leads to a memory-efficient graph representation that captures the complete geometry of atomic structures. ALIGNN-d is applied to predict the infrared optical response of dynamically disordered Cu(II) aqua complexes, leveraging the intrinsic interpretability to elucidate the relative contributions of individual structural components. Bond and dihedral angles are found to be critical contributors to the fine structure of the absorption response, with distortions representing transitions between more common geometries exhibiting the strongest absorption intensity. Future directions for further development of ALIGNN-d are discussed. △ Less

Submitted 15 February, 2022; v1 submitted 23 September, 2021; originally announced September 2021.

arXiv:2103.08290 [pdf, other]

DeepOPG: Improving Orthopantomogram Finding Summarization with Weak Supervision

Authors: Tzu-Ming Harry Hsu, Yin-Chih Chelsea Wang

Abstract: Clinical finding summaries from an orthopantomogram, or a dental panoramic radiograph, have significant potential to improve patient communication and speed up clinical judgments. While orthopantomogram is a first-line tool for dental examinations, no existing work has explored the summarization of findings from it. A finding summary has to find teeth in the imaging study and label the teeth with… ▽ More Clinical finding summaries from an orthopantomogram, or a dental panoramic radiograph, have significant potential to improve patient communication and speed up clinical judgments. While orthopantomogram is a first-line tool for dental examinations, no existing work has explored the summarization of findings from it. A finding summary has to find teeth in the imaging study and label the teeth with several types of past treatments. To tackle the problem, we developDeepOPG that breaks the summarization process into functional segmentation and tooth localization, the latter of which is further refined by a novel dental coherence module. We also leverage weak supervision labels to improve detection results in a reinforcement learning scenario. Experiments show high efficacy of DeepOPG on finding summarization, achieving an overall AUC of 88.2% in detecting six types of findings. The proposed dental coherence and weak supervision both are shown to improve DeepOPG by adding 5.9% and 0.4% to AP@IoU=0.5. △ Less

Submitted 6 July, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

arXiv:2010.10938 [pdf, other]

What makes multilingual BERT multilingual?

Authors: Chi-Liang Liu, Tsung-Yuan Hsu, Yung-Sung Chuang, Hung-yi Lee

Abstract: Recently, multilingual BERT works remarkably well on cross-lingual transfer tasks, superior to static non-contextualized word embeddings. In this work, we provide an in-depth experimental study to supplement the existing literature of cross-lingual ability. We compare the cross-lingual ability of non-contextualized and contextualized representation model with the same data. We found that datasize… ▽ More Recently, multilingual BERT works remarkably well on cross-lingual transfer tasks, superior to static non-contextualized word embeddings. In this work, we provide an in-depth experimental study to supplement the existing literature of cross-lingual ability. We compare the cross-lingual ability of non-contextualized and contextualized representation model with the same data. We found that datasize and context window size are crucial factors to the transferability. △ Less

Submitted 20 October, 2020; originally announced October 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2004.09205

arXiv:2010.10041 [pdf, other]

Looking for Clues of Language in Multilingual BERT to Improve Cross-lingual Generalization

Authors: Chi-Liang Liu, Tsung-Yuan Hsu, Yung-Sung Chuang, Chung-Yi Li, Hung-yi Lee

Abstract: Token embeddings in multilingual BERT (m-BERT) contain both language and semantic information. We find that the representation of a language can be obtained by simply averaging the embeddings of the tokens of the language. Given this language representation, we control the output languages of multilingual BERT by manipulating the token embeddings, thus achieving unsupervised token translation. We… ▽ More Token embeddings in multilingual BERT (m-BERT) contain both language and semantic information. We find that the representation of a language can be obtained by simply averaging the embeddings of the tokens of the language. Given this language representation, we control the output languages of multilingual BERT by manipulating the token embeddings, thus achieving unsupervised token translation. We further propose a computationally cheap but effective approach to improve the cross-lingual ability of m-BERT based on this observation. △ Less

Submitted 1 November, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

Comments: preprint

arXiv:2009.08907 [pdf, other]

Bounded Model Checking for Hyperproperties

Authors: Tzu-Han Hsu, Cesar Sanchez, Borzoo Bonakdarpour

Abstract: Hyperproperties are properties of systems that relate multiple computation traces, including security and concurrency properties. This paper introduces a bounded model checking (BMC) algorithm for hyperproperties expressed in HyperLTL, which - to the best of our knowledge - is the first such algorithm. Just as the classic BMC technique for LTL primarily aims at finding bugs, our approach also targ… ▽ More Hyperproperties are properties of systems that relate multiple computation traces, including security and concurrency properties. This paper introduces a bounded model checking (BMC) algorithm for hyperproperties expressed in HyperLTL, which - to the best of our knowledge - is the first such algorithm. Just as the classic BMC technique for LTL primarily aims at finding bugs, our approach also targets identifying counterexamples. BMC for LTL is reduced to SAT solving, because LTL describes a property via inspecting individual traces. HyperLTL allows explicit and simultaneous quantification over traces and describes properties that involves multiple traces and, hence, our BMC approach naturally reduces to QBF solving. We report on successful and efficient model checking, implemented in a tool called HyperQube, of a rich set of experiments on a variety of case studies, including security, concurrent data structures, path planning for robots, and testing. △ Less

Submitted 15 October, 2020; v1 submitted 18 September, 2020; originally announced September 2020.

arXiv:2007.07196 [pdf, other]

Investigation of Sentiment Controllable Chatbot

Authors: Hung-yi Lee, Cheng-Hao Ho, Chien-Fu Lin, Chiung-Chih Chang, Chih-Wei Lee, Yau-Shian Wang, Tsung-Yuan Hsu, Kuan-Yu Chen

Abstract: Conventional seq2seq chatbot models attempt only to find sentences with the highest probabilities conditioned on the input sequences, without considering the sentiment of the output sentences. In this paper, we investigate four models to scale or adjust the sentiment of the chatbot response: a persona-based model, reinforcement learning, a plug and play model, and CycleGAN, all based on the seq2se… ▽ More Conventional seq2seq chatbot models attempt only to find sentences with the highest probabilities conditioned on the input sequences, without considering the sentiment of the output sentences. In this paper, we investigate four models to scale or adjust the sentiment of the chatbot response: a persona-based model, reinforcement learning, a plug and play model, and CycleGAN, all based on the seq2seq model. We also develop machine-evaluated metrics to estimate whether the responses are reasonable given the input. These metrics, together with human evaluation, are used to analyze the performance of the four models in terms of different aspects; reinforcement learning and CycleGAN are shown to be very attractive. △ Less

Submitted 11 July, 2020; originally announced July 2020.

Comments: arXiv admin note: text overlap with arXiv:1804.02504

arXiv:2006.15229 [pdf, other]

CheXpert++: Approximating the CheXpert labeler for Speed,Differentiability, and Probabilistic Output

Authors: Matthew B. A. McDermott, Tzu Ming Harry Hsu, Wei-Hung Weng, Marzyeh Ghassemi, Peter Szolovits

Abstract: It is often infeasible or impossible to obtain ground truth labels for medical data. To circumvent this, one may build rule-based or other expert-knowledge driven labelers to ingest data and yield silver labels absent any ground-truth training data. One popular such labeler is CheXpert, a labeler that produces diagnostic labels for chest X-ray radiology reports. CheXpert is very useful, but is rel… ▽ More It is often infeasible or impossible to obtain ground truth labels for medical data. To circumvent this, one may build rule-based or other expert-knowledge driven labelers to ingest data and yield silver labels absent any ground-truth training data. One popular such labeler is CheXpert, a labeler that produces diagnostic labels for chest X-ray radiology reports. CheXpert is very useful, but is relatively computationally slow, especially when integrated with end-to-end neural pipelines, is non-differentiable so can't be used in any applications that require gradients to flow through the labeler, and does not yield probabilistic outputs, which limits our ability to improve the quality of the silver labeler through techniques such as active learning. In this work, we solve all three of these problems with $\texttt{CheXpert++}$, a BERT-based, high-fidelity approximation to CheXpert. $\texttt{CheXpert++}$ achieves 99.81\% parity with CheXpert, which means it can be reliably used as a drop-in replacement for CheXpert, all while being significantly faster, fully differentiable, and probabilistic in output. Error analysis of $\texttt{CheXpert++}$ also demonstrates that $\texttt{CheXpert++}$ has a tendency to actually correct errors in the CheXpert labels, with $\texttt{CheXpert++}$ labels being more often preferred by a clinician over CheXpert labels (when they disagree) on all but one disease task. To further demonstrate the utility of these advantages in this model, we conduct a proof-of-concept active learning study, demonstrating we can improve accuracy on an expert labeled random subset of report sentences by approximately 8\% over raw, unaltered CheXpert by using one-iteration of active-learning inspired re-training. These findings suggest that simple techniques in co-learning and active learning can yield high-quality labelers under minimal, and controllable human labeling demands. △ Less

Submitted 26 June, 2020; originally announced June 2020.

Comments: To appear at MLHC 2020

arXiv:2006.13886 [pdf, other]

Microstructure Generation via Generative Adversarial Network for Heterogeneous, Topologically Complex 3D Materials

Authors: Tim Hsu, William K. Epting, Hokon Kim, Harry W. Abernathy, Gregory A. Hackett, Anthony D. Rollett, Paul A. Salvador, Elizabeth A. Holm

Abstract: Using a large-scale, experimentally captured 3D microstructure dataset, we implement the generative adversarial network (GAN) framework to learn and generate 3D microstructures of solid oxide fuel cell electrodes. The generated microstructures are visually, statistically, and topologically realistic, with distributions of microstructural parameters, including volume fraction, particle size, surfac… ▽ More Using a large-scale, experimentally captured 3D microstructure dataset, we implement the generative adversarial network (GAN) framework to learn and generate 3D microstructures of solid oxide fuel cell electrodes. The generated microstructures are visually, statistically, and topologically realistic, with distributions of microstructural parameters, including volume fraction, particle size, surface area, tortuosity, and triple phase boundary density, being highly similar to those of the original microstructure. These results are compared and contrasted with those from an established, grain-based generation algorithm (DREAM.3D). Importantly, simulations of electrochemical performance, using a locally resolved finite element model, demonstrate that the GAN generated microstructures closely match the performance distribution of the original, while DREAM.3D leads to significant differences. The ability of the generative machine learning model to recreate microstructures with high fidelity suggests that the essence of complex microstructures may be captured and represented in a compact and manipulatable form. △ Less

Submitted 22 June, 2020; originally announced June 2020.

Comments: submitted to JOM

arXiv:2004.09205 [pdf, other]

A Study of Cross-Lingual Ability and Language-specific Information in Multilingual BERT

Authors: Chi-Liang Liu, Tsung-Yuan Hsu, Yung-Sung Chuang, Hung-Yi Lee

Abstract: Recently, multilingual BERT works remarkably well on cross-lingual transfer tasks, superior to static non-contextualized word embeddings. In this work, we provide an in-depth experimental study to supplement the existing literature of cross-lingual ability. We compare the cross-lingual ability of non-contextualized and contextualized representation model with the same data. We found that datasize… ▽ More Recently, multilingual BERT works remarkably well on cross-lingual transfer tasks, superior to static non-contextualized word embeddings. In this work, we provide an in-depth experimental study to supplement the existing literature of cross-lingual ability. We compare the cross-lingual ability of non-contextualized and contextualized representation model with the same data. We found that datasize and context window size are crucial factors to the transferability. We also observe the language-specific information in multilingual BERT. By manipulating the latent representations, we can control the output languages of multilingual BERT, and achieve unsupervised token translation. We further show that based on the observation, there is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT. △ Less

Submitted 20 April, 2020; originally announced April 2020.

arXiv:2003.08082 [pdf, other]

Federated Visual Classification with Real-World Data Distribution

Authors: Tzu-Ming Harry Hsu, Hang Qi, Matthew Brown

Abstract: Federated Learning enables visual models to be trained on-device, bringing advantages for user privacy (data need never leave the device), but challenges in terms of data diversity and quality. Whilst typical models in the datacenter are trained using data that are independent and identically distributed (IID), data at source are typically far from IID. Furthermore, differing quantities of data ar… ▽ More Federated Learning enables visual models to be trained on-device, bringing advantages for user privacy (data need never leave the device), but challenges in terms of data diversity and quality. Whilst typical models in the datacenter are trained using data that are independent and identically distributed (IID), data at source are typically far from IID. Furthermore, differing quantities of data are typically available at each device (imbalance). In this work, we characterize the effect these real-world data distributions have on distributed learning, using as a benchmark the standard Federated Averaging (FedAvg) algorithm. To do so, we introduce two new large-scale datasets for species and landmark classification, with realistic per-user data splits that simulate real-world edge learning scenarios. We also develop two new algorithms (FedVC, FedIR) that intelligently resample and reweight over the client pool, bringing large improvements in accuracy and stability in training. The datasets are made available online. △ Less

Submitted 17 July, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

arXiv:2001.07672 [pdf, other]

Streaming Complexity of Spanning Tree Computation

Authors: Yi-Jun Chang, Martin Farach-Colton, Tsan-Sheng Hsu, Meng-Tsung Tsai

Abstract: The semi-streaming model is a variant of the streaming model frequently used for the computation of graph problems. It allows the edges of an $n$-node input graph to be read sequentially in $p$ passes using $\tilde{O}(n)$ space. In this model, some graph problems, such as spanning trees and $k$-connectivity, can be exactly solved in a single pass; while other graph problems, such as triangle detec… ▽ More The semi-streaming model is a variant of the streaming model frequently used for the computation of graph problems. It allows the edges of an $n$-node input graph to be read sequentially in $p$ passes using $\tilde{O}(n)$ space. In this model, some graph problems, such as spanning trees and $k$-connectivity, can be exactly solved in a single pass; while other graph problems, such as triangle detection and unweighted all-pairs shortest paths, are known to require $\tildeΩ(n)$ passes to compute. For many fundamental graph problems, the tractability in these models is open. In this paper, we study the tractability of computing some standard spanning trees. Our results are: (1) Maximum-Leaf Spanning Trees. This problem is known to be APX-complete with inapproximability constant $ρ\in[245/244,2)$. By constructing an $\varepsilon$-MLST sparsifier, we show that for every constant $\varepsilon > 0$, MLST can be approximated in a single pass to within a factor of $1+\varepsilon$ w.h.p. (albeit in super-polynomial time for $\varepsilon \le ρ-1$ assuming $\mathrm{P} \ne \mathrm{NP}$). (2) BFS Trees. It is known that BFS trees require $ω(1)$ passes to compute, but the naïve approach needs $O(n)$ passes. We devise a new randomized algorithm that reduces the pass complexity to $O(\sqrt{n})$, and it offers a smooth tradeoff between pass complexity and space usage. (3) DFS Trees. The current best algorithm by Khan and Mehta {[}STACS 2019{]} takes $\tilde{O}(h)$ passes, where $h$ is the height of computed DFS trees. Our contribution is twofold. First, we provide a simple alternative proof of this result, via a new connection to sparse certificates for $k$-node-connectivity. Second, we present a randomized algorithm that reduces the pass complexity to $O(\sqrt{n})$, and it also offers a smooth tradeoff between pass complexity and space usage. △ Less

Submitted 21 January, 2020; originally announced January 2020.

Comments: This is the full version of a conference paper to appear in the Proceedings of 37th International Symposium on Theoretical Aspects of Computer Science (STACS)

ACM Class: F.2

arXiv:1909.09587 [pdf, ps, other]

Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model

Authors: Tsung-yuan Hsu, Chi-liang Liu, Hung-yi Lee

Abstract: Because it is not feasible to collect training data for every language, there is a growing interest in cross-lingual transfer learning. In this paper, we systematically explore zero-shot cross-lingual transfer learning on reading comprehension tasks with a language representation model pre-trained on multi-lingual corpus. The experimental results show that with pre-trained language representation… ▽ More Because it is not feasible to collect training data for every language, there is a growing interest in cross-lingual transfer learning. In this paper, we systematically explore zero-shot cross-lingual transfer learning on reading comprehension tasks with a language representation model pre-trained on multi-lingual corpus. The experimental results show that with pre-trained language representation zero-shot learning is feasible, and translating the source data into the target language is not necessary and even degrades the performance. We further explore what does the model learn in zero-shot setting. △ Less

Submitted 15 September, 2019; originally announced September 2019.

arXiv:1909.06335 [pdf, other]

Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification

Authors: Tzu-Ming Harry Hsu, Hang Qi, Matthew Brown

Abstract: Federated Learning enables visual models to be trained in a privacy-preserving way using real-world data from mobile devices. Given their distributed nature, the statistics of the data across these devices is likely to differ significantly. In this work, we look at the effect such non-identical data distributions has on visual classification via Federated Learning. We propose a way to synthesize d… ▽ More Federated Learning enables visual models to be trained in a privacy-preserving way using real-world data from mobile devices. Given their distributed nature, the statistics of the data across these devices is likely to differ significantly. In this work, we look at the effect such non-identical data distributions has on visual classification via Federated Learning. We propose a way to synthesize datasets with a continuous range of identicalness and provide performance measures for the Federated Averaging algorithm. We show that performance degrades as distributions differ more, and propose a mitigation strategy via server momentum. Experiments on CIFAR-10 demonstrate improved classification performance over a range of non-identicalness, with classification accuracy improved from 30.1% to 76.9% in the most skewed settings. △ Less

Submitted 13 September, 2019; originally announced September 2019.

arXiv:1907.12888 [pdf, other]

CoachAI: A Project for Microscopic Badminton Match Data Collection and Tactical Analysis

Authors: Tzu-Han Hsu, Ching-Hsuan Chen, Nyan ** Ju, Tsì-Uí İk, Wen-Chih Peng, Chih-Chuan Wang, Yu-Shuen Wang, Yuan-Hsiang Lin, Yu-Chee Tseng, Jiun-Long Huang, Yu-Tai Ching

Abstract: Computer vision based object tracking has been used to annotate and augment sports video. For sports learning and training, video replay is often used in post-match review and training review for tactical analysis and movement analysis. For automatically and systematically competition data collection and tactical analysis, a project called CoachAI has been supported by the Ministry of Science and… ▽ More Computer vision based object tracking has been used to annotate and augment sports video. For sports learning and training, video replay is often used in post-match review and training review for tactical analysis and movement analysis. For automatically and systematically competition data collection and tactical analysis, a project called CoachAI has been supported by the Ministry of Science and Technology, Taiwan. The proposed project also includes research of data visualization, connected training auxiliary devices, and data warehouse. Deep learning techniques will be used to develop video-based real-time microscopic competition data collection based on broadcast competition video. Machine learning techniques will be used to develop a tactical analysis. To reveal data in more understandable forms and to help in pre-match training, AR/VR techniques will be used to visualize data, tactics, and so on. In addition, training auxiliary devices including smart badminton rackets and connected serving machines will be developed based on the IoT technology to further utilize competition data and tactical data and boost training efficiency. Especially, the connected serving machines will be developed to perform specified tactics and to interact with players in their training. △ Less

Submitted 12 July, 2019; originally announced July 2019.

arXiv:1906.01764 [pdf, other]

Visual Story Post-Editing

Authors: Ting-Yao Hsu, Chieh-Yang Huang, Yen-Chia Hsu, Ting-Hao 'Kenneth' Huang

Abstract: We introduce the first dataset for human edits of machine-generated visual stories and explore how these collected edits may be used for the visual story post-editing task. The dataset, VIST-Edit, includes 14,905 human edited versions of 2,981 machine-generated visual stories. The stories were generated by two state-of-the-art visual storytelling models, each aligned to 5 human-edited versions. We… ▽ More We introduce the first dataset for human edits of machine-generated visual stories and explore how these collected edits may be used for the visual story post-editing task. The dataset, VIST-Edit, includes 14,905 human edited versions of 2,981 machine-generated visual stories. The stories were generated by two state-of-the-art visual storytelling models, each aligned to 5 human-edited versions. We establish baselines for the task, showing how a relatively small set of human edits can be leveraged to boost the performance of large visual storytelling models. We also discuss the weak correlation between automatic evaluation scores and human ratings, motivating the need for new automatic metrics. △ Less

Submitted 4 June, 2019; originally announced June 2019.

Comments: Accepted by ACL 2019

arXiv:1904.02633 [pdf, other]

Clinically Accurate Chest X-Ray Report Generation

Authors: Guanxiong Liu, Tzu-Ming Harry Hsu, Matthew McDermott, Willie Boag, Wei-Hung Weng, Peter Szolovits, Marzyeh Ghassemi

Abstract: The automatic generation of radiology reports given medical radiographs has significant potential to operationally and improve clinical patient care. A number of prior works have focused on this problem, employing advanced methods from computer vision and natural language generation to produce readable reports. However, these works often fail to account for the particular nuances of the radiology… ▽ More The automatic generation of radiology reports given medical radiographs has significant potential to operationally and improve clinical patient care. A number of prior works have focused on this problem, employing advanced methods from computer vision and natural language generation to produce readable reports. However, these works often fail to account for the particular nuances of the radiology domain, and, in particular, the critical importance of clinical accuracy in the resulting generated reports. In this work, we present a domain-aware automatic chest X-ray radiology report generation system which first predicts what topics will be discussed in the report, then conditionally generates sentences corresponding to these topics. The resulting system is fine-tuned using reinforcement learning, considering both readability and clinical accuracy, as assessed by the proposed Clinically Coherent Reward. We verify this system on two datasets, Open-I and MIMIC-CXR, and demonstrate that our model offers marked improvements on both language generation metrics and CheXpert assessed accuracy over a variety of competitive baselines. △ Less

Submitted 29 July, 2019; v1 submitted 4 April, 2019; originally announced April 2019.

arXiv:1902.08327 [pdf, other]

On How Users Edit Computer-Generated Visual Stories

Authors: Ting-Yao Hsu, Yen-Chia Hsu, Ting-Hao 'Kenneth' Huang

Abstract: A significant body of research in Artificial Intelligence (AI) has focused on generating stories automatically, either based on prior story plots or input images. However, literature has little to say about how users would receive and use these stories. Given the quality of stories generated by modern AI algorithms, users will nearly inevitably have to edit these stories before putting them to rea… ▽ More A significant body of research in Artificial Intelligence (AI) has focused on generating stories automatically, either based on prior story plots or input images. However, literature has little to say about how users would receive and use these stories. Given the quality of stories generated by modern AI algorithms, users will nearly inevitably have to edit these stories before putting them to real use. In this paper, we present the first analysis of how human users edit machine-generated stories. We obtained 962 short stories generated by one of the state-of-the-art visual storytelling models. For each story, we recruited five crowd workers from Amazon Mechanical Turk to edit it. Our analysis of these edits shows that, on average, users (i) slightly shortened machine-generated stories, (ii) increased lexical diversity in these stories, and (iii) often replaced nouns and their determiners/articles with pronouns. Our study provides a better understanding on how users receive and edit machine-generated stories,informing future researchers to create more usable and helpful story generation systems. △ Less

Submitted 8 March, 2019; v1 submitted 21 February, 2019; originally announced February 2019.

Comments: To appear in CHI'19 Late-Breaking Work on Human Factors in Computing Systems (CHI LBW 2019), 2019

arXiv:1811.08615 [pdf, other]

Unsupervised Multimodal Representation Learning across Medical Images and Reports

Authors: Tzu-Ming Harry Hsu, Wei-Hung Weng, Willie Boag, Matthew McDermott, Peter Szolovits

Abstract: Joint embeddings between medical imaging modalities and associated radiology reports have the potential to offer significant benefits to the clinical community, ranging from cross-domain retrieval to conditional generation of reports to the broader goals of multimodal representation learning. In this work, we establish baseline joint embedding results measured via both local and global retrieval m… ▽ More Joint embeddings between medical imaging modalities and associated radiology reports have the potential to offer significant benefits to the clinical community, ranging from cross-domain retrieval to conditional generation of reports to the broader goals of multimodal representation learning. In this work, we establish baseline joint embedding results measured via both local and global retrieval methods on the soon to be released MIMIC-CXR dataset consisting of both chest X-ray images and the associated radiology reports. We examine both supervised and unsupervised methods on this task and show that for document retrieval tasks with the learned representations, only a limited amount of supervision is needed to yield results comparable to those of fully-supervised methods. △ Less

Submitted 21 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/215

arXiv:1808.09351 [pdf, other]

3D-Aware Scene Manipulation via Inverse Graphics

Authors: Shunyu Yao, Tzu Ming Harry Hsu, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, William T. Freeman, Joshua B. Tenenbaum

Abstract: We aim to obtain an interpretable, expressive, and disentangled scene representation that contains comprehensive structural and textural information for each object. Previous scene representations learned by neural networks are often uninterpretable, limited to a single object, or lacking 3D knowledge. In this work, we propose 3D scene de-rendering networks (3D-SDN) to address the above issues by… ▽ More We aim to obtain an interpretable, expressive, and disentangled scene representation that contains comprehensive structural and textural information for each object. Previous scene representations learned by neural networks are often uninterpretable, limited to a single object, or lacking 3D knowledge. In this work, we propose 3D scene de-rendering networks (3D-SDN) to address the above issues by integrating disentangled representations for semantics, geometry, and appearance into a deep generative model. Our scene encoder performs inverse graphics, translating a scene into a structured object-wise representation. Our decoder has two components: a differentiable shape renderer and a neural texture generator. The disentanglement of semantics, geometry, and appearance supports 3D-aware scene manipulation, e.g., rotating and moving objects freely while kee** the consistent shape and texture, and changing the object appearance without affecting its shape. Experiments demonstrate that our editing scheme based on 3D-SDN is superior to its 2D counterpart. △ Less

Submitted 18 December, 2018; v1 submitted 28 August, 2018; originally announced August 2018.

Comments: NeurIPS 2018. Code: https://github.com/ysymyth/3D-SDN Website: http://3dsdn.csail.mit.edu/

arXiv:1805.07027 [pdf, other]

Efficient Downlink Channel Reconstruction for FDD Multi-Antenna Systems

Authors: Yu Han, Tien-Hao Hsu, Chao-Kai Wen, Kai-Kit Wong, Shi **

Abstract: In this paper, we propose an efficient downlink channel reconstruction scheme for a frequency-division-duplex multi-antenna system by utilizing uplink channel state information combined with limited feedback. Based on the spatial reciprocity in a wireless channel, the downlink channel is reconstructed by using frequency-independent parameters. We first estimate the gains, delays, and angles during… ▽ More In this paper, we propose an efficient downlink channel reconstruction scheme for a frequency-division-duplex multi-antenna system by utilizing uplink channel state information combined with limited feedback. Based on the spatial reciprocity in a wireless channel, the downlink channel is reconstructed by using frequency-independent parameters. We first estimate the gains, delays, and angles during uplink sounding. The gains are then refined through downlink training and sent back to the base station (BS). With limited overhead, the refinement can substantially improve the accuracy of the downlink channel reconstruction. The BS can then reconstruct the downlink channel with the uplink-estimated delays and angles and the downlink-refined gains. We also introduce and extend the Newtonized orthogonal matching pursuit (NOMP) algorithm to detect the delays and gains in a multi-antenna multi-subcarrier condition. The results of our analysis show that the extended NOMP algorithm achieves high estimation accuracy. Simulations and over-the-air tests are performed to assess the performance of the efficient downlink channel reconstruction scheme. The results show that the reconstructed channel is close to the practical channel and that the accuracy is enhanced when the number of BS antennas increases, thereby highlighting that the promising application of the proposed scheme in large-scale antenna array systems. △ Less

Submitted 17 May, 2018; originally announced May 2018.

arXiv:1804.02504 [pdf, other]

Scalable Sentiment for Sequence-to-sequence Chatbot Response with Performance Analysis

Authors: Chih-Wei Lee, Yau-Shian Wang, Tsung-Yuan Hsu, Kuan-Yu Chen, Hung-Yi Lee, Lin-shan Lee

Abstract: Conventional seq2seq chatbot models only try to find the sentences with the highest probabilities conditioned on the input sequences, without considering the sentiment of the output sentences. Some research works trying to modify the sentiment of the output sequences were reported. In this paper, we propose five models to scale or adjust the sentiment of the chatbot response: persona-based model,… ▽ More Conventional seq2seq chatbot models only try to find the sentences with the highest probabilities conditioned on the input sequences, without considering the sentiment of the output sentences. Some research works trying to modify the sentiment of the output sequences were reported. In this paper, we propose five models to scale or adjust the sentiment of the chatbot response: persona-based model, reinforcement learning, plug and play model, sentiment transformation network and cycleGAN, all based on the conventional seq2seq model. We also develop two evaluation metrics to estimate if the responses are reasonable given the input. These metrics together with other two popularly used metrics were used to analyze the performance of the five proposed models on different aspects, and reinforcement learning and cycleGAN were shown to be very attractive. The evaluation metrics were also found to be well correlated with human evaluation. △ Less

Submitted 6 April, 2018; originally announced April 2018.

arXiv:1505.04909 [pdf, ps, other]

doi 10.1109/TSP.2015.2449262

Optimized Random Deployment of Energy Harvesting Sensors for Field Reconstruction in Analog and Digital Forwarding Systems

Authors: Teng-Cheng Hsu, Y. -W. Peter Hong, Tsang-Yi Wang

Abstract: This work examines the large-scale deployment of energy harvesting sensors for the purpose of sensing and reconstruction of a spatially correlated Gaussian random field. The sensors are powered solely by energy harvested from the environment and are deployed randomly according to a spatially nonhomogeneous Poisson point process whose density depends on the energy arrival statistics at different lo… ▽ More This work examines the large-scale deployment of energy harvesting sensors for the purpose of sensing and reconstruction of a spatially correlated Gaussian random field. The sensors are powered solely by energy harvested from the environment and are deployed randomly according to a spatially nonhomogeneous Poisson point process whose density depends on the energy arrival statistics at different locations. Random deployment is suitable for applications that require deployment over a wide and/or hostile area. During an observation period, each sensor takes a local sample of the random field and reports the data to the closest data-gathering node if sufficient energy is available for transmission. The realization of the random field is then reconstructed at the fusion center based on the reported sensor measurements. For the purpose of field reconstruction, the sensors should, on the one hand, be more spread out over the field to gather more informative samples, but should, on the other hand, be more concentrated at locations with high energy arrival rates or large channel gains toward the closest data-gathering node. This tradeoff is exploited in the optimization of the random sensor deployment in both analog and digital forwarding systems. More specifically, given the statistics of the energy arrival at different locations and a constraint on the average number of sensors, the spatially-dependent sensor density and the energy-aware transmission policy at the sensors are determined for both cases by minimizing an upper bound on the average mean-square reconstruction error. The efficacy of the proposed schemes are demonstrated through numerical simulations. △ Less

Submitted 19 May, 2015; originally announced May 2015.

arXiv:cs/0102009 [pdf, ps, other]

Optimal Augmentation for Bipartite Componentwise Biconnectivity in Linear Time

Authors: Tsan-sheng Hsu, Ming-Yang Kao

Abstract: A graph is componentwise biconnected if every connected component either is an isolated vertex or is biconnected. We present a linear-time algorithm for the problem of adding the smallest number of edges to make a bipartite graph componentwise biconnected while preserving its bipartiteness. This algorithm has immediate applications for protecting sensitive information in statistical tables. A graph is componentwise biconnected if every connected component either is an isolated vertex or is biconnected. We present a linear-time algorithm for the problem of adding the smallest number of edges to make a bipartite graph componentwise biconnected while preserving its bipartiteness. This algorithm has immediate applications for protecting sensitive information in statistical tables. △ Less

Submitted 9 February, 2001; originally announced February 2001.

Comments: A preliminary version appeared in T. Asano, Y. Igarashi, H. Nagamochi, S. Miyano, and S. Suri, editors, Lecture Notes in Computer Science 1178: Proceedings of the 7th Annual International Symposium on Algorithms and Computation, pages 213--222. Springer-Verlag, New York, NY, 1996

ACM Class: F.2.2; G.2.2

Showing 1–47 of 47 results for author: Hsu, T