Showing 1–50 of 117 results for author: Koo, T

  1. Learning Retrieval Augmentation for Personalized Dialogue Generation

    Authors: Qiushi Huang, Shuai Fu, Xubo Liu, Wenwu Wang, Tom Ko, Yu Zhang, Lilian Tang

    Abstract: Personalized dialogue generation, focusing on generating highly tailored responses by leveraging persona profiles and dialogue context, has gained significant attention in conversational AI applications. However, persona profiles, a prevalent setting in current personalized dialogue datasets, typically composed of merely four to five sentences, may not offer comprehensive descriptions of the perso…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to EMNLP-2023

  2. arXiv:2406.18187  [pdf, other]

    cs.CL cs.AI cs.LG

    Selective Prompting Tuning for Personalized Conversations with LLMs

    Authors: Qiushi Huang, Xubo Liu, Tom Ko, Bo Wu, Wenwu Wang, Yu Zhang, Lilian Tang

    Abstract: In conversational AI, personalizing dialogues with persona profiles and contextual understanding is essential. Despite large language models' (LLMs) improved response coherence, effective persona integration remains a challenge. In this work, we first study two common approaches for personalizing LLMs: textual prompting and direct fine-tuning. We observed that textual prompting often struggles to…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 findings

  3. arXiv:2405.19312  [pdf, other]

    stat.ME

    Design-based Causal Inference for Balanced Incomplete Block Designs

    Authors: Taehyeon Koo, Nicole E. Pashley

    Abstract: Researchers often turn to block randomization to increase the precision of their inference or due to practical considerations, such as in multi-site trials. However, if the number of treatments under consideration is large it might not be practical or even feasible to assign all treatments within each block. We develop novel inference results under the finite-population design-based framework for…

    Submitted 1 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  4. arXiv:2405.16835  [pdf]

    cond-mat.mtrl-sci physics.chem-ph

    Superionic surface Li-ion transport in carbonaceous materials

    Authors: Jianbin Zhou, Shen Wang, Chaoshan Wu, Ji Qi, Hongli Wan, Shen Lai, Shijie Feng, Tsz Wai Ko, Zhaohui Liang, Ke Zhou, Nimrod Harpak, Nick Solan, Mengchen Liu, Zeyu Hui, Paulina J. Ai, Kent Griffith, Chunsheng Wang, Shyue Ping Ong, Yan Yao, Ping Liu

    Abstract: Unlike Li-ion transport in the bulk of carbonaceous materials, little is known about Li-ion diffusion on their surface. In this study, we have discovered an ultra-fast Li-ion transport phenomenon on the surface of carbonaceous materials, particularly when they have limited Li insertion capacity along with a high surface area. This is exemplified by a carbon black, Ketjen Black (KB). An ionic condu…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 21 pages, 6 figures

  5. "We Need Structured Output": Towards User-centered Constraints on Large Language Model Output

    Authors: Michael Xieyang Liu, Frederick Liu, Alexander J. Fiannaca, Terry Koo, Lucas Dixon, Michael Terry, Carrie J. Cai

    Abstract: Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective…

    Submitted 10 April, 2024; originally announced April 2024.

    Journal ref: "We Need Structured Output": Towards User-centered Constraints on LLM Output. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), May 11-16, 2024, Honolulu, HI, USA

  6. arXiv:2402.12647  [pdf, other]

    cs.CV cs.RO

    DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation

    Authors: Takuya Ikeda, Sergey Zakharov, Tianyi Ko, Muhammad Zubair Irshad, Robert Lee, Katherine Liu, Rares Ambrus, Koichi Nishiwaki

    Abstract: This paper addresses the challenging problem of category-level pose estimation. Current state-of-the-art methods for this task face challenges when dealing with symmetric objects and when attempting to generalize to new environments solely through synthetic data training. In this work, we address these challenges by proposing a probabilistic model that relies on diffusion to estimate dense canonic…

    Submitted 5 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 8 pages. 9 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  7. arXiv:2401.12487  [pdf]

    astro-ph.SR astro-ph.GA astro-ph.HE

    Radio emission from SN 1181 hosting a white dwarf merger product

    Authors: Takatoshi Ko, Daichi Tsuna, Bunyo Hatsukade, Toshikazu Shigeyama

    Abstract: The remnant of the historical supernova 1181 is claimed to be associated with a white dwarf merger remnant J005311. The supernova remnant (SNR) shock, and a termination shock expected to be formed by the intense wind of J005311, are potential sites for radio emission via synchrotron emission from shock-accelerated electrons. In this paper, we estimate the radio emission from these two shocks, and…

    Submitted 15 April, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: 8 pages, 4 figures, 1 Japanese movie (https://j005311.com/). Accepted for publication in PASJ

    Report number: RESCEU-1/24

  8. arXiv:2312.13585  [pdf, other]

    cs.CL cs.SD eess.AS

    Speech Translation with Large Language Models: An Industrial Practice

    Authors: Zhichao Huang, Rong Ye, Tom Ko, Qianqian Dong, Shanbo Cheng, Mingxuan Wang, Hang Li

    Abstract: Given the great success of large language models (LLMs) across various tasks, in this paper, we introduce LLM-ST, a novel and effective speech translation model constructed upon a pre-trained LLM. By integrating the large language model (LLM) with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations, even from long au…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Technical report. 13 pages. Demo: https://speechtranslation.github.io/llm-st/

  9. arXiv:2312.11804  [pdf, other]

    cs.RO

    Gravity-aware Grasp Generation with Implicit Grasp Mode Selection for Underactuated Hands

    Authors: Tianyi Ko, Takuya Ikeda, Thomas Stewart, Robert Lee, Koichi Nishiwaki

    Abstract: Learning-based grasp detectors typically assume a precision grasp, where each finger only has one contact point, and estimate the grasp probability. In this work, we propose a data generation and learning pipeline that can leverage power grasping, which has more contact points with an enveloping configuration and is robust against both positioning error and force disturbance. To train a grasp dete…

    Submitted 28 February, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  10. arXiv:2311.00088  [pdf, other]

    quant-ph math.OC

    Random coordinate descent: a simple alternative for optimizing parameterized quantum circuits

    Authors: Zhiyan Ding, Taehee Ko, Jiahao Yao, Lin Lin, Xiantao Li

    Abstract: Variational quantum algorithms rely on the optimization of parameterized quantum circuits in noisy settings. The commonly used back-propagation procedure in classical machine learning is not directly applicable in this setting due to the collapse of quantum states after measurements. Thus, gradient estimations constitute a significant overhead in a gradient-based optimization of such quantum circu…

    Submitted 28 June, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

  11. arXiv:2309.00169  [pdf, other]

    eess.AS cs.LG cs.SD

    RepCodec: A Speech Representation Codec for Speech Tokenization

    Authors: Zhichao Huang, Chutong Meng, Tom Ko

    Abstract: With recent rapid growth of large language models (LLMs), discrete speech tokenization has played an important role for injecting speech into LLMs. However, this discretization gives rise to a loss of information, consequently impairing overall performance. To improve the performance of these discrete speech tokens, we present RepCodec, a novel speech representation codec for semantic speech token…

    Submitted 6 June, 2024; v1 submitted 31 August, 2023; originally announced September 2023.

  12. arXiv:2308.10785  [pdf, other]

    astro-ph.HE astro-ph.SR

    Simulating Hydrogen-poor Interaction-Powered Supernovae with CHIPS

    Authors: Yuki Takei, Daichi Tsuna, Takatoshi Ko, Toshikazu Shigeyama

    Abstract: We present the updated open-source code Complete History of Interaction-Powered Supernovae (CHIPS) that can be applied to modeling supernovae (SNe) arising from an interaction with massive circumstellar medium (CSM) as well as the formation process of the CSM. Our update mainly concerns extensions to hydrogen-poor SNe from stripped progenitors, targeting modeling of interaction-powered SNe Ib…

    Submitted 18 November, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: 17 pages, 9 figures, accepted for publication in ApJ. The updates to the CHIPS code have been released as v2.0 (https://github.com/DTsuna/CHIPS)

    Report number: RESCEU-25/23

  13. arXiv:2307.13710  [pdf, other]

    cond-mat.mtrl-sci

    Robust Training of Machine Learning Interatomic Potentials with Dimensionality Reduction and Stratified Sampling

    Authors: Ji Qi, Tsz Wai Ko, Brandon C. Wood, Tuan Anh Pham, Shyue Ping Ong

    Abstract: Machine learning interatomic potentials (MLIPs) enable the accurate simulation of materials at larger sizes and time scales, and play increasingly important roles in the computational understanding and design of materials. However, MLIPs are only as accurate and robust as the data they are trained on. In this work, we present DImensionality-Reduced Encoded Clusters with sTratified (DIRECT) samplin…

    Submitted 24 July, 2023; originally announced July 2023.

  14. arXiv:2307.07067  [pdf, other]

    quant-ph math.NA

    Implementation of the Density-functional Theory on Quantum Computers with Linear Scaling with respect to the Number of Atoms

    Authors: Taehee Ko, Xiantao Li, Chunhao Wang

    Abstract: Density-functional theory (DFT) has revolutionized computer simulations in chemistry and material science. A faithful implementation of the theory requires self-consistent calculations. However, this effort involves repeatedly diagonalizing the Hamiltonian, for which a classical algorithm typically requires a computational complexity that scales cubically with respect to the number of electrons. T…

    Submitted 13 July, 2023; originally announced July 2023.

  15. arXiv:2306.11646  [pdf, other]

    cs.CL eess.AS

    Recent Advances in Direct Speech-to-text Translation

    Authors: Chen Xu, Rong Ye, Qianqian Dong, Chengqi Zhao, Tom Ko, Mingxuan Wang, Tong Xiao, Jingbo Zhu

    Abstract: Recently, speech-to-text translation has attracted more and more attention and many studies have emerged rapidly. In this paper, we present a comprehensive survey on direct speech translation aiming to summarize the current state-of-the-art techniques. First, we categorize the existing research work into three directions based on the main challenges -- modeling burden, data scarcity, and applicati…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: An expanded version of the paper accepted by IJCAI2023 survey track

  16. arXiv:2306.10493  [pdf, other]

    cs.SD cs.CL eess.AS

    MOSPC: MOS Prediction Based on Pairwise Comparison

    Authors: Kexin Wang, Yunlong Zhao, Qianqian Dong, Tom Ko, Mingxuan Wang

    Abstract: As a subjective metric to evaluate the quality of synthesized speech, mean opinion score (MOS) usually requires multiple annotators to score the same speech. Such an annotation approach requires a lot of manpower and is also time-consuming. A MOS prediction model for automatic evaluation can significantly reduce labor cost. In previous works, it is difficult to accurately rank the quality of speech…

    Submitted 18 June, 2023; originally announced June 2023.

  17. arXiv:2306.08273  [pdf, other]

    physics.chem-ph

    Beyond potential energy surface benchmarking: a complete application of machine learning to chemical reactivity

    Authors: Xingyi Guan, Joseph Heindel, Taehee Ko, Chao Yang, Teresa Head-Gordon

    Abstract: We train an equivariant machine learning model to predict energies and forces for a real-world study of hydrogen combustion under conditions of finite temperature and pressure. This challenging case for reactive chemistry illustrates that ML learned potential energy surfaces (PESs) are always incomplete as they are overly reliant on chemical intuition of what data is important for training, i.e. s…

    Submitted 14 June, 2023; originally announced June 2023.

  18. arXiv:2306.02982  [pdf, other]

    cs.CL eess.AS

    PolyVoice: Language Models for Speech to Speech Translation

    Authors: Qianqian Dong, Zhiying Huang, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang

    Abstract: We propose PolyVoice, a language model-based framework for speech-to-speech translation (S2ST) system. Our framework consists of two language models: a translation language model and a speech synthesis language model. We use discretized speech units, which are generated in a fully unsupervised way, and thus our framework can be used for unwritten languages. For the speech synthesis part, we adopt…

    Submitted 13 June, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

  19. arXiv:2305.17358  [pdf, other]

    cs.CL

    CTC-based Non-autoregressive Speech Translation

    Authors: Chen Xu, Xiaoqian Liu, Xiaowen Liu, Qingxuan Sun, Yuhao Zhang, Murun Yang, Qianqian Dong, Tom Ko, Mingxuan Wang, Tong Xiao, Anxiang Ma, Jingbo Zhu

    Abstract: Combining end-to-end speech translation (ST) and non-autoregressive (NAR) generation is promising in language and speech processing for their advantages of less error propagation and low latency. In this paper, we investigate the potential of connectionist temporal classification (CTC) for non-autoregressive speech translation (NAST). In particular, we develop a model consisting of two encoders th…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: ACL 2023 Main Conference

  20. arXiv:2305.11411  [pdf, other]

    cs.CL cs.SD eess.AS

    DUB: Discrete Unit Back-translation for Speech Translation

    Authors: Dong Zhang, Rong Ye, Tom Ko, Mingxuan Wang, Yaqian Zhou

    Abstract: How can speech-to-text translation (ST) perform as well as machine translation (MT)? The key point is to bridge the modality gap between speech and text so that useful MT techniques can be applied to ST. Recently, the approach of representing speech with unsupervised discrete units yields a new way to ease the modality problem. This motivates us to propose Discrete Unit Back-translation (DUB) to a…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023

  21. arXiv:2305.10692  [pdf, other]

    physics.chem-ph

    Accurate Fourth-Generation Machine Learning Potentials by Electrostatic Embedding

    Authors: Tsz Wai Ko, Jonas A. Finkler, Stefan Goedecker, Jörg Behler

    Abstract: In recent years, significant progress has been made in the development of machine learning potentials (MLPs) for atomistic simulations with applications in many fields from chemistry to materials science. While most current MLPs are based on environment-dependent atomic energies, the limitations of this locality approximation can be overcome, e.g., in fourth-generation MLPs, which incorporate long…

    Submitted 18 May, 2023; originally announced May 2023.

    Comments: 41 pages, 7 figures, accepted

    Journal ref: J. Chem. Theory Comput., 2023

  22. arXiv:2305.07198  [pdf, other]

    eess.SY

    Model Predictive Control of Smart Districts Participating in Frequency Regulation Market: A Case Study of Using Heating Network Storage

    Authors: Hikaru Hoshino, T. John Koo, Yun-Chung Chu, Yoshihiko Susuki

    Abstract: Flexibility provided by Combined Heat and Power (CHP) units in district heating networks is an important means to cope with increasing penetration of intermittent renewable energy resources, and various methods have been proposed to exploit thermal storage tanks installed in these networks. This paper studies a novel problem motivated by an example of district heating and cooling networks in Japan…

    Submitted 11 May, 2023; originally announced May 2023.

  23. arXiv:2304.14669  [pdf, other]

    astro-ph.SR astro-ph.HE

    A dynamical model for IRAS 00500+6713: the remnant of a type Iax supernova SN 1181 hosting a double degenerate merger product WD J005311

    Authors: Takatoshi Ko, Hiromasa Suzuki, Kazumi Kashiyama, Hiroyuki Uchida, Takaaki Tanaka, Daichi Tsuna, Kotaro Fujisawa, Aya Bamba, Toshikazu Shigeyama

    Abstract: IRAS 00500+6713 is a hypothesized remnant of a type Iax supernova SN 1181. Multi-wavelength observations have revealed its complicated morphology; a dusty infrared ring is sandwiched by the inner and outer X-ray nebulae. We analyze the archival X-ray data taken by XMM-Newton and Chandra to constrain the angular radius, mass, and metal abundance of the X-ray nebulae, and construct a theoretical m…

    Submitted 26 May, 2024; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: 24 pages, 13 figures, 4 tables, accepted by ApJ

    Report number: RESCEU-10/23

  24. arXiv:2304.09296  [pdf, other]

    physics.chem-ph physics.comp-ph

    Using Diffusion Maps to Analyze Reaction Dynamics for a Hydrogen Combustion Benchmark Dataset

    Authors: Taehee Ko, Joseph Heindel, Xingyi Guan, Teresa Head-Gordon, David Williams-Young, Chao Yang

    Abstract: We use local diffusion maps to assess the quality of two types of collective variables (CVs) for a recently published hydrogen combustion benchmark dataset~\cite{guan2022benchmark} that contains ab initio molecular dynamics trajectories and normal modes along minimum energy paths. This approach was recently advocated in~\cite{tlldiffmap20} for assessing CVs and analyzing reactions modeled by class…

    Submitted 18 April, 2023; originally announced April 2023.

  25. arXiv:2303.17395  [pdf, other]

    eess.AS cs.CL cs.MM cs.SD

    WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

    Authors: Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang

    Abstract: The advancement of audio-language (AL) multimodal learning tasks has been significant in recent years. However, researchers face challenges due to the costly and time-consuming collection process of existing audio-language datasets, which are limited in size. To address this data scarcity issue, we introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approx…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: 12 pages

  26. Finding Heterophilic Neighbors via Confidence-based Subgraph Matching for Semi-supervised Node Classification

    Authors: Yoonhyuk Choi, Jiho Choi, Taewook Ko, Chong-Kwon Kim

    Abstract: Graph Neural Networks (GNNs) have proven to be powerful in many graph-based applications. However, they fail to generalize well under heterophilic setups, where neighbor nodes have different labels. To address this challenge, we employ a confidence ratio as a hyper-parameter, assuming that some of the edges are disassortative (heterophilic). Here, we propose a two-phased algorithm. Firstly, we det…

    Submitted 12 April, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: Proceedings of the 31st ACM International Conference on Information & Knowledge Management

  27. arXiv:2302.06823  [pdf]

    cond-mat.supr-con

    Proximity-induced quasi-one-dimensional superconducting quantum anomalous Hall state: a promising scalable top-down approach towards localized Majorana modes

    Authors: Omargeldi Atanov, Wai Ting Tai, Ying-Ming Xie, Yat Hei Ng, Molly A. Hammond, Tin Seng Manfred Ho, Tsin Hei Koo, Hui Li, Sui Lun Ho, Jian Lyu, Sukong Chong, Peng Zhang, Lixuan Tai, Jiannong Wang, Kam Tuen Law, Kang L. Wang, Rolf Lortz

    Abstract: In this work, ~100 nm wide quantum anomalous Hall insulator (QAHI) nanoribbons are etched from a two-dimensional QAHI film. One part of the nanoribbon is covered with superconducting Nb, while the other part is connected to an Au lead via two-dimensional QAHI regions. Andreev reflection spectroscopy measurements were performed, and multiple in-gap conductance peaks were observed in three different…

    Submitted 13 February, 2023; originally announced February 2023.

    Journal ref: Cell Reports Physical Science 5, 101762 (2024)

  28. arXiv:2301.08918  [pdf, other]

    cs.LG cs.SI

    Improving Signed Propagation of Graph Neural Network Under Multiple Classes

    Authors: Yoonhyuk Choi, Jiho Choi, Taewook Ko, Chong-Kwon Kim

    Abstract: Message-passing Graph Neural Networks (GNNs), which collect information from adjacent nodes, achieve dismal performance on heterophilic graphs. Various schemes have been proposed to solve this problem, and propagating signed information on heterophilic edges has gained great attention. Recently, some works provided theoretical analysis that signed propagation always leads to performance improvement…

    Submitted 18 June, 2024; v1 submitted 21 January, 2023; originally announced January 2023.

  29. arXiv:2301.05163  [pdf, other]

    cs.LG cs.AI

    Signed Directed Graph Contrastive Learning with Laplacian Augmentation

    Authors: Taewook Ko, Yoonhyuk Choi, Chong-Kwon Kim

    Abstract: Graph contrastive learning has become a powerful technique for several graph mining tasks. It learns discriminative representation from different perspectives of augmented graphs. Ubiquitous in our daily life, signed-directed graphs are the most complex and tricky to analyze among various graph types. That is why signed-directed graph contrastive learning has not been studied much yet, while there…

    Submitted 12 January, 2023; originally announced January 2023.

    Comments: Pre-prints

  30. arXiv:2301.04412  [pdf, ps, other]

    stat.ME stat.CO

    RobustIV and controlfunctionIV: Causal Inference for Linear and Nonlinear Models with Invalid Instrumental Variables

    Authors: Taehyeon Koo, Youjin Lee, Dylan S. Small, Zijian Guo

    Abstract: We present R software packages RobustIV and controlfunctionIV for causal inference with possibly invalid instrumental variables. RobustIV focuses on the linear outcome model. It implements the two-stage hard thresholding method to select valid instrumental variables from a set of candidate instrumental variables and make inferences for the causal effect in both low- and high-dimensional settings.…

    Submitted 20 June, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

  31. arXiv:2212.03657  [pdf, other]

    cs.CL cs.SD eess.AS

    M3ST: Mix at Three Levels for Speech Translation

    Authors: Xuxin Cheng, Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Yuexian Zou

    Abstract: How to solve the data scarcity problem for end-to-end speech-to-text translation (ST)? It's well known that data augmentation is an efficient method to improve performance for many tasks by enlarging the dataset. In this paper, we propose Mix at three levels for Speech Translation (M^3ST) method to increase the diversity of the augmented training corpus. Specifically, we conduct two phases of fine…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: Submitted to ICASSP 2023

  32. arXiv:2211.15398  [pdf, other]

    cs.CV cs.LG

    Leveraging per Image-Token Consistency for Vision-Language Pre-training

    Authors: Yunhao Gou, Tom Ko, Hansi Yang, James Kwok, Yu Zhang, Mingxuan Wang

    Abstract: Most existing vision-language pre-training (VLP) approaches adopt cross-modal masked language modeling (CMLM) to learn vision-language associations. However, we find that CMLM is insufficient for this purpose according to our observations: (1) Modality bias: a considerable amount of masked tokens in CMLM can be recovered with only the language information, ignoring the visual inputs. (2) Under-uti…

    Submitted 2 September, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: Accepted by CVPR 2023

  33. arXiv:2211.15081  [pdf, other]

    cs.LG cs.AI

    Perturb Initial Features: Generalization of Neural Networks Under Sparse Features for Semi-supervised Node Classification

    Authors: Yoonhyuk Choi, Jiho Choi, Taewook Ko, Chong-Kwon Kim

    Abstract: Graph neural networks (GNNs) are commonly used in semi-supervised settings. Previous research has primarily focused on finding appropriate graph filters (e.g. aggregation methods) to perform well on both homophilic and heterophilic graphs. While these methods are effective, they can still suffer from the sparsity of node features, where the initial data contain few non-zero elements. This can lead…

    Submitted 28 May, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

  34. Self-Similar solution of rotating eruptive outflows on its equatorial plane

    Authors: Takatoshi Ko, Kotaro Fujisawa, Toshikazu Shigeyama

    Abstract: We construct axisymmetric self-similar solutions of transonic outflows emanating from a point source including the effect of the rotation. The solutions are constructed exclusively on the equatorial plane. The features of solutions are determined by three parameters; the adiabatic index $γ$, the dimensionless coordinate of the transonic point, and the dimensionless azimuthal velocity at the transo…

    Submitted 5 November, 2022; originally announced November 2022.

    Comments: 14 pages, 6 figures, 1 table. Accepted by ApJ

    Report number: RESCEU-21/22

    MSC Class: 85-10

  35. arXiv:2210.16428  [pdf, other]

    eess.AS cs.AI cs.MM cs.SD

    Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention

    Authors: Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H. Tang, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang

    Abstract: Audio captioning aims to generate text descriptions of audio clips. In the real world, many objects produce similar sounds. How to accurately recognize ambiguous sounds is a major challenge for audio captioning. In this work, inspired by inherent human multimodal perception, we propose visually-aware audio captioning, which makes use of visual information to help the description of ambiguous sound…

    Submitted 28 May, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: INTERSPEECH 2023

  36. Personalized Dialogue Generation with Persona-Adaptive Attention

    Authors: Qiushi Huang, Yu Zhang, Tom Ko, Xubo Liu, Bo Wu, Wenwu Wang, Lilian Tang

    Abstract: Persona-based dialogue systems aim to generate consistent responses based on historical context and predefined persona. Unlike conventional dialogue generation, the persona-based dialogue needs to consider both dialogue context and persona, posing a challenge for coherent training. Specifically, this requires a delicate weight balance between context and persona. To achieve that, in this paper, we…

    Submitted 9 January, 2024; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: 8 pages, 3 figures. Accepted by AAAI-2023

  37. arXiv:2210.04062  [pdf, other]

    cs.SD eess.AS

    CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning

    Authors: Chutong Meng, Junyi Ao, Tom Ko, Mingxuan Wang, Haizhou Li

    Abstract: Speech is the surface form of a finite set of phonetic units, which can be represented by discrete codes. We propose the Code BERT (CoBERT) approach for self-supervised speech representation learning. The idea is to convert an utterance to a sequence of discrete codes, and perform code representation learning, where we predict the code representations based on a masked view of the original speech…

    Submitted 5 July, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

    Comments: Accepted by Interspeech 2023

  38. arXiv:2209.05494  [pdf]

    cond-mat.mtrl-sci

    Reversibly controlled ternary polar states and ferroelectric bias promoted by boosting square-tensile-strain

    Authors: Jun Han Lee, Nguyen Xuan Duong, Min-Hyoung Jung, Hyun-Jae Lee, Ahyoung Kim, Youngki Yeo, Junhyung Kim, Gye-Hyeon Kim, Byeong-Gwan Cho, Jaegyu Kim, Furqan Ul Hassan Naqvi, Jong-Seong Bae, Jeehoon Kim, Chang Won Ahn, Young-Min Kim, Tae Kwon Song, Jae-Hyeon Ko, Tae-Yeong Koo, Changhee Sohn, Kibog Park, Chan-Ho Yang, Sang Mo Yang, Jun Hee Lee, Hu Young Jeong, Tae Heon Kim, et al. (1 additional author not shown)

    Abstract: Interaction between dipoles often gives rise to intriguing physical phenomena, such as exchange bias in magnetic heterostructures and the magnetoelectric effect in multiferroics, which lead to advances in multifunctional heterostructures. However, the defect-dipole tends to be considered undesirable because it deteriorates the electronic functionality. Here, we report deterministic switching between the ferroe…

    Submitted 12 September, 2022; originally announced September 2022.

    Comments: According to the Copyright Policy, the submission version (before peer-review and revision)

    Journal ref: Advanced Materials, 2205825 (2022)

  39. arXiv:2208.11511  [pdf, other]

    cs.LG cs.AI

    A Graph Convolution for Signed Directed Graphs

    Authors: Taewook Ko, Chong-Kwon Kim

    Abstract: A signed directed graph is a graph with sign and direction information on the edges. Even though signed directed graphs are more informative than unsigned or undirected graphs, they are more complicated to analyze and have received less research attention. This paper investigates a spectral graph convolution model to fully utilize the information embedded in signed directed edges. We propose a nov…

    Submitted 16 February, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

    Comments: Preprint version

  40. arXiv:2208.02189  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis

    Authors: Qibing Bai, Tom Ko, Yu Zhang

    Abstract: In human speech, the attitude of a speaker cannot be fully expressed only by the textual content. It has to come along with the intonation. Declarative questions are commonly used in daily Cantonese conversations, and they are usually uttered with rising intonation. Vanilla neural text-to-speech (TTS) systems are not capable of synthesizing rising intonation for these sentences due to the loss of…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: Accepted by INTERSPEECH 2022

  41. arXiv:2205.12756  [pdf, other]

    cs.RO

    Development of a Stereo-Vision Based High-Throughput Robotic System for Mouse Tail Vein Injection

    Authors: Tianyi Ko, Koichi Nishiwaki, Koji Terada, Yusuke Tanaka, Shun Mitsumata, Ryuichi Katagiri, Taketo Junko, Naoshi Horiba, Hideyoshi Igata, Kazue Mizuno

    Abstract: In this paper, we present a robotic device for mouse tail vein injection. We propose a mouse holding mechanism to realize vein injection without anesthetizing the mouse, which consists of a tourniquet, vacuum port, and adaptive tail-end fixture. The position of the target vein in 3D space is reconstructed from a high-resolution stereo vision. The vein is detected by a simple but robust vein line d…

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: accepted to ICRA2022 (7 pages, 11 figures, 2 tables)

  42. arXiv:2205.11772  [pdf]

    cs.CV

    Multi-Augmentation for Efficient Visual Representation Learning for Self-supervised Pre-training

    Authors: Van-Nhiem Tran, Chi-En Huang, Shen-Hsuan Liu, Kai-Lin Yang, Timothy Ko, Yung-Hui Li

    Abstract: In recent years, self-supervised learning has been studied to deal with the limitation of available labeled-dataset. Among the major components of self-supervised learning, the data augmentation pipeline is one key factor in enhancing the resulting performance. However, most researchers manually designed the augmentation pipeline, and the limited collections of transformation may cause the lack of…

    Submitted 24 May, 2022; originally announced May 2022.

  43. arXiv:2205.08993  [pdf, other]

    cs.CL eess.AS

    Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation

    Authors: Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Qibing Bai, Yu Zhang

    Abstract: Direct Speech-to-speech translation (S2ST) has drawn more and more attention recently. The task is very challenging due to data scarcity and complex speech-to-speech mapping. In this paper, we report our recent achievements in S2ST. Firstly, we build a S2ST Transformer baseline which outperforms the original Translatotron. Secondly, we utilize the external data by pseudo-labeling and obtain a new…

    Submitted 18 May, 2022; originally announced May 2022.

    Comments: Submitted to INTERSPEECH 2022

  44. arXiv:2204.03939  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    GigaST: A 10,000-hour Pseudo Speech Translation Corpus

    Authors: Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao

    Abstract: This paper introduces GigaST, a large-scale pseudo speech translation (ST) corpus. We create the corpus by translating the text in GigaSpeech, an English ASR corpus, into German and Chinese. The training set is translated by a strong machine translation system and the test set is translated by human. ST models trained with an addition of our corpus obtain new state-of-the-art results on the MuST-C…

    Submitted 6 June, 2023; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: Accepted at Interspeech 2023. GigaST dataset is available at https://st-benchmark.github.io/resources/GigaST

  45. arXiv:2203.17113  [pdf, other]

    cs.SD cs.LG eess.AS

    Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

    Authors: Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, Lirong Dai, Jinyu Li, Yao Qian, Furu Wei

    Abstract: This paper studies a novel pre-training technique with unpaired speech data, Speech2C, for encoder-decoder based automatic speech recognition (ASR). Within a multi-task learning framework, we introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes, derived from an offline clustering model. One is to predict the pseudo codes via masked language mode…

    Submitted 20 June, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by Interspeech 2022

  46. arXiv:2203.15610  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

    Authors: Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li

    Abstract: Self-supervised speech representation learning has shown promising results in various speech processing tasks. However, the pre-trained models, e.g., HuBERT, are storage-intensive Transformers, limiting their scope of applications under low-resource settings. To this end, we propose LightHuBERT, a once-for-all Transformer compression framework, to find the desired architectures automatically by pr…

    Submitted 18 June, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: 5 pages, 2 figures, accepted to Interspeech 2022

  47. arXiv:2203.10973  [pdf, ps, other]

    cs.LG math.NA stat.ML

    A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization With Non-isolated Local Minima

    Authors: Taehee Ko, Xiantao Li

    Abstract: Loss functions with non-isolated minima have emerged in several machine learning problems, creating a gap between theory and practice. In this paper, we formulate a new type of local convexity condition that is suitable to describe the behavior of loss functions near non-isolated minima. We show that such condition is general enough to encompass many existing conditions. In addition we study the l…

    Submitted 30 May, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

  48. arXiv:2112.14909  [pdf, other]

    astro-ph.SR astro-ph.HE

    Eruption of the Envelope of Massive Stars by Energy Injection with Finite Duration

    Authors: Takatoshi Ko, Daichi Tsuna, Yuki Takei, Toshikazu Shigeyama

    Abstract: A significant fraction of supernovae show signatures of dense circumstellar material (CSM). While multiple scenarios for creating a dense CSM exist, mass eruption due to injection of energy at the base of the outer envelope is a likely possibility. We carry out radiation hydrodynamical simulations of eruptive mass loss from a typical red supergiant progenitor with initial mass of $15\ M_\odot$, fo…

    Submitted 14 April, 2022; v1 submitted 29 December, 2021; originally announced December 2021.

    Comments: 9 pages, 6 figures, 2 tables, Accepted for publication in ApJ

    Report number: RESCEU-25/21

  49. Review-Based Domain Disentanglement without Duplicate Users or Contexts for Cross-Domain Recommendation

    Authors: Yoonhyuk Choi, Jiho Choi, Taewook Ko, Hyungho Byun, Chong-Kwon Kim

    Abstract: A cross-domain recommendation has shown promising results in solving data-sparsity and cold-start problems. Despite such progress, existing methods focus on domain-shareable information (overlapped users or same contexts) for a knowledge transfer, and they fail to generalize well without such requirements. To deal with these problems, we suggest utilizing review texts that are general to most e-co…

    Submitted 12 April, 2023; v1 submitted 25 October, 2021; originally announced October 2021.

    Comments: Proceedings of the 31st ACM International Conference on Information & Knowledge Management

  50. arXiv:2110.07205  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

    Authors: Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei

    Abstract: Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning. The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After prepro…

    Submitted 24 May, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Accepted by ACL 2022 main conference