Showing 1–50 of 96 results for author: Gong, H

Searching in archive cs.
  1. arXiv:2406.16150

    eess.IV cs.CV

    Intensity Confusion Matters: An Intensity-Distance Guided Loss for Bronchus Segmentation

    Authors: Haifan Gong, Wenhao Huang, Huan Zhang, Yu Wang, Xiang Wan, Hong Shen, Guanbin Li, Haofeng Li

    Abstract: Automatic segmentation of the bronchial tree from CT imaging is important, as it provides structural information for disease diagnosis. Despite the merits of previous automatic bronchus segmentation methods, they have paid less attention to the issue we term Intensity Confusion, wherein the intensity values of certain background voxels approach those of the foreground voxels within br…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: IEEE International Conference on Multimedia & Expo (ICME) 2024

  2. arXiv:2406.09229

    cs.CV

    MGRQ: Post-Training Quantization For Vision Transformer With Mixed Granularity Reconstruction

    Authors: Lianwei Yang, Zhikai Li, Junrui Xiao, Haisong Gong, Qingyi Gu

    Abstract: Post-training quantization (PTQ) efficiently compresses vision models, but unfortunately it comes with a certain degree of accuracy degradation. Reconstruction methods aim to enhance model performance by narrowing the gap between the quantized model and the full-precision model, often yielding promising results. However, efforts to significantly improve the performance of PTQ through reconstruct…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by 2024 IEEE International Conference on Image Processing

  3. arXiv:2406.02733

    cs.CL cs.SD eess.AS

    Textless Acoustic Model with Self-Supervised Distillation for Noise-Robust Expressive Speech-to-Speech Translation

    Authors: Min-Jae Hwang, Ilia Kulikov, Benjamin Peloquin, Hongyu Gong, Peng-Jen Chen, Ann Lee

    Abstract: In this paper, we propose a textless acoustic model with a self-supervised distillation strategy for noise-robust expressive speech-to-speech translation (S2ST). Recently proposed expressive S2ST systems have achieved impressive expressivity preservation performance by cascading a unit-to-speech (U2S) generator to the speech-to-unit translation model. However, these systems are vulnerable to the pr…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 (findings)

  4. arXiv:2406.02075

    cs.LG cs.NE

    ReLU-KAN: New Kolmogorov-Arnold Networks that Only Need Matrix Addition, Dot Multiplication, and ReLU

    Authors: Qi Qiu, Tao Zhu, Helin Gong, Liming Chen, Huansheng Ning

    Abstract: Limited by the complexity of basis function (B-spline) calculations, Kolmogorov-Arnold Networks (KAN) suffer from restricted parallel computing capability on GPUs. This paper proposes a novel ReLU-KAN implementation that inherits the core idea of KAN. By adopting ReLU (Rectified Linear Unit) and point-wise multiplication, we simplify the design of KAN's basis function and optimize the computation…

    Submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2405.20410

    cs.CL cs.AI cs.SD eess.AS

    SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought

    Authors: Hongyu Gong, Bandhav Veluri

    Abstract: Expressive speech-to-speech translation (S2ST) is a key research topic in seamless communication, which focuses on the preservation of semantics and speaker vocal style in translated speech. Early works synthesized speaker-style-aligned speech in order to directly learn the mapping from speech to target speech spectrogram. Without reliance on style-aligned data, recent studies leverage the advance…

    Submitted 30 May, 2024; originally announced May 2024.

  6. arXiv:2404.13677

    cs.CV eess.IV

    A Dataset and Model for Realistic License Plate Deblurring

    Authors: Haoyan Gong, Yuzheng Feng, Zhenrong Zhang, Xianxu Hou, Jingxin Liu, Siqi Huang, Hongbin Liu

    Abstract: Vehicle license plate recognition is a crucial task in intelligent traffic management systems. However, the challenge of achieving accurate recognition persists due to motion blur from fast-moving vehicles. Despite the widespread use of image synthesis approaches in existing deblurring and recognition algorithms, their effectiveness in real-world scenarios remains unproven. To address this, we int…

    Submitted 22 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  7. arXiv:2404.00552

    cs.CV

    Comparison of Methods in Skin Pigment Decomposition

    Authors: Hao Gong, Michel Desvignes

    Abstract: Decomposition of skin pigment plays an important role in medical fields. Human skin can be decomposed into two primitive components, hemoglobin and melanin. It is our goal to apply these results to the diagnosis of skin cancer. In this paper, various methods for skin pigment decomposition are reviewed comparatively and the performance of each method is evaluated both theoretically and experimentally.…

    Submitted 7 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: 5 pages, 7 figures

  8. arXiv:2403.12408

    cs.CL cs.SD eess.AS

    MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation

    Authors: Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, Hongyu Gong

    Abstract: There have been emerging research interests and advances in speech-to-speech translation (S2ST), which translates utterances from one language to another. This work proposes Multitask Speech Language Model (MSLM), which is a decoder-only speech language model trained in a multitask setting. Without reliance on text training data, our model is able to support multilingual S2ST with speaker style preserve…

    Submitted 18 March, 2024; originally announced March 2024.

  9. arXiv:2403.12402

    cs.CL cs.SD eess.AS

    An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis

    Authors: Yifan Peng, Ilia Kulikov, Yilin Yang, Sravya Popuri, Hui Lu, Changhan Wang, Hongyu Gong

    Abstract: Speech language models (LMs) are promising for high-quality speech synthesis through in-context learning. A typical speech LM takes discrete semantic units as content and a short utterance as prompt, and synthesizes speech which preserves the content's semantics but mimics the prompt's style. However, there is no systematic understanding of how the synthesized audio is controlled by the prompt and…

    Submitted 18 March, 2024; originally announced March 2024.

  10. arXiv:2402.18107

    cs.MM

    Multimodal Interaction Modeling via Self-Supervised Multi-Task Learning for Review Helpfulness Prediction

    Authors: HongLin Gong, Mengzhao Jia, Liqiang Jing

    Abstract: In line with the latest research, the task of identifying helpful reviews from a vast pool of user-generated textual and visual data has become a prominent area of study. Effective modal representations are expected to possess two key attributes: consistency and differentiation. Current methods designed for Multimodal Review Helpfulness Prediction (MRHP) face limitations in capturing distinctive i…

    Submitted 25 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 10 pages, 4 figures, 4 tables

  11. arXiv:2402.13040

    cs.LG cs.AI cs.CE cs.CL q-bio.BM

    Text-Guided Molecule Generation with Diffusion Language Model

    Authors: Haisong Gong, Qiang Liu, Shu Wu, Liang Wang

    Abstract: Text-guided molecule generation is a task where molecules are generated to match specific textual descriptions. Recently, most existing SMILES-based molecule generation methods rely on an autoregressive architecture. In this work, we propose the Text-Guided Molecule Generation with Diffusion Language Model (TGM-DLM), a novel approach that leverages diffusion models to address the limitations of au…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted by the 38th AAAI Conference on Artificial Intelligence (AAAI)

  12. arXiv:2402.13028

    cs.CL cs.AI

    Heterogeneous Graph Reasoning for Fact Checking over Texts and Tables

    Authors: Haisong Gong, Weizhi Xu, Shu Wu, Qiang Liu, Liang Wang

    Abstract: Fact checking aims to predict claim veracity by reasoning over multiple evidence pieces. It usually involves evidence retrieval and veracity reasoning. In this paper, we focus on the latter, reasoning over unstructured text and structured table information. Previous works have primarily relied on fine-tuning pretrained language models or training homogeneous-graph-based models. Despite their effec…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: Accepted by the 38th AAAI Conference on Artificial Intelligence (AAAI)

  13. arXiv:2402.07659

    cs.IR

    Multi-Behavior Collaborative Filtering with Partial Order Graph Convolutional Networks

    Authors: Yijie Zhang, Yuanchen Bei, Hao Chen, Qijie Shen, Zheng Yuan, Huan Gong, Senzhang Wang, Feiran Huang, Xiao Huang

    Abstract: Representing information of multiple behaviors in the single graph collaborative filtering (CF) vector has been a long-standing challenge. This is because different behaviors naturally form separate behavior graphs and learn separate CF embeddings. Existing models merge the separate embeddings by appointing the CF embeddings for some behaviors as the primary embedding and utilizing other auxiliari…

    Submitted 20 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted by KDD 2024

  14. arXiv:2402.03526

    cs.CV

    nnMamba: 3D Biomedical Image Segmentation, Classification and Landmark Detection with State Space Model

    Authors: Haifan Gong, Luoyao Kang, Yitao Wang, Xiang Wan, Haofeng Li

    Abstract: In the field of biomedical image analysis, the quest for architectures capable of effectively capturing long-range dependencies is paramount, especially when dealing with 3D image segmentation, classification, and landmark detection. Traditional Convolutional Neural Networks (CNNs) struggle with their local receptive field, and Transformers have a heavy computational load when applied to high-dimens…

    Submitted 10 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Code is available at https://github.com/lhaof/nnMamba

  15. arXiv:2401.15911

    cs.AI math.ST

    Distribution-consistency Structural Causal Models

    Authors: Heyang Gong, Chaochao Lu, Yu Zhang

    Abstract: In the field of causal modeling, potential outcomes (PO) and structural causal models (SCMs) stand as the predominant frameworks. However, these frameworks face notable challenges in practically modeling counterfactuals, formalized as parameters of the joint distribution of potential outcomes. Counterfactual reasoning holds paramount importance in contemporary decision-making processes, especially…

    Submitted 22 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  16. arXiv:2401.15867

    cs.IT

    An Information Aggregation Operator

    Authors: Heyang Gong

    Abstract: This study explores a new mathematical operator, symbolized as $\cupplus$, for information aggregation, aimed at enhancing traditional methods by directly amalgamating probability distributions. This operator facilitates the combination of probability densities, contributing a nuanced approach to probabilistic analysis. We apply this operator to a personalized incentive scenario, illustrating its…

    Submitted 28 January, 2024; originally announced January 2024.

  17. arXiv:2312.05187

    cs.CL cs.SD eess.AS

    Seamless: Multilingual Expressive and Streaming Speech Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, et al. (40 additional authors not shown)

    Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4…

    Submitted 8 December, 2023; originally announced December 2023.

  18. arXiv:2310.14172

    eess.IV cs.CV

    ASC: Appearance and Structure Consistency for Unsupervised Domain Adaptation in Fetal Brain MRI Segmentation

    Authors: Zihang Xu, Haifan Gong, Xiang Wan, Haofeng Li

    Abstract: Automatic tissue segmentation of fetal brain images is essential for the quantitative analysis of prenatal neurodevelopment. However, producing voxel-level annotations of fetal brain imaging is time-consuming and expensive. To reduce labeling costs, we propose a practical unsupervised domain adaptation (UDA) setting that adapts the segmentation labels of high-quality fetal brain atlases to unlabel…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: MICCAI 2023, released code: https://github.com/lhaof/ASC

  19. arXiv:2310.14158

    cs.CV

    Visual-Attribute Prompt Learning for Progressive Mild Cognitive Impairment Prediction

    Authors: Luoyao Kang, Haifan Gong, Xiang Wan, Haofeng Li

    Abstract: Deep learning (DL) has been used in the automatic diagnosis of Mild Cognitive Impairment (MCI) and Alzheimer's Disease (AD) with brain imaging data. However, previous methods have not fully exploited the relation between brain image and clinical information that is widely adopted by experts in practice. To exploit the heterogeneous features from imaging and tabular data simultaneously, we propose…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: MICCAI 2023, released code: https://github.com/lhaof/VAPL

  20. arXiv:2309.16207

    cs.CV

    Parameter-Saving Adversarial Training: Reinforcing Multi-Perturbation Robustness via Hypernetworks

    Authors: Huihui Gong, Minjing Dong, Siqi Ma, Seyit Camtepe, Surya Nepal, Chang Xu

    Abstract: Adversarial training serves as one of the most popular and effective methods to defend against adversarial perturbations. However, most defense mechanisms only consider a single type of perturbation while various attack methods might be adopted to perform stronger adversarial attacks against the deployed model in real-world scenarios, e.g., $\ell_2$ or $\ell_\infty$. Defending against various atta…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 9 pages, 2 figures

  21. arXiv:2309.09480

    cs.CV

    Stealthy Physical Masked Face Recognition Attack via Adversarial Style Optimization

    Authors: Huihui Gong, Minjing Dong, Siqi Ma, Seyit Camtepe, Surya Nepal, Chang Xu

    Abstract: Deep neural networks (DNNs) have achieved state-of-the-art performance on face recognition (FR) tasks in the last decade. In real scenarios, the deployment of DNNs requires taking various face accessories into consideration, like glasses, hats, and masks. In the COVID-19 pandemic era, wearing face masks is one of the most effective ways to defend against the novel coronavirus. However, DNNs are kn…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: 11 pages, 7 figures

  22. arXiv:2309.09464

    cs.CV cs.LG

    Reducing Adversarial Training Cost with Gradient Approximation

    Authors: Huihui Gong

    Abstract: Deep learning models have achieved state-of-the-art performance in various domains, but they are vulnerable to inputs with well-crafted but small perturbations, which are called adversarial examples (AEs). Among many strategies to improve model robustness against AEs, Projected Gradient Descent (PGD) based adversarial training is one of the most effective methods. Unfortunately, th…

    Submitted 10 October, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: The experiments are insufficient and will be updated later. Withdraw this manuscript.

  23. arXiv:2309.09323

    cs.AI cs.LG stat.ME

    Answering Causal Queries at Layer 3 with DiscoSCMs-Embracing Heterogeneity

    Authors: Heyang Gong

    Abstract: In the realm of causal inference, Potential Outcomes (PO) and Structural Causal Models (SCM) are recognized as the principal frameworks. However, when it comes to Layer 3 valuations (counterfactual queries deeply entwined with individual-level semantics), both frameworks encounter limitations due to the degenerative issues brought forth by the consistency rule. This paper advocates for the Distr…

    Submitted 8 February, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

  24. arXiv:2308.11596

    cs.CL

    SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, et al. (43 additional authors not shown)

    Abstract: What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded s…

    Submitted 24 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    ACM Class: I.2.7

  25. arXiv:2308.01947

    cs.LG cs.AI

    Discriminative Graph-level Anomaly Detection via Dual-students-teacher Model

    Authors: Fu Lin, Xuexiong Luo, Jia Wu, Jian Yang, Shan Xue, Zitong Wang, Haonan Gong

    Abstract: Different from the current node-level anomaly detection task, the goal of graph-level anomaly detection is to find abnormal graphs that significantly differ from others in a graph set. Due to the scarcity of research on graph-level anomaly detection, the detailed description of graph-level anomaly is insufficient. Furthermore, existing works focus on capturing anomalous graph informati…

    Submitted 3 August, 2023; originally announced August 2023.

  26. Multi-representations Space Separation based Graph-level Anomaly-aware Detection

    Authors: Fu Lin, Haonan Gong, Mingkang Li, Zitong Wang, Yue Zhang, Xuexiong Luo

    Abstract: Graph structure patterns have recently been widely used to model data from different areas. How to detect anomalous graph information in these graph data has become a popular research problem. The objective of this research is centered on the particular issue of how to detect abnormal graphs within a graph set. The previous works have observed that abnormal graphs mainly show node-level and graph-level anom…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: 11 pages, 12 figures

    Journal ref: 35th International Conference on Scientific and Statistical Database Management (SSDBM 2023), July 10–12, 2023, Los Angeles, CA, USA

  27. arXiv:2307.08655

    cs.CL cs.SD eess.AS

    Multilingual Speech-to-Speech Translation into Multiple Target Languages

    Authors: Hongyu Gong, Ning Dong, Sravya Popuri, Vedanuj Goswami, Ann Lee, Juan Pino

    Abstract: Speech-to-speech translation (S2ST) enables spoken communication between people talking in different languages. Despite a few studies on multilingual S2ST, their focus is the multilinguality on the source side, i.e., the translation from multiple source languages to one target language. We present the first work on multilingual S2ST supporting multiple target languages. Leveraging recent advance i…

    Submitted 17 July, 2023; originally announced July 2023.

  28. arXiv:2306.01084

    cs.SD eess.AS

    Exploration on HuBERT with Multiple Resolutions

    Authors: Jiatong Shi, Yun Tang, Hirofumi Inaguma, Hongyu Gong, Juan Pino, Shinji Watanabe

    Abstract: Hidden-unit BERT (HuBERT) is a widely-used self-supervised learning (SSL) model in speech processing. However, we argue that its fixed 20ms resolution for hidden representations would not be optimal for various speech-processing tasks since their attributes (e.g., speaker characteristics and semantics) are based on different time scales. To address this limitation, we propose utilizing HuBERT repr…

    Submitted 22 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Accepted to Interspeech 2023

  29. arXiv:2304.04614

    cs.CV

    HST-MRF: Heterogeneous Swin Transformer with Multi-Receptive Field for Medical Image Segmentation

    Authors: Xiaofei Huang, Hongfang Gong, ** Zhang

    Abstract: The Transformer has been successfully used in medical image segmentation due to its excellent long-range modeling capabilities. However, patch segmentation is necessary when building a Transformer-based model. This process may disrupt the tissue structure in medical images, resulting in the loss of relevant information. In this study, we propose a Heterogeneous Swin Transformer with Multi-Recepti…

    Submitted 10 April, 2023; originally announced April 2023.

  30. arXiv:2303.08455   

    cs.LG physics.data-an

    On the uncertainty analysis of the data-enabled physics-informed neural network for solving neutron diffusion eigenvalue problem

    Authors: Yu Yang, Helin Gong, Qihong Yang, Yangtao Deng, Qiaolin He, Shiquan Zhang

    Abstract: In practical engineering experiments, the data obtained through detectors are inevitably noisy. For the already proposed data-enabled physics-informed neural network (DEPINN) \citep{DEPINN}, we investigate the performance of DEPINN in calculating the neutron diffusion eigenvalue problem from several perspectives when the prior data contain different scales of noise. Further, in order to reduce the…

    Submitted 17 March, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: The experiments in Figures 6 and 10 of the article contain errors that need to be corrected. Moreover, we intend to make substantial changes to the content of the article and therefore need to withdraw it

  31. STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training

    Authors: Weihong Zhong, Mao Zheng, Duyu Tang, Xuan Luo, Heng Gong, Xiaocheng Feng, Bing Qin

    Abstract: Although large-scale video-language pre-training models, which usually build a global alignment between the video and the text, have achieved remarkable progress on various downstream tasks, the idea of adopting fine-grained information during the pre-training stage is not well explored. In this work, we propose STOA-VLP, a pre-training framework that jointly models object and action information a…

    Submitted 23 May, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: AAAI 2023, 7 pages, 3 figures

  32. arXiv:2301.11716

    cs.CL cs.LG cs.SD eess.AS

    Pre-training for Speech Translation: CTC Meets Optimal Transport

    Authors: Phuong-Hang Le, Hongyu Gong, Changhan Wang, Juan Pino, Benjamin Lecouteux, Didier Schwab

    Abstract: The gap between speech and text modalities is a major challenge in speech-to-text translation (ST). Different methods have been proposed to reduce this gap, but most of them require architectural changes in ST training. In this work, we propose to mitigate this issue at the pre-training stage, requiring no change in the ST model. First, we show that the connectionist temporal classification (CTC)…

    Submitted 5 June, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: ICML 2023 (oral presentation). This version fixed URLs, updated affiliations & acknowledgements, and improved formatting

  33. arXiv:2301.10663

    cs.LG cs.AI

    Transfer Learning in Deep Learning Models for Building Load Forecasting: Case of Limited Data

    Authors: Menna Nawar, Moustafa Shomer, Samy Faddel, Huangjie Gong

    Abstract: Precise load forecasting in buildings could increase the bill savings potential and facilitate optimized strategies for power generation planning. With the rapid evolution of computer science, data-driven techniques, in particular Deep Learning models, have become a promising solution for the load forecasting problem. These models have shown accurate forecasting results; however, they need ab…

    Submitted 27 January, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

    Comments: Accepted paper for IEEE SoutheastCon, 2023

  34. arXiv:2301.10606

    cs.CL cs.SD eess.AS

    A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation

    Authors: Wen-Chin Huang, Benjamin Peloquin, Justine Kao, Changhan Wang, Hongyu Gong, Elizabeth Salesky, Yossi Adi, Ann Lee, Peng-Jen Chen

    Abstract: Expressive speech-to-speech translation (S2ST) aims to transfer prosodic attributes of source speech to target speech while maintaining translation accuracy. Existing research in expressive S2ST is limited, typically focusing on a single expressivity aspect at a time. Likewise, this research area lacks standard evaluation protocols and well-curated benchmark datasets. In this work, we propose a ho…

    Submitted 25 January, 2023; originally announced January 2023.

    Comments: This is the full version of our submission to ICASSP 2023

  35. arXiv:2212.08307

    cs.CL

    Controllable Text Generation via Probability Density Estimation in the Latent Space

    Authors: Yuxuan Gu, Xiaocheng Feng, Sicheng Ma, Lingyuan Zhang, Heng Gong, Weihong Zhong, Bing Qin

    Abstract: Previous work on controllable text generation has explored the idea of control from the latent space, such as optimizing a representation with attribute-related classifiers or sampling a representation from relevant discrete samples. However, they are not effective enough in modeling both the latent space and the control, leaving controlled text with low quality and diversity. In this work, we pro…

    Submitted 24 May, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: 25 pages, 9 figures, Accepted to ACL 2023

  36. arXiv:2211.15320

    cs.CV

    RankDNN: Learning to Rank for Few-shot Learning

    Authors: Qianyu Guo, Hongtong Gong, Xujun Wei, Yanwei Fu, Weifeng Ge, Yizhou Yu, Wenqiang Zhang

    Abstract: This paper introduces a new few-shot learning pipeline that casts relevance ranking for image retrieval as binary ranking relation classification. In comparison to image classification, ranking relation classification is sample efficient and domain agnostic. Besides, it provides a new perspective on few-shot learning and is complementary to state-of-the-art methods. The core component of our deep…

    Submitted 29 November, 2022; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: 12 pages, 4 figures. Accepted to AAAI2023. The code is available at: https://github.com/guoqianyu-alberta/RankDNN

  37. arXiv:2211.10226

    cs.CV cs.AI

    Leveraging Multi-stream Information Fusion for Trajectory Prediction in Low-illumination Scenarios: A Multi-channel Graph Convolutional Approach

    Authors: Hailong Gong, Zirui Li, Chao Lu, Guodong Du, Jianwei Gong

    Abstract: Trajectory prediction is a fundamental problem and challenge for autonomous vehicles. Early works mainly focused on designing complicated architectures for deep-learning-based prediction models in normal-illumination environments, which fail to deal with low-light conditions. This paper proposes a novel approach for trajectory prediction in low-illumination scenarios by leveraging multi-stream…

    Submitted 18 November, 2022; originally announced November 2022.

  38. arXiv:2211.06474

    cs.CL cs.SD eess.AS

    Speech-to-Speech Translation For A Real-world Unwritten Language

    Authors: Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Pino, Wei-Ning Hsu, Ann Lee

    Abstract: We study speech-to-speech translation (S2ST) that translates speech from one language into another language and focuses on building systems to support languages without standard text writing systems. We use English-Taiwanese Hokkien as a case study, and present an end-to-end solution from training data collection, modeling choices to benchmark dataset release. First, we present efforts on creating…

    Submitted 11 November, 2022; originally announced November 2022.

  39. arXiv:2211.04508

    cs.CL cs.SD eess.AS

    SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations

    Authors: Paul-Ambroise Duquenne, Hongyu Gong, Ning Dong, Jingfei Du, Ann Lee, Vedanuj Goswami, Changhan Wang, Juan Pino, Benoît Sagot, Holger Schwenk

    Abstract: We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined from real speech of European Parliament recordings. It contains speech alignments in 136 language pairs with a total of 418 thousand hours of speech. To evaluate the quality of this parallel speech, we train bilingual speech-to-speech translation models on mined data only and establish extensive basel…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 18 pages

  40. arXiv:2210.14514

    cs.CL cs.SD eess.AS

    Improving Speech-to-Speech Translation Through Unlabeled Text

    Authors: Xuan-Phi Nguyen, Sravya Popuri, Changhan Wang, Yun Tang, Ilia Kulikov, Hongyu Gong

    Abstract: Direct speech-to-speech translation (S2ST) is among the most challenging problems in the translation paradigm due to the significant scarcity of S2ST data. While effort has been made to increase the data size from unlabeled speech by cascading pretrained speech recognition (ASR), machine translation (MT) and text-to-speech (TTS) models; unlabeled text has remained relatively under-utilized to impr…

    Submitted 26 October, 2022; originally announced October 2022.

  41. arXiv:2210.11981

    cs.CL

    Named Entity Detection and Injection for Direct Speech Translation

    Authors: Marco Gaido, Yun Tang, Ilia Kulikov, Rongqing Huang, Hongyu Gong, Hirofumi Inaguma

    Abstract: In a sentence, certain words are critical for its semantics. Among them, named entities (NEs) are notoriously challenging for neural models. Despite their importance, their accurate handling has been neglected in speech-to-text (S2T) translation research, and recent work has shown that S2T models perform poorly for locations and notably person names, whose spelling is challenging unless known in ad…

    Submitted 11 March, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  42. arXiv:2210.02889  [pdf, other]

    cs.CL

    A Distributional Lens for Multi-Aspect Controllable Text Generation

    Authors: Yuxuan Gu, Xiaocheng Feng, Sicheng Ma, Lingyuan Zhang, Heng Gong, Bing Qin

    Abstract: Multi-aspect controllable text generation is a more challenging and practical task than single-aspect control. Existing methods achieve complex multi-aspect control by fusing multiple controllers learned from single-aspect, but suffer from attribute degeneration caused by the mutual interference of these controllers. To address this, we provide observations on attribute fusion from a distributiona…

    Submitted 19 October, 2022; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: 21 pages, 21 figures, EMNLP 2022

  43. arXiv:2210.00220  [pdf, other]

    cs.CV

    A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering

    Authors: Xiaofei Huang, Hongfang Gong

    Abstract: Research in medical visual question answering (MVQA) can contribute to the development of computer-aided diagnosis. MVQA is a task that aims to predict accurate and convincing answers based on given medical images and associated natural language questions. This task requires extracting medical knowledge-rich feature content and making fine-grained understandings of them. Therefore, constructing an…

    Submitted 11 November, 2022; v1 submitted 1 October, 2022; originally announced October 2022.

  44. arXiv:2209.04128  [pdf, other]

    cs.RO

    Modelling Power Consumptions for Multi-rotor UAVs

    Authors: Hao Gong, Baoqi Huang, Bing Jia, Hansu Dai

    Abstract: Unmanned aerial vehicles (UAVs) have various advantages, but their practical applications are influenced by their limited energy. Therefore, it is important to manage their power consumption and also important to establish corresponding power consumption models. However, most existing works either establish theoretical power consumption models for fixed-wing UAVs and single-rotor UAVs, or provi…

    Submitted 9 September, 2022; originally announced September 2022.

  45. arXiv:2209.00884  [pdf, other]

    physics.ins-det cs.AR cs.LG

    PulseDL-II: A System-on-Chip Neural Network Accelerator for Timing and Energy Extraction of Nuclear Detector Signals

    Authors: Pengcheng Ai, Zhi Deng, Yi Wang, Hui Gong, Xinchi Ran, Zijian Lang

    Abstract: Front-end electronics equipped with high-speed digitizers are being used and proposed for future nuclear detectors. Recent literature reveals that deep learning models, especially one-dimensional convolutional neural networks, are promising when dealing with digital signals from nuclear detectors. Simulations and experiments demonstrate the satisfactory accuracy and additional benefits of neural n…

    Submitted 8 February, 2023; v1 submitted 2 September, 2022; originally announced September 2022.

    Comments: Accepted by IEEE Transactions on Nuclear Science

  46. arXiv:2207.00807  [pdf, other]

    cs.CV

    Less is More: Adaptive Curriculum Learning for Thyroid Nodule Diagnosis

    Authors: Haifan Gong, Hui Cheng, Yifan Xie, Shuangyi Tan, Guanqi Chen, Fei Chen, Guanbin Li

    Abstract: Thyroid nodule classification aims at determining whether the nodule is benign or malignant based on a given ultrasound image. However, the label obtained by cytological biopsy, which is the gold standard in clinical medicine, is not always consistent with the ultrasound imaging TI-RADS criteria. The information difference between the two causes the existing deep learning-based classification…

    Submitted 2 July, 2022; originally announced July 2022.

    Comments: Accepted to MICCAI 2022 with Student Travel Award

  47. arXiv:2205.12216  [pdf, other]

    cs.CL

    T-Modules: Translation Modules for Zero-Shot Cross-Modal Machine Translation

    Authors: Paul-Ambroise Duquenne, Hongyu Gong, Benoît Sagot, Holger Schwenk

    Abstract: We present a new approach to perform zero-shot cross-modal transfer between speech and text for translation tasks. Multilingual speech and text are encoded in a joint fixed-size representation space. Then, we compare different approaches to decode these multimodal and multilingual fixed-size representations, enabling zero-shot translation between languages and modalities. All our models are traine…

    Submitted 10 November, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

  48. arXiv:2205.06947  [pdf, other]

    eess.IV cs.CV

    BronchusNet: Region and Structure Prior Embedded Representation Learning for Bronchus Segmentation and Classification

    Authors: Wenhao Huang, Haifan Gong, Huan Zhang, Yu Wang, Haofeng Li, Guanbin Li, Hong Shen

    Abstract: CT-based bronchial tree analysis plays an important role in the computer-aided diagnosis for respiratory diseases, as it could provide structured information for clinicians. The basis of airway analysis is bronchial tree reconstruction, which consists of bronchus segmentation and classification. However, there remains a challenge for accurate bronchial analysis due to the individual variations and…

    Submitted 23 May, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

  49. arXiv:2204.05409  [pdf, other]

    cs.CL

    Unified Speech-Text Pre-training for Speech Translation and Recognition

    Authors: Yun Tang, Hongyu Gong, Ning Dong, Changhan Wang, Wei-Ning Hsu, Jiatao Gu, Alexei Baevski, Xian Li, Abdelrahman Mohamed, Michael Auli, Juan Pino

    Abstract: We describe a method to jointly pre-train speech and text in an encoder-decoder modeling framework for speech translation and recognition. The proposed method incorporates four self-supervised and supervised subtasks for cross modality learning. A self-supervised speech subtask leverages unlabelled speech data, and a (self-)supervised text to text subtask makes use of abundant text training data.…

    Submitted 11 April, 2022; originally announced April 2022.

    Comments: ACL 2022 main conference

  50. arXiv:2203.06850  [pdf, other]

    cs.CL cs.AI

    Efficient Language Modeling with Sparse all-MLP

    Authors: ** Yu, Mikel Artetxe, Myle Ott, Sam Shleifer, Hongyu Gong, Ves Stoyanov, Xian Li

    Abstract: All-MLP architectures have attracted increasing interest as an alternative to attention-based models. In NLP, recent work like gMLP shows that all-MLPs can match Transformers in language modeling, but still lag behind in downstream tasks. In this work, we analyze the limitations of MLPs in expressiveness, and propose sparsely activated MLPs with mixture-of-experts (MoEs) in both feature and input…

    Submitted 31 May, 2022; v1 submitted 14 March, 2022; originally announced March 2022.