Search | arXiv e-print repository

arXiv:2407.01926 [pdf]

Chemical Shift Encoding based Double Bonds Quantification in Triglycerides using Deep Image Prior

Authors: Chaoxing Huang, Ziqiang Yu, Zijian Gao, Qiuyi Shen, Queenie Chan, Vincent Wai-Sun Wong, Winnie Chiu-Wing Chu, Weitian Chen

Abstract: This study evaluated a deep learning-based method using Deep Image Prior (DIP) to quantify triglyceride double bonds from chemical-shift encoded multi-echo gradient echo images without network training. We employed a cost function based on signal constraints to iteratively update the neural network on a single dataset. The method was validated using phantom experiments and in vivo scans. Results s… ▽ More This study evaluated a deep learning-based method using Deep Image Prior (DIP) to quantify triglyceride double bonds from chemical-shift encoded multi-echo gradient echo images without network training. We employed a cost function based on signal constraints to iteratively update the neural network on a single dataset. The method was validated using phantom experiments and in vivo scans. Results showed close alignment between measured and reference double bond values, with phantom experiments yielding a Pearson correlation coefficient of 0.96 (p = .0005). In vivo results demonstrated good agreement in subcutaneous fat. We conclude that Deep Image Prior shows feasibility for quantifying double bonds and fatty acid content from chemical-shift encoded multi-echo MRI. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2405.17102 [pdf, other]

DINO-SD: Champion Solution for ICRA 2024 RoboDepth Challenge

Authors: Yifan Mao, Ming Li, Jian Liu, Jiayang Liu, Zihan Qin, Chunxi Chu, Jialei Xu, Wenbo Zhao, Junjun Jiang, Xianming Liu

Abstract: Surround-view depth estimation is a crucial task aims to acquire the depth maps of the surrounding views. It has many applications in real world scenarios such as autonomous driving, AR/VR and 3D reconstruction, etc. However, given that most of the data in the autonomous driving dataset is collected in daytime scenarios, this leads to poor depth model performance in the face of out-of-distribution… ▽ More Surround-view depth estimation is a crucial task aims to acquire the depth maps of the surrounding views. It has many applications in real world scenarios such as autonomous driving, AR/VR and 3D reconstruction, etc. However, given that most of the data in the autonomous driving dataset is collected in daytime scenarios, this leads to poor depth model performance in the face of out-of-distribution(OoD) data. While some works try to improve the robustness of depth model under OoD data, these methods either require additional training data or lake generalizability. In this report, we introduce the DINO-SD, a novel surround-view depth estimation model. Our DINO-SD does not need additional data and has strong robustness. Our DINO-SD get the best performance in the track4 of ICRA 2024 RoboDepth Challenge. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: Outstanding Champion in the RoboDepth Challenge (ICRA24) https://robodrive-24.github.io/

arXiv:2405.13233 [pdf, other]

MELD-ST: An Emotion-aware Speech Translation Dataset

Authors: Sirou Chen, Sakiko Yahata, Shuichiro Shimizu, Zhengdong Yang, Yihang Li, Chenhui Chu, Sadao Kurohashi

Abstract: Emotion plays a crucial role in human conversation. This paper underscores the significance of considering emotion in speech translation. We present the MELD-ST dataset for the emotion-aware speech translation task, comprising English-to-Japanese and English-to-German language pairs. Each language pair includes about 10,000 utterances annotated with emotion labels from the MELD dataset. Baseline e… ▽ More Emotion plays a crucial role in human conversation. This paper underscores the significance of considering emotion in speech translation. We present the MELD-ST dataset for the emotion-aware speech translation task, comprising English-to-Japanese and English-to-German language pairs. Each language pair includes about 10,000 utterances annotated with emotion labels from the MELD dataset. Baseline experiments using the SeamlessM4T model on the dataset indicate that fine-tuning with emotion labels can enhance translation performance in some settings, highlighting the need for further research in emotion-aware speech translation systems. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 9 pages. Accepted to ACL 2024 Findings. Dataset: https://huggingface.co/datasets/ku-nlp/MELD-ST

arXiv:2405.11607 [pdf, other]

OFHE: An Electro-Optical Accelerator for Discretized TFHE

Authors: Mengxin Zheng, Cheng Chu, Qian Lou, Nathan Youngblood, Mo Li, Sajjad Moazeni, Lei Jiang

Abstract: This paper presents \textit{OFHE}, an electro-optical accelerator designed to process Discretized TFHE (DTFHE) operations, which encrypt multi-bit messages and support homomorphic multiplications, lookup table operations and full-domain functional bootstrap**s. While DTFHE is more efficient and versatile than other fully homomorphic encryption schemes, it requires 32-, 64-, and 128-bit polynomia… ▽ More This paper presents \textit{OFHE}, an electro-optical accelerator designed to process Discretized TFHE (DTFHE) operations, which encrypt multi-bit messages and support homomorphic multiplications, lookup table operations and full-domain functional bootstrap**s. While DTFHE is more efficient and versatile than other fully homomorphic encryption schemes, it requires 32-, 64-, and 128-bit polynomial multiplications, which can be time-consuming. Existing TFHE accelerators are not easily upgradable to support DTFHE operations due to limited datapaths, a lack of datapath bit-width reconfigurability, and power inefficiencies when processing FFT and inverse FFT (IFFT) kernels. Compared to prior TFHE accelerators, OFHE addresses these challenges by improving the DTFHE operation latency by 8.7\%, the DTFHE operation throughput by $57\%$, and the DTFHE operation throughput per Watt by $94\%$. △ Less

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2405.08816 [pdf, other]

The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang , et al. (66 additional authors not shown)

Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c… ▽ More In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition laid down a gauntlet to innovate and enhance system resilience against typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated through our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies to enhance sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Participants, through collaborative efforts, pushed the boundaries of current technologies, showcasing their potential in real-world scenarios. Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research in this field. △ Less

Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

arXiv:2405.03141 [pdf, other]

Automatic Ultrasound Curve Angle Measurement via Affinity Clustering for Adolescent Idiopathic Scoliosis Evaluation

Authors: Yihao Zhou, Timothy Tin-Yan Lee, Kelly Ka-Lee Lai, Chonglin Wu, Hin Ting Lau, De Yang, Chui-Yi Chan, Winnie Chiu-Wing Chu, Jack Chun-Yiu Cheng, Tsz-** Lam, Yong-** Zheng

Abstract: The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, the frequent monitoring of the AIS progression using X-rays poses a challenge due to the cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, the process of mea… ▽ More The current clinical gold standard for evaluating adolescent idiopathic scoliosis (AIS) is X-ray radiography, using Cobb angle measurement. However, the frequent monitoring of the AIS progression using X-rays poses a challenge due to the cumulative radiation exposure. Although 3D ultrasound has been validated as a reliable and radiation-free alternative for scoliosis assessment, the process of measuring spinal curvature is still carried out manually. Consequently, there is a considerable demand for a fully automatic system that can locate bony landmarks and perform angle measurements. To this end, we introduce an estimation model for automatic ultrasound curve angle (UCA) measurement. The model employs a dual-branch network to detect candidate landmarks and perform vertebra segmentation on ultrasound coronal images. An affinity clustering strategy is utilized within the vertebral segmentation area to illustrate the affinity relationship between candidate landmarks. Subsequently, we can efficiently perform line delineation from a clustered affinity map for UCA measurement. As our method is specifically designed for UCA calculation, this method outperforms other state-of-the-art methods for landmark and line detection tasks. The high correlation between the automatic UCA and Cobb angle (R$^2$=0.858) suggests that our proposed method can potentially replace manual UCA measurement in ultrasound scoliosis assessment. △ Less

Submitted 6 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

arXiv:2403.18052 [pdf, other]

R2D2 image reconstruction with model uncertainty quantification in radio astronomy

Authors: Amir Aghabiglou, Chung San Chu, Arwa Dabbech, Yves Wiaux

Abstract: The ``Residual-to-Residual DNN series for high-Dynamic range imaging'' (R2D2) approach was recently introduced for Radio-Interferometric (RI) imaging in astronomy. R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of Deep Neural Networks (DNNs) taking the previous iteration's image estimate and associated data residual as inputs. In this work, we inve… ▽ More The ``Residual-to-Residual DNN series for high-Dynamic range imaging'' (R2D2) approach was recently introduced for Radio-Interferometric (RI) imaging in astronomy. R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of Deep Neural Networks (DNNs) taking the previous iteration's image estimate and associated data residual as inputs. In this work, we investigate the robustness of the R2D2 image estimation process, by studying the uncertainty associated with its series of learned models. Adopting an ensemble averaging approach, multiple series can be trained, arising from different random DNN initializations of the training process at each iteration. The resulting multiple R2D2 instances can also be leveraged to generate ``R2D2 samples'', from which empirical mean and standard deviation endow the algorithm with a joint estimation and uncertainty quantification functionality. Focusing on RI imaging, and adopting a telescope-specific approach, multiple R2D2 instances were trained to encompass the most general observation setting of the Very Large Array (VLA). Simulations and real-data experiments confirm that: (i) R2D2's image estimation capability is superior to that of the state-of-the-art algorithms; (ii) its ultra-fast reconstruction capability (arising from series with only few DNNs) makes the computation of multiple reconstruction samples and of uncertainty maps practical even at large image dimension; (iii) it is characterized by a very low model uncertainty. △ Less

Submitted 27 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: Accepted to IEEE EUSIPCO 2024

arXiv:2403.17905 [pdf, other]

Scalable Non-Cartesian Magnetic Resonance Imaging with R2D2

Authors: Yiwei Chen, Chao Tang, Amir Aghabiglou, Chung San Chu, Yves Wiaux

Abstract: We propose a new approach for non-Cartesian magnetic resonance image reconstruction. While unrolled architectures provide robustness via data-consistency layers, embedding measurement operators in Deep Neural Network (DNN) can become impractical at large scale. Alternative Plug-and-Play (PnP) approaches, where the denoising DNNs are blind to the measurement setting, are not affected by this limita… ▽ More We propose a new approach for non-Cartesian magnetic resonance image reconstruction. While unrolled architectures provide robustness via data-consistency layers, embedding measurement operators in Deep Neural Network (DNN) can become impractical at large scale. Alternative Plug-and-Play (PnP) approaches, where the denoising DNNs are blind to the measurement setting, are not affected by this limitation and have also proven effective, but their highly iterative nature also affects scalability. To address this scalability challenge, we leverage the "Residual-to-Residual DNN series for high-Dynamic range imaging (R2D2)" approach recently introduced in astronomical imaging. R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of DNNs taking the previous iteration's image estimate and associated data residual as inputs. The method can be interpreted as a learned version of the Matching Pursuit algorithm. We demonstrate R2D2 in simulation, considering radial k-space sampling acquisition sequences. Our preliminary results suggest that R2D2 achieves: (i) suboptimal performance compared to its unrolled incarnation R2D2-Net, which is however non-scalable due to the necessary embedding of NUFFT-based data-consistency layers; (ii) superior reconstruction quality to a scalable version of R2D2-Net embedding an FFT-based approximation for data consistency; (iii) superior reconstruction quality to PnP, while only requiring few iterations. △ Less

Submitted 28 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: Accepted to IEEE EUSIPCO 2024

arXiv:2403.15765 [pdf, other]

Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents

Authors: Hao Wang, Tang Li, Chenhui Chu, Nengjun Zhu, Rui Wang, Pinpin Zhu

Abstract: Key-value relations are prevalent in Visually-Rich Documents (VRDs), often depicted in distinct spatial regions accompanied by specific color and font styles. These non-textual cues serve as important indicators that greatly enhance human comprehension and acquisition of such relation triplets. However, current document AI approaches often fail to consider this valuable prior information related t… ▽ More Key-value relations are prevalent in Visually-Rich Documents (VRDs), often depicted in distinct spatial regions accompanied by specific color and font styles. These non-textual cues serve as important indicators that greatly enhance human comprehension and acquisition of such relation triplets. However, current document AI approaches often fail to consider this valuable prior information related to visual and spatial features, resulting in suboptimal performance, particularly when dealing with limited examples. To address this limitation, our research focuses on few-shot relational learning, specifically targeting the extraction of key-value relation triplets in VRDs. Given the absence of a suitable dataset for this task, we introduce two new few-shot benchmarks built upon existing supervised benchmark datasets. Furthermore, we propose a variational approach that incorporates relational 2D-spatial priors and prototypical rectification techniques. This approach aims to generate relation representations that are more aware of the spatial context and unseen relation in a manner similar to human perception. Experimental results demonstrate the effectiveness of our proposed method by showcasing its ability to outperform existing methods. This study also opens up new possibilities for practical applications. △ Less

Submitted 23 March, 2024; originally announced March 2024.

Comments: 13 pages, 7 figures, accepted by LERC-COLING2024

arXiv:2403.14654 [pdf, other]

doi 10.3389/fvets.2024.1395934

ChatGPT in Veterinary Medicine: A Practical Guidance of Generative Artificial Intelligence in Clinics, Education, and Research

Authors: Candice P. Chu

Abstract: ChatGPT, the most accessible generative artificial intelligence (AI) tool, offers considerable potential for veterinary medicine, yet a dedicated review of its specific applications is lacking. This review concisely synthesizes the latest research and practical applications of ChatGPT within the clinical, educational, and research domains of veterinary medicine. It intends to provide specific guid… ▽ More ChatGPT, the most accessible generative artificial intelligence (AI) tool, offers considerable potential for veterinary medicine, yet a dedicated review of its specific applications is lacking. This review concisely synthesizes the latest research and practical applications of ChatGPT within the clinical, educational, and research domains of veterinary medicine. It intends to provide specific guidance and actionable examples of how generative AI can be directly utilized by veterinary professionals without a programming background. For practitioners, ChatGPT can extract patient data, generate progress notes, and potentially assist in diagnosing complex cases. Veterinary educators can create custom GPTs for student support, while students can utilize ChatGPT for exam preparation. ChatGPT can aid in academic writing tasks in research, but veterinary publishers have set specific requirements for authors to follow. Despite its transformative potential, careful use is essential to avoid pitfalls like hallucination. This review addresses ethical considerations, provides learning resources, and offers tangible examples to guide responsible implementation. Carefully selected, up-to-date links to platforms that host large language models are provided for advanced readers with programming capability. A table of key takeaways was provided to summarize this review. By highlighting potential benefits and limitations, this review equips veterinarians, educators, and researchers to harness the power of ChatGPT effectively. △ Less

Submitted 25 February, 2024; originally announced March 2024.

arXiv:2403.10790 [pdf, other]

QuantumLeak: Stealing Quantum Neural Networks from Cloud-based NISQ Machines

Authors: Zhenxiao Fu, Min Yang, Cheng Chu, Yilun Xu, Gang Huang, Fan Chen

Abstract: Variational quantum circuits (VQCs) have become a powerful tool for implementing Quantum Neural Networks (QNNs), addressing a wide range of complex problems. Well-trained VQCs serve as valuable intellectual assets hosted on cloud-based Noisy Intermediate Scale Quantum (NISQ) computers, making them susceptible to malicious VQC stealing attacks. However, traditional model extraction techniques desig… ▽ More Variational quantum circuits (VQCs) have become a powerful tool for implementing Quantum Neural Networks (QNNs), addressing a wide range of complex problems. Well-trained VQCs serve as valuable intellectual assets hosted on cloud-based Noisy Intermediate Scale Quantum (NISQ) computers, making them susceptible to malicious VQC stealing attacks. However, traditional model extraction techniques designed for classical machine learning models encounter challenges when applied to NISQ computers due to significant noise in current devices. In this paper, we introduce QuantumLeak, an effective and accurate QNN model extraction technique from cloud-based NISQ machines. Compared to existing classical model stealing techniques, QuantumLeak improves local VQC accuracy by 4.99\%$\sim$7.35\% across diverse datasets and VQC architectures. △ Less

Submitted 15 March, 2024; originally announced March 2024.

Journal ref: published in IJCNN 2024

arXiv:2403.05452 [pdf, other]

The R2D2 deep neural network series paradigm for fast precision imaging in radio astronomy

Authors: Amir Aghabiglou, Chung San Chu, Arwa Dabbech, Yves Wiaux

Abstract: Radio-interferometric (RI) imaging entails solving high-resolution high-dynamic range inverse problems from large data volumes. Recent image reconstruction techniques grounded in optimization theory have demonstrated remarkable capability for imaging precision, well beyond CLEAN's capability. These range from advanced proximal algorithms propelled by handcrafted regularization operators, such as t… ▽ More Radio-interferometric (RI) imaging entails solving high-resolution high-dynamic range inverse problems from large data volumes. Recent image reconstruction techniques grounded in optimization theory have demonstrated remarkable capability for imaging precision, well beyond CLEAN's capability. These range from advanced proximal algorithms propelled by handcrafted regularization operators, such as the SARA family, to hybrid plug-and-play (PnP) algorithms propelled by learned regularization denoisers, such as AIRI. Optimization and PnP structures are however highly iterative, which hinders their ability to handle the extreme data sizes expected from future instruments. To address this scalability challenge, we introduce a novel deep learning approach, dubbed "Residual-to-Residual DNN series for high-Dynamic range imaging". R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of Deep Neural Networks (DNNs) taking the previous iteration's image estimate and associated data residual as inputs. It thus takes a hybrid structure between a PnP algorithm and a learned version of the matching pursuit algorithm that underpins CLEAN. We present a comprehensive study of our approach, featuring its multiple incarnations distinguished by their DNN architectures. We provide a detailed description of its training process, targeting a telescope-specific approach. R2D2's capability to deliver high precision is demonstrated in simulation, across a variety of image and observation settings using the Very Large Array (VLA). Its reconstruction speed is also demonstrated: with only few iterations required to clean data residuals at dynamic ranges up to 100000, R2D2 opens the door to fast precision imaging. R2D2 codes are available in the BASPLib library on GitHub. △ Less

Submitted 1 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: Accepted for publication in ApJS

arXiv:2403.03690 [pdf]

Rapidly Develo** High-quality Instruction Data and Evaluation Benchmark for Large Language Models with Minimal Human Effort: A Case Study on Japanese

Authors: Yikun Sun, Zhen Wan, Nobuhiro Ueda, Sakiko Yahata, Fei Cheng, Chenhui Chu, Sadao Kurohashi

Abstract: The creation of instruction data and evaluation benchmarks for serving Large language models often involves enormous human annotation. This issue becomes particularly pronounced when rapidly develo** such resources for a non-English language like Japanese. Instead of following the popular practice of directly translating existing English resources into Japanese (e.g., Japanese-Alpaca), we propos… ▽ More The creation of instruction data and evaluation benchmarks for serving Large language models often involves enormous human annotation. This issue becomes particularly pronounced when rapidly develo** such resources for a non-English language like Japanese. Instead of following the popular practice of directly translating existing English resources into Japanese (e.g., Japanese-Alpaca), we propose an efficient self-instruct method based on GPT-4. We first translate a small amount of English instructions into Japanese and post-edit them to obtain native-level quality. GPT-4 then utilizes them as demonstrations to automatically generate Japanese instruction data. We also construct an evaluation benchmark containing 80 questions across 8 categories, using GPT-4 to automatically assess the response quality of LLMs without human references. The empirical results suggest that the models fine-tuned on our GPT-4 self-instruct data significantly outperformed the Japanese-Alpaca across all three base pre-trained models. Our GPT-4 self-instruct data allowed the LLaMA 13B model to defeat GPT-3.5 (Davinci-003) with a 54.37\% win-rate. The human evaluation exhibits the consistency between GPT-4's assessments and human preference. Our high-quality instruction data and evaluation benchmark have been released here. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: COLING 2024. Our code are available here: \href{https://github.com/hitoshizuku7/awesome-Ja-self-instruct}{self-instruct data} and \href{https://github.com/ku-nlp/ja-vicuna-qa-benchmark}{evaluation benchmark}

arXiv:2403.00877 [pdf, other]

Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation

Authors: Liang Luo, Buyun Zhang, Michael Tsang, Yinbin Ma, Ching-Hsiang Chu, Yuxin Chen, Shen Li, Yuchen Hao, Yanli Zhao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Dheevatsa Mudigere, Maxim Naumov

Abstract: We study a mismatch between the deep learning recommendation models' flat architecture, common distributed training paradigm and hierarchical data center topology. To address the associated inefficiencies, we propose Disaggregated Multi-Tower (DMT), a modeling technique that consists of (1) Semantic-preserving Tower Transform (SPTT), a novel training paradigm that decomposes the monolithic global… ▽ More We study a mismatch between the deep learning recommendation models' flat architecture, common distributed training paradigm and hierarchical data center topology. To address the associated inefficiencies, we propose Disaggregated Multi-Tower (DMT), a modeling technique that consists of (1) Semantic-preserving Tower Transform (SPTT), a novel training paradigm that decomposes the monolithic global embedding lookup process into disjoint towers to exploit data center locality; (2) Tower Module (TM), a synergistic dense component attached to each tower to reduce model complexity and communication volume through hierarchical feature interaction; and (3) Tower Partitioner (TP), a feature partitioner to systematically create towers with meaningful feature interactions and load balanced assignments to preserve model quality and training throughput via learned embeddings. We show that DMT can achieve up to 1.9x speedup compared to the state-of-the-art baselines without losing accuracy across multiple generations of hardware at large data center scales. △ Less

Submitted 2 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

arXiv:2402.11021 [pdf, other]

TITAN: A Distributed Large-Scale Trapped-Ion NISQ Computer

Authors: Cheng Chu, Zhenxiao Fu, Yilun Xu, Gang Huang, Hausi Muller, Fan Chen, Lei Jiang

Abstract: Trapped-Ion (TI) technology offers potential breakthroughs for Noisy Intermediate Scale Quantum (NISQ) computing. TI qubits offer extended coherence times and high gate fidelity, making them appealing for large-scale NISQ computers. Constructing such computers demands a distributed architecture connecting Quantum Charge Coupled Devices (QCCDs) via quantum matter-links and photonic switches. Howeve… ▽ More Trapped-Ion (TI) technology offers potential breakthroughs for Noisy Intermediate Scale Quantum (NISQ) computing. TI qubits offer extended coherence times and high gate fidelity, making them appealing for large-scale NISQ computers. Constructing such computers demands a distributed architecture connecting Quantum Charge Coupled Devices (QCCDs) via quantum matter-links and photonic switches. However, current distributed TI NISQ computers face hardware and system challenges. Entangling qubits across a photonic switch introduces significant latency, while existing compilers generate suboptimal map**s due to their unawareness of the interconnection topology. In this paper, we introduce TITAN, a large-scale distributed TI NISQ computer, which employs an innovative photonic interconnection design to reduce entanglement latency and an advanced partitioning and map** algorithm to optimize matter-link communications. Our evaluations show that TITAN greatly enhances quantum application performance by 56.6% and fidelity by 19.7% compared to existing systems. △ Less

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.06127 [pdf, other]

CityFlowER: An Efficient and Realistic Traffic Simulator with Embedded Machine Learning Models

Authors: Longchao Da, Chen Chu, Weinan Zhang, Hua Wei

Abstract: Traffic simulation is an essential tool for transportation infrastructure planning, intelligent traffic control policy learning, and traffic flow analysis. Its effectiveness relies heavily on the realism of the simulators used. Traditional traffic simulators, such as SUMO and CityFlow, are often limited by their reliance on rule-based models with hyperparameters that oversimplify driving behaviors… ▽ More Traffic simulation is an essential tool for transportation infrastructure planning, intelligent traffic control policy learning, and traffic flow analysis. Its effectiveness relies heavily on the realism of the simulators used. Traditional traffic simulators, such as SUMO and CityFlow, are often limited by their reliance on rule-based models with hyperparameters that oversimplify driving behaviors, resulting in unrealistic simulations. To enhance realism, some simulators have provided Application Programming Interfaces (APIs) to interact with Machine Learning (ML) models, which learn from observed data and offer more sophisticated driving behavior models. However, this approach faces challenges in scalability and time efficiency as vehicle numbers increase. Addressing these limitations, we introduce CityFlowER, an advancement over the existing CityFlow simulator, designed for efficient and realistic city-wide traffic simulation. CityFlowER innovatively pre-embeds ML models within the simulator, eliminating the need for external API interactions and enabling faster data computation. This approach allows for a blend of rule-based and ML behavior models for individual vehicles, offering unparalleled flexibility and efficiency, particularly in large-scale simulations. We provide detailed comparisons with existing simulators, implementation insights, and comprehensive experiments to demonstrate CityFlowER's superiority in terms of realism, efficiency, and adaptability. △ Less

Submitted 8 February, 2024; originally announced February 2024.

Comments: 4 pages, 4 figures

ACM Class: G.3

arXiv:2401.13601 [pdf, other]

MM-LLMs: Recent Advances in MultiModal Large Language Models

Authors: Duzhen Zhang, Yahan Yu, Jiahua Dong, Chenxing Li, Dan Su, Chenhui Chu, Dong Yu

Abstract: In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies. The resulting models not only preserve the inherent reasoning and decision-making capabilities of LLMs but also empower a diverse range of MM tasks. In this paper, we provide a comprehensive surve… ▽ More In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies. The resulting models not only preserve the inherent reasoning and decision-making capabilities of LLMs but also empower a diverse range of MM tasks. In this paper, we provide a comprehensive survey aimed at facilitating further research of MM-LLMs. Initially, we outline general design formulations for model architecture and training pipeline. Subsequently, we introduce a taxonomy encompassing 126 MM-LLMs, each characterized by its specific formulations. Furthermore, we review the performance of selected MM-LLMs on mainstream benchmarks and summarize key training recipes to enhance the potency of MM-LLMs. Finally, we explore promising directions for MM-LLMs while concurrently maintaining a real-time tracking website for the latest developments in the field. We hope that this survey contributes to the ongoing advancement of the MM-LLMs domain. △ Less

Submitted 28 May, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: Accepted by ACL2024 (findings)

arXiv:2401.13249 [pdf, other]

MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction

Authors: Wang** Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li, Raj Dabre, Yi Zhao, Tatsuya Kawahara

Abstract: Automatic Mean Opinion Score (MOS) prediction is employed to evaluate the quality of synthetic speech. This study extends the application of predicted MOS to the task of Fake Audio Detection (FAD), as we expect that MOS can be used to assess how close synthesized speech is to the natural human voice. We propose MOS-FAD, where MOS can be leveraged at two key points in FAD: training data selection a… ▽ More Automatic Mean Opinion Score (MOS) prediction is employed to evaluate the quality of synthetic speech. This study extends the application of predicted MOS to the task of Fake Audio Detection (FAD), as we expect that MOS can be used to assess how close synthesized speech is to the natural human voice. We propose MOS-FAD, where MOS can be leveraged at two key points in FAD: training data selection and model fusion. In training data selection, we demonstrate that MOS enables effective filtering of samples from unbalanced datasets. In the model fusion, our results demonstrate that incorporating MOS as a gating mechanism in FAD model fusion enhances overall performance. △ Less

Submitted 24 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: Accepted in ICASSP2024

arXiv:2401.06129 [pdf, other]

Distilling Vision-Language Models on Millions of Videos

Authors: Yue Zhao, Long Zhao, Xingyi Zhou, Jialin Wu, Chun-Te Chu, Hui Miao, Florian Schroff, Hartwig Adam, Ting Liu, Boqing Gong, Philipp Krähenbühl, Liangzhe Yuan

Abstract: The recent advance in vision-language models is largely attributed to the abundance of image-text data. We aim to replicate this success for video-language models, but there simply is not enough human-curated video-text data available. We thus resort to fine-tuning a video-language model from a strong image-language baseline with synthesized instructional data. The resulting video model by video-i… ▽ More The recent advance in vision-language models is largely attributed to the abundance of image-text data. We aim to replicate this success for video-language models, but there simply is not enough human-curated video-text data available. We thus resort to fine-tuning a video-language model from a strong image-language baseline with synthesized instructional data. The resulting video model by video-instruction-tuning (VIIT) is then used to auto-label millions of videos to generate high-quality captions. We show the adapted video-language model performs well on a wide range of video-language benchmarks. For instance, it surpasses the best prior result on open-ended NExT-QA by 2.8%. Besides, our model generates detailed descriptions for previously unseen videos, which provide better textual supervision than existing methods. Experiments show that a video-language dual-encoder model contrastively trained on these auto-generated captions is 3.8% better than the strongest baseline that also leverages vision-language models. Our best model outperforms state-of-the-art methods on MSR-VTT zero-shot text-to-video retrieval by 6%. As a side product, we generate the largest video caption dataset to date. △ Less

Submitted 15 April, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: CVPR 2024. Project page: https://zhaoyue-zephyrus.github.io/video-instruction-tuning

arXiv:2312.15392 [pdf]

Blockchain Smart Contract Threat Detection Technology Based on Symbolic Execution

Authors: Chang Chu

Abstract: The security of smart contracts, which are an important part of blockchain technology, has attracted much attention. In particular, reentrancy vulnerability, which is hidden and complex, poses a great threat to smart contracts. In order to improve the existing detection methods, which exhibit low efficiency and accuracy, in this paper, we propose a smart contract threat detection technology based… ▽ More The security of smart contracts, which are an important part of blockchain technology, has attracted much attention. In particular, reentrancy vulnerability, which is hidden and complex, poses a great threat to smart contracts. In order to improve the existing detection methods, which exhibit low efficiency and accuracy, in this paper, we propose a smart contract threat detection technology based on symbolic execution. In this method, first, the recursive descent algorithm is used to recover the basic blocks of contract code and control flow diagram, and static type inference is performed for static single assignment (SSA) variables. Then, the control flow diagram is encoded into constrained horn clause (CHC) constraints in combination with the symbolic execution technology. Model checking is conducted for the generated constraints using an automatic theorem prover based on the abstraction refinement technique for fast static detection of common security threats in smart contracts. Compared with existing detection methods, the method proposed in this paper allows the detection of both the checks-effects-interactions pattern and the vulnerability in relation to reentrant locks. It can simulate the state changes of reentrant locks as well as other global variables in multiple recursive transactions. The experimental results show that this method significantly increases both detection efficiency and accuracy, improving the security of smart contracts. △ Less

Submitted 23 December, 2023; originally announced December 2023.

arXiv:2311.03696 [pdf, other]

Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts

Authors: Haiyue Song, Raj Dabre, Chenhui Chu, Atsushi Fujita, Sadao Kurohashi

Abstract: Lecture transcript translation helps learners understand online courses, however, building a high-quality lecture machine translation system lacks publicly available parallel corpora. To address this, we examine a framework for parallel corpus mining, which provides a quick and effective way to mine a parallel corpus from publicly available lectures on Coursera. To create the parallel corpora, we… ▽ More Lecture transcript translation helps learners understand online courses, however, building a high-quality lecture machine translation system lacks publicly available parallel corpora. To address this, we examine a framework for parallel corpus mining, which provides a quick and effective way to mine a parallel corpus from publicly available lectures on Coursera. To create the parallel corpora, we propose a dynamic programming based sentence alignment algorithm which leverages the cosine similarity of machine-translated sentences. The sentence alignment F1 score reaches 96%, which is higher than using the BERTScore, LASER, or sentBERT methods. For both English--Japanese and English--Chinese lecture translations, we extracted parallel corpora of approximately 50,000 lines and created development and test sets through manual filtering for benchmarking translation performance. Through machine translation experiments, we show that the mined corpora enhance the quality of lecture transcript translation when used in conjunction with out-of-domain parallel corpora via multistage fine-tuning. Furthermore, this study also suggests guidelines for gathering and cleaning corpora, mining parallel sentences, cleaning noise in the mined data, and creating high-quality evaluation splits. For the sake of reproducibility, we have released the corpora as well as the code to create them. The dataset is available at https://github.com/shyyhs/CourseraParallelCorpusMining. △ Less

Submitted 6 November, 2023; originally announced November 2023.

Comments: Submitted to the Journal of Information Processing (JIP). arXiv admin note: text overlap with arXiv:1912.11739

arXiv:2310.20201 [pdf, other]

Video-Helpful Multimodal Machine Translation

Authors: Yihang Li, Shuichiro Shimizu, Chenhui Chu, Sadao Kurohashi, Wei Li

Abstract: Existing multimodal machine translation (MMT) datasets consist of images and video captions or instructional video subtitles, which rarely contain linguistic ambiguity, making visual information ineffective in generating appropriate translations. Recent work has constructed an ambiguous subtitles dataset to alleviate this problem but is still limited to the problem that videos do not necessarily c… ▽ More Existing multimodal machine translation (MMT) datasets consist of images and video captions or instructional video subtitles, which rarely contain linguistic ambiguity, making visual information ineffective in generating appropriate translations. Recent work has constructed an ambiguous subtitles dataset to alleviate this problem but is still limited to the problem that videos do not necessarily contribute to disambiguation. We introduce EVA (Extensive training set and Video-helpful evaluation set for Ambiguous subtitles translation), an MMT dataset containing 852k Japanese-English (Ja-En) parallel subtitle pairs, 520k Chinese-English (Zh-En) parallel subtitle pairs, and corresponding video clips collected from movies and TV episodes. In addition to the extensive training set, EVA contains a video-helpful evaluation set in which subtitles are ambiguous, and videos are guaranteed helpful for disambiguation. Furthermore, we propose SAFA, an MMT model based on the Selective Attention model with two novel methods: Frame attention loss and Ambiguity augmentation, aiming to use videos in EVA for disambiguation fully. Experiments on EVA show that visual information and the proposed methods can boost translation performance, and our model performs significantly better than existing MMT models. The EVA dataset and the SAFA model are available at: https://github.com/ku-nlp/video-helpful-MMT.git. △ Less

Submitted 31 October, 2023; originally announced October 2023.

Comments: Accepted by EMNLP 2023 Main Conference (long paper)

arXiv:2310.14802 [pdf, other]

DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading

Authors: Hao Wang, Qingxuan Wang, Yue Li, Changqing Wang, Chenhui Chu, Rui Wang

Abstract: The use of visually-rich documents (VRDs) in various fields has created a demand for Document AI models that can read and comprehend documents like humans, which requires the overcoming of technical, linguistic, and cognitive barriers. Unfortunately, the lack of appropriate datasets has significantly hindered advancements in the field. To address this issue, we introduce \textsc{DocTrack}, a VRD d… ▽ More The use of visually-rich documents (VRDs) in various fields has created a demand for Document AI models that can read and comprehend documents like humans, which requires the overcoming of technical, linguistic, and cognitive barriers. Unfortunately, the lack of appropriate datasets has significantly hindered advancements in the field. To address this issue, we introduce \textsc{DocTrack}, a VRD dataset really aligned with human eye-movement information using eye-tracking technology. This dataset can be used to investigate the challenges mentioned above. Additionally, we explore the impact of human reading order on document understanding tasks and examine what would happen if a machine reads in the same order as a human. Our results suggest that although Document AI models have made significant progress, they still have a long way to go before they can read VRDs as accurately, continuously, and flexibly as humans do. These findings have potential implications for future research and development of Document AI models. The data is available at \url{https://github.com/hint-lab/doctrack}. △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: 14 pages, 8 figures, Accepted by Findings of EMNLP2023

arXiv:2310.14785 [pdf, other]

Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning

Authors: Hao Wang, Xiahua Chen, Rui Wang, Chenhui Chu

Abstract: Extracting meaningful entities belonging to predefined categories from Visually-rich Form-like Documents (VFDs) is a challenging task. Visual and layout features such as font, background, color, and bounding box location and size provide important cues for identifying entities of the same type. However, existing models commonly train a visual encoder with weak cross-modal supervision signals, resu… ▽ More Extracting meaningful entities belonging to predefined categories from Visually-rich Form-like Documents (VFDs) is a challenging task. Visual and layout features such as font, background, color, and bounding box location and size provide important cues for identifying entities of the same type. However, existing models commonly train a visual encoder with weak cross-modal supervision signals, resulting in a limited capacity to capture these non-textual features and suboptimal performance. In this paper, we propose a novel \textbf{V}isually-\textbf{A}symmetric co\textbf{N}sisten\textbf{C}y \textbf{L}earning (\textsc{Vancl}) approach that addresses the above limitation by enhancing the model's ability to capture fine-grained visual and layout features through the incorporation of color priors. Experimental results on benchmark datasets show that our approach substantially outperforms the strong LayoutLM series baseline, demonstrating the effectiveness of our approach. Additionally, we investigate the effects of different color schemes on our approach, providing insights for optimizing model performance. We believe our work will inspire future research on multimodal information extraction. △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: 14 pages, 6 figures, Accepted by EMNLP2023

arXiv:2309.16509 [pdf, other]

SIMD Everywhere Optimization from ARM NEON to RISC-V Vector Extensions

Authors: Ju-Hung Li, Jhih-Kuan Lin, Yung-Cheng Su, Chi-Wei Chu, Lai-Tak Kuok, Hung-Ming Lai, Chao-Lin Lee, Jenq-Kuen Lee

Abstract: Many libraries, such as OpenCV, FFmpeg, XNNPACK, and Eigen, utilize Arm or x86 SIMD Intrinsics to optimize programs for performance. With the emergence of RISC-V Vector Extensions (RVV), there is a need to migrate these performance legacy codes for RVV. Currently, the migration of NEON code to RVV code requires manual rewriting, which is a time-consuming and error-prone process. In this work, we u… ▽ More Many libraries, such as OpenCV, FFmpeg, XNNPACK, and Eigen, utilize Arm or x86 SIMD Intrinsics to optimize programs for performance. With the emergence of RISC-V Vector Extensions (RVV), there is a need to migrate these performance legacy codes for RVV. Currently, the migration of NEON code to RVV code requires manual rewriting, which is a time-consuming and error-prone process. In this work, we use the open source tool, "SIMD Everywhere" (SIMDe), to automate the migration. Our primary task is to enhance SIMDe to enable the conversion of ARM NEON Intrinsics types and functions to their corresponding RVV Intrinsics types and functions. For type conversion, we devise strategies to convert Neon Intrinsics types to RVV Intrinsics by considering the vector length agnostic (vla) architectures. With function conversions, we analyze commonly used conversion methods in SIMDe and develop customized conversions for each function based on the results of RVV code generations. In our experiments with Google XNNPACK library, our enhanced SIMDe achieves speedup ranging from 1.51x to 5.13x compared to the original SIMDe, which does not utilize customized RVV implementations for the conversions. △ Less

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.03291 [pdf, other]

CLEANing Cygnus A deep and fast with R2D2

Authors: Arwa Dabbech, Amir Aghabiglou, Chung San Chu, Yves Wiaux

Abstract: A novel deep learning paradigm for synthesis imaging by radio interferometry in astronomy was recently proposed, dubbed "Residual-to-Residual DNN series for high-Dynamic range imaging" (R2D2). In this work, we start by shedding light on R2D2's algorithmic structure, interpreting it as a learned version of CLEAN with minor cycles substituted with a deep neural network (DNN) whose training is iterat… ▽ More A novel deep learning paradigm for synthesis imaging by radio interferometry in astronomy was recently proposed, dubbed "Residual-to-Residual DNN series for high-Dynamic range imaging" (R2D2). In this work, we start by shedding light on R2D2's algorithmic structure, interpreting it as a learned version of CLEAN with minor cycles substituted with a deep neural network (DNN) whose training is iteration-specific. We then proceed with R2D2's first demonstration on real data, for monochromatic intensity imaging of the radio galaxy Cygnus A from S band observations with the Very Large Array (VLA). We show that the modeling power of R2D2's learning approach enables delivering high-precision imaging, superseding the resolution of CLEAN, and matching the precision of modern optimization and plug-and-play algorithms, respectively uSARA and AIRI. Requiring few major-cycle iterations only, R2D2 provides a much faster reconstruction than uSARA and AIRI, known to be highly iterative, and is at least as fast as CLEAN. △ Less

Submitted 23 April, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: accepted for publication in ApJL

arXiv:2308.13795 [pdf, other]

VIDES: Virtual Interior Design via Natural Language and Visual Guidance

Authors: Minh-Hien Le, Chi-Bien Chu, Khanh-Duy Le, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le

Abstract: Interior design is crucial in creating aesthetically pleasing and functional indoor spaces. However, develo** and editing interior design concepts requires significant time and expertise. We propose Virtual Interior DESign (VIDES) system in response to this challenge. Leveraging cutting-edge technology in generative AI, our system can assist users in generating and editing indoor scene concepts… ▽ More Interior design is crucial in creating aesthetically pleasing and functional indoor spaces. However, develo** and editing interior design concepts requires significant time and expertise. We propose Virtual Interior DESign (VIDES) system in response to this challenge. Leveraging cutting-edge technology in generative AI, our system can assist users in generating and editing indoor scene concepts quickly, given user text description and visual guidance. Using both visual guidance and language as the conditional inputs significantly enhances the accuracy and coherence of the generated scenes, resulting in visually appealing designs. Through extensive experimentation, we demonstrate the effectiveness of VIDES in develo** new indoor concepts, changing indoor styles, and replacing and removing interior objects. The system successfully captures the essence of users' descriptions while providing flexibility for customization. Consequently, this system can potentially reduce the entry barrier for indoor design, making it more accessible to users with limited technical skills and reducing the time required to create high-quality images. Individuals who have a background in design can now easily communicate their ideas visually and effectively present their design concepts. https://sites.google.com/view/ltnghia/research/VIDES △ Less

Submitted 26 August, 2023; originally announced August 2023.

Comments: Accepted to ISMAR 2023 (Poster paper)

arXiv:2308.01941 [pdf]

doi 10.34133/icomputing.0055

Digital twin brain: a bridge between biological intelligence and artificial intelligence

Authors: Hui Xiong, Congying Chu, Lingzhong Fan, Ming Song, Jiaqi Zhang, Yawei Ma, Ruonan Zheng, Junyang Zhang, Zhengyi Yang, Tianzi Jiang

Abstract: In recent years, advances in neuroscience and artificial intelligence have paved the way for unprecedented opportunities for understanding the complexity of the brain and its emulation by computational systems. Cutting-edge advancements in neuroscience research have revealed the intricate relationship between brain structure and function, while the success of artificial neural networks highlights… ▽ More In recent years, advances in neuroscience and artificial intelligence have paved the way for unprecedented opportunities for understanding the complexity of the brain and its emulation by computational systems. Cutting-edge advancements in neuroscience research have revealed the intricate relationship between brain structure and function, while the success of artificial neural networks highlights the importance of network architecture. Now is the time to bring them together to better unravel how intelligence emerges from the brain's multiscale repositories. In this review, we propose the Digital Twin Brain (DTB) as a transformative platform that bridges the gap between biological and artificial intelligence. It consists of three core elements: the brain structure that is fundamental to the twinning process, bottom-layer models to generate brain functions, and its wide spectrum of applications. Crucially, brain atlases provide a vital constraint, preserving the brain's network organization within the DTB. Furthermore, we highlight open questions that invite joint efforts from interdisciplinary fields and emphasize the far-reaching implications of the DTB. The DTB can offer unprecedented insights into the emergence of intelligence and neurological disorders, which holds tremendous promise for advancing our understanding of both biological and artificial intelligence, and ultimately propelling the development of artificial general intelligence and facilitating precision mental healthcare. △ Less

Submitted 2 August, 2023; originally announced August 2023.

Journal ref: Intell Comput. 2023;2:0055

arXiv:2308.00085 [pdf, other]

Reasoning before Responding: Integrating Commonsense-based Causality Explanation for Empathetic Response Generation

Authors: Yahui Fu, Koji Inoue, Chenhui Chu, Tatsuya Kawahara

Abstract: Recent approaches to empathetic response generation try to incorporate commonsense knowledge or reasoning about the causes of emotions to better understand the user's experiences and feelings. However, these approaches mainly focus on understanding the causalities of context from the user's perspective, ignoring the system's perspective. In this paper, we propose a commonsense-based causality expl… ▽ More Recent approaches to empathetic response generation try to incorporate commonsense knowledge or reasoning about the causes of emotions to better understand the user's experiences and feelings. However, these approaches mainly focus on understanding the causalities of context from the user's perspective, ignoring the system's perspective. In this paper, we propose a commonsense-based causality explanation approach for diverse empathetic response generation that considers both the user's perspective (user's desires and reactions) and the system's perspective (system's intentions and reactions). We enhance ChatGPT's ability to reason for the system's perspective by integrating in-context learning with commonsense knowledge. Then, we integrate the commonsense-based causality explanation with both ChatGPT and a T5-based model. Experimental evaluations demonstrate that our method outperforms other comparable methods on both automatic and human evaluations. △ Less

Submitted 5 September, 2023; v1 submitted 27 July, 2023; originally announced August 2023.

Comments: Accepted by the 24th Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2023)

arXiv:2307.16400 [pdf, other]

doi 10.1145/3610611

SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation

Authors: Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi, Eiichiro Sumita

Abstract: Sub-word segmentation is an essential pre-processing step for Neural Machine Translation (NMT). Existing work has shown that neural sub-word segmenters are better than Byte-Pair Encoding (BPE), however, they are inefficient as they require parallel corpora, days to train and hours to decode. This paper introduces SelfSeg, a self-supervised neural sub-word segmentation method that is much faster to… ▽ More Sub-word segmentation is an essential pre-processing step for Neural Machine Translation (NMT). Existing work has shown that neural sub-word segmenters are better than Byte-Pair Encoding (BPE), however, they are inefficient as they require parallel corpora, days to train and hours to decode. This paper introduces SelfSeg, a self-supervised neural sub-word segmentation method that is much faster to train/decode and requires only monolingual dictionaries instead of parallel corpora. SelfSeg takes as input a word in the form of a partially masked character sequence, optimizes the word generation probability and generates the segmentation with the maximum posterior probability, which is calculated using a dynamic programming algorithm. The training time of SelfSeg depends on word frequencies, and we explore several word frequency normalization strategies to accelerate the training phase. Additionally, we propose a regularization mechanism that allows the segmenter to generate various segmentations for one word. To show the effectiveness of our approach, we conduct MT experiments in low-, middle- and high-resource scenarios, where we compare the performance of using different segmentation methods. The experimental results demonstrate that on the low-resource ALT dataset, our method achieves more than 1.2 BLEU score improvement compared with BPE and SentencePiece, and a 1.1 score improvement over Dynamic Programming Encoding (DPE) and Vocabulary Learning via Optimal Transport (VOLT) on average. The regularization method achieves approximately a 4.3 BLEU score improvement over BPE and a 1.2 BLEU score improvement over BPE-dropout, the regularized version of BPE. We also observed significant improvements on IWSLT15 Vi->En, WMT16 Ro->En and WMT15 Fi->En datasets, and competitive results on the WMT14 De->En and WMT14 Fr->En datasets. △ Less

Submitted 31 July, 2023; originally announced July 2023.

Comments: Accepted to TALLIP journal

arXiv:2307.09529 [pdf, other]

doi 10.1109/QCE57702.2023.00124

QDoor: Exploiting Approximate Synthesis for Backdoor Attacks in Quantum Neural Networks

Authors: Cheng Chu, Fan Chen, Philip Richerme, Lei Jiang

Abstract: Quantum neural networks (QNNs) succeed in object recognition, natural language processing, and financial analysis. To maximize the accuracy of a QNN on a Noisy Intermediate Scale Quantum (NISQ) computer, approximate synthesis modifies the QNN circuit by reducing error-prone 2-qubit quantum gates. The success of QNNs motivates adversaries to attack QNNs via backdoors. However, naïvely transplanting… ▽ More Quantum neural networks (QNNs) succeed in object recognition, natural language processing, and financial analysis. To maximize the accuracy of a QNN on a Noisy Intermediate Scale Quantum (NISQ) computer, approximate synthesis modifies the QNN circuit by reducing error-prone 2-qubit quantum gates. The success of QNNs motivates adversaries to attack QNNs via backdoors. However, naïvely transplanting backdoors designed for classical neural networks to QNNs yields only low attack success rate, due to the noises and approximate synthesis on NISQ computers. Prior quantum circuit-based backdoors cannot selectively attack some inputs or work with all types of encoding layers of a QNN circuit. Moreover, it is easy to detect both transplanted and circuit-based backdoors in a QNN. In this paper, we propose a novel and stealthy backdoor attack, QDoor, to achieve high attack success rate in approximately-synthesized QNN circuits by weaponizing unitary differences between uncompiled QNNs and their synthesized counterparts. QDoor trains a QNN behaving normally for all inputs with and without a trigger. However, after approximate synthesis, the QNN circuit always predicts any inputs with a trigger to a predefined class while still acts normally for benign inputs. Compared to prior backdoor attacks, QDoor improves the attack success rate by $13\times$ and the clean data accuracy by $65\%$ on average. Furthermore, prior backdoor detection techniques cannot find QDoor attacks in uncompiled QNN circuits. △ Less

Submitted 16 February, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.07012 [pdf, other]

CryptoQFL: Quantum Federated Learning on Encrypted Data

Authors: Cheng Chu, Lei Jiang, Fan Chen

Abstract: Recent advancements in Quantum Neural Networks (QNNs) have demonstrated theoretical and experimental performance superior to their classical counterparts in a wide range of applications. However, existing centralized QNNs cannot solve many real-world problems because collecting large amounts of training data to a common public site is time-consuming and, more importantly, violates data privacy. Fe… ▽ More Recent advancements in Quantum Neural Networks (QNNs) have demonstrated theoretical and experimental performance superior to their classical counterparts in a wide range of applications. However, existing centralized QNNs cannot solve many real-world problems because collecting large amounts of training data to a common public site is time-consuming and, more importantly, violates data privacy. Federated Learning (FL) is an emerging distributed machine learning framework that allows collaborative model training on decentralized data residing on multiple devices without breaching data privacy. Some initial attempts at Quantum Federated Learning (QFL) either only focus on improving the QFL performance or rely on a trusted quantum server that fails to preserve data privacy. In this work, we propose CryptoQFL, a QFL framework that allows distributed QNN training on encrypted data. CryptoQFL is (1) secure, because it allows each edge to train a QNN with local private data, and encrypt its updates using quantum \homo~encryption before sending them to the central quantum server; (2) communication-efficient, as CryptoQFL quantize local gradient updates to ternary values, and only communicate non-zero values to the server for aggregation; and (3) computation-efficient, as CryptoQFL presents an efficient quantum aggregation circuit with significantly reduced latency compared to state-of-the-art approaches. △ Less

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.02736 [pdf]

An Uncertainty Aided Framework for Learning based Liver $T_1ρ$ Map** and Analysis

Authors: Chaoxing Huang, Vincent Wai Sun Wong, Queenie Chan, Winnie Chiu Wing Chu, Weitian Chen

Abstract: Objective: Quantitative $T_1ρ$ imaging has potential for assessment of biochemical alterations of liver pathologies. Deep learning methods have been employed to accelerate quantitative $T_1ρ$ imaging. To employ artificial intelligence-based quantitative imaging methods in complicated clinical environment, it is valuable to estimate the uncertainty of the predicated $T_1ρ$ values to provide the con… ▽ More Objective: Quantitative $T_1ρ$ imaging has potential for assessment of biochemical alterations of liver pathologies. Deep learning methods have been employed to accelerate quantitative $T_1ρ$ imaging. To employ artificial intelligence-based quantitative imaging methods in complicated clinical environment, it is valuable to estimate the uncertainty of the predicated $T_1ρ$ values to provide the confidence level of the quantification results. The uncertainty should also be utilized to aid the post-hoc quantitative analysis and model learning tasks. Approach: To address this need, we propose a parametric map refinement approach for learning-based $T_1ρ$ map** and train the model in a probabilistic way to model the uncertainty. We also propose to utilize the uncertainty map to spatially weight the training of an improved $T_1ρ$ map** network to further improve the map** performance and to remove pixels with unreliable $T_1ρ$ values in the region of interest. The framework was tested on a dataset of 51 patients with different liver fibrosis stages. Main results: Our results indicate that the learning-based map refinement method leads to a relative map** error of less than 3% and provides uncertainty estimation simultaneously. The estimated uncertainty reflects the actual error level, and it can be used to further reduce relative $T_1ρ$ map** error to 2.60% as well as removing unreliable pixels in the region of interest effectively. Significance: Our studies demonstrate the proposed approach has potential to provide a learning-based quantitative MRI system for trustworthy $T_1ρ$ map** of the liver. △ Less

Submitted 9 October, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.14105 [pdf, other]

Sequential Manipulation Planning for Over-actuated Unmanned Aerial Manipulators

Authors: Yao Su, Jiarui Li, Ziyuan Jiao, Meng Wang, Chi Chu, Hang Li, Yixin Zhu, Hangxin Liu

Abstract: We investigate the sequential manipulation planning problem for unmanned aerial manipulators (UAMs). Unlike prior work that primarily focuses on one-step manipulation tasks, sequential manipulations require coordinated motions of a UAM's floating base, the manipulator, and the object being manipulated, entailing a unified kinematics and dynamics model for motion planning under designated constrain… ▽ More We investigate the sequential manipulation planning problem for unmanned aerial manipulators (UAMs). Unlike prior work that primarily focuses on one-step manipulation tasks, sequential manipulations require coordinated motions of a UAM's floating base, the manipulator, and the object being manipulated, entailing a unified kinematics and dynamics model for motion planning under designated constraints. By leveraging a virtual kinematic chain (VKC)-based motion planning framework that consolidates components' kinematics into one chain, the sequential manipulation task of a UAM can be planned as a whole, yielding more coordinated motions. Integrating the kinematics and dynamics models with a hierarchical control framework, we demonstrate, for the first time, an over-actuated UAM achieves a series of new sequential manipulation capabilities in both simulation and experiment. △ Less

Submitted 10 July, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

Journal ref: IROS 2023

arXiv:2305.12733 [pdf, other]

MADNet: Maximizing Addressee Deduction Expectation for Multi-Party Conversation Generation

Authors: Jia-Chen Gu, Chao-Hong Tan, Caiyuan Chu, Zhen-Hua Ling, Chongyang Tao, Quan Liu, Cong Liu

Abstract: Modeling multi-party conversations (MPCs) with graph neural networks has been proven effective at capturing complicated and graphical information flows. However, existing methods rely heavily on the necessary addressee labels and can only be applied to an ideal setting where each utterance must be tagged with an addressee label. To study the scarcity of addressee labels which is a common issue in… ▽ More Modeling multi-party conversations (MPCs) with graph neural networks has been proven effective at capturing complicated and graphical information flows. However, existing methods rely heavily on the necessary addressee labels and can only be applied to an ideal setting where each utterance must be tagged with an addressee label. To study the scarcity of addressee labels which is a common issue in MPCs, we propose MADNet that maximizes addressee deduction expectation in heterogeneous graph neural networks for MPC generation. Given an MPC with a few addressee labels missing, existing methods fail to build a consecutively connected conversation graph, but only a few separate conversation fragments instead. To ensure message passing between these conversation fragments, four additional types of latent edges are designed to complete a fully-connected graph. Besides, to optimize the edge-type-dependent message passing for those utterances without addressee labels, an Expectation-Maximization-based method that iteratively generates silver addressee labels (E step), and optimizes the quality of generated responses (M step), is designed. Experimental results on two Ubuntu IRC channel benchmarks show that MADNet outperforms various baseline models on the task of MPC generation, especially under the more common and challenging setting where part of addressee labels are missing. △ Less

Submitted 17 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: Accepted by EMNLP 2023. arXiv admin note: text overlap with arXiv:2203.08500

arXiv:2305.10190 [pdf, other]

Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation

Authors: Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi

Abstract: The language-independency of encoded representations within multilingual neural machine translation (MNMT) models is crucial for their generalization ability on zero-shot translation. Neural interlingua representations have been shown as an effective method for achieving this. However, fixed-length neural interlingua representations introduced in previous work can limit its flexibility and represe… ▽ More The language-independency of encoded representations within multilingual neural machine translation (MNMT) models is crucial for their generalization ability on zero-shot translation. Neural interlingua representations have been shown as an effective method for achieving this. However, fixed-length neural interlingua representations introduced in previous work can limit its flexibility and representation ability. In this study, we introduce a novel method to enhance neural interlingua representations by making their length variable, thereby overcoming the constraint of fixed-length neural interlingua representations. Our empirical results on zero-shot translation on OPUS, IWSLT, and Europarl datasets demonstrate stable model convergence and superior zero-shot translation results compared to fixed-length neural interlingua representations. However, our analysis reveals the suboptimal efficacy of our approach in translating from certain source languages, wherein we pinpoint the defective model component in our proposed method. △ Less

Submitted 17 May, 2023; originally announced May 2023.

Comments: Accepted to Multi3Generation workshop (held in conjunction with EAMT 2023)

arXiv:2305.09312 [pdf, other]

Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation

Authors: Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu, Sadao Kurohashi

Abstract: This paper studies the impact of layer normalization (LayerNorm) on zero-shot translation (ZST). Recent efforts for ZST often utilize the Transformer architecture as the backbone, with LayerNorm at the input of layers (PreNorm) set as the default. However, Xu et al. (2019) has revealed that PreNorm carries the risk of overfitting the training data. Based on this, we hypothesize that PreNorm may ov… ▽ More This paper studies the impact of layer normalization (LayerNorm) on zero-shot translation (ZST). Recent efforts for ZST often utilize the Transformer architecture as the backbone, with LayerNorm at the input of layers (PreNorm) set as the default. However, Xu et al. (2019) has revealed that PreNorm carries the risk of overfitting the training data. Based on this, we hypothesize that PreNorm may overfit supervised directions and thus have low generalizability for ZST. Through experiments on OPUS, IWSLT, and Europarl datasets for 54 ZST directions, we demonstrate that the original Transformer setting of LayerNorm after residual connections (PostNorm) consistently outperforms PreNorm by up to 12.3 BLEU points. We then study the performance disparities by analyzing the differences in off-target rates and structural variations between PreNorm and PostNorm. This study highlights the need for careful consideration of the LayerNorm setting for ZST. △ Less

Submitted 16 May, 2023; originally announced May 2023.

Comments: Accepted to ACL 2023 main conference

arXiv:2305.09210 [pdf, other]

Towards Speech Dialogue Translation Mediating Speakers of Different Languages

Authors: Shuichiro Shimizu, Chenhui Chu, Sheng Li, Sadao Kurohashi

Abstract: We present a new task, speech dialogue translation mediating speakers of different languages. We construct the SpeechBSD dataset for the task and conduct baseline experiments. Furthermore, we consider context to be an important aspect that needs to be addressed in this task and propose two ways of utilizing context, namely monolingual context and bilingual context. We conduct cascaded speech trans… ▽ More We present a new task, speech dialogue translation mediating speakers of different languages. We construct the SpeechBSD dataset for the task and conduct baseline experiments. Furthermore, we consider context to be an important aspect that needs to be addressed in this task and propose two ways of utilizing context, namely monolingual context and bilingual context. We conduct cascaded speech translation experiments using Whisper and mBART, and show that bilingual context performs better in our settings. △ Less

Submitted 22 May, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

Comments: 11 pages, 4 figures. Accepted to ACL 2023 Findings. Dataset: https://github.com/ku-nlp/speechBSD

arXiv:2304.09093 [pdf, other]

Improving Items and Contexts Understanding with Descriptive Graph for Conversational Recommendation

Authors: Huy Dao, Dung D. Le, Cuong Chu

Abstract: State-of-the-art methods on conversational recommender systems (CRS) leverage external knowledge to enhance both items' and contextual words' representations to achieve high quality recommendations and responses generation. However, the representations of the items and words are usually modeled in two separated semantic spaces, which leads to misalignment issue between them. Consequently, this wil… ▽ More State-of-the-art methods on conversational recommender systems (CRS) leverage external knowledge to enhance both items' and contextual words' representations to achieve high quality recommendations and responses generation. However, the representations of the items and words are usually modeled in two separated semantic spaces, which leads to misalignment issue between them. Consequently, this will cause the CRS to only achieve a sub-optimal ranking performance, especially when there is a lack of sufficient information from the user's input. To address limitations of previous works, we propose a new CRS framework KLEVER, which jointly models items and their associated contextual words in the same semantic space. Particularly, we construct an item descriptive graph from the rich items' textual features, such as item description and categories. Based on the constructed descriptive graph, KLEVER jointly learns the embeddings of the words and items, towards enhancing both recommender and dialog generation modules. Extensive experiments on benchmarking CRS dataset demonstrate that KLEVER achieves superior performance, especially when the information from the users' responses is lacking. △ Less

Submitted 11 April, 2023; originally announced April 2023.

Comments: 14 pages, 3 figures, 9 tables

arXiv:2304.06662 [pdf, other]

Deep Learning in Breast Cancer Imaging: A Decade of Progress and Future Directions

Authors: Luyang Luo, Xi Wang, Yi Lin, Xiaoqi Ma, Andong Tan, Ronald Chan, Varut Vardhanabhuti, Winnie CW Chu, Kwang-Ting Cheng, Hao Chen

Abstract: Breast cancer has reached the highest incidence rate worldwide among all malignancies since 2020. Breast imaging plays a significant role in early diagnosis and intervention to improve the outcome of breast cancer patients. In the past decade, deep learning has shown remarkable progress in breast cancer imaging analysis, holding great promise in interpreting the rich information and complex contex… ▽ More Breast cancer has reached the highest incidence rate worldwide among all malignancies since 2020. Breast imaging plays a significant role in early diagnosis and intervention to improve the outcome of breast cancer patients. In the past decade, deep learning has shown remarkable progress in breast cancer imaging analysis, holding great promise in interpreting the rich information and complex context of breast imaging modalities. Considering the rapid improvement in deep learning technology and the increasing severity of breast cancer, it is critical to summarize past progress and identify future challenges to be addressed. This paper provides an extensive review of deep learning-based breast cancer imaging research, covering studies on mammogram, ultrasound, magnetic resonance imaging, and digital pathology images over the past decade. The major deep learning methods and applications on imaging-based screening, diagnosis, treatment response prediction, and prognosis are elaborated and discussed. Drawn from the findings of this survey, we present a comprehensive discussion of the challenges and potential avenues for future research in deep learning-based breast cancer imaging. △ Less

Submitted 20 January, 2024; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: IEEE RBME 2024

arXiv:2303.08774 [pdf, other]

GPT-4 Technical Report

Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko , et al. (256 additional authors not shown)

Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo… ▽ More We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was develo** infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4. △ Less

Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 100 pages; updated authors list; fixed author names and added citation

arXiv:2303.05034 [pdf, other]

Multi-Stage Coarse-to-Fine Contrastive Learning for Conversation Intent Induction

Authors: Caiyuan Chu, Ya Li, Yifan Liu, Jia-Chen Gu, Quan Liu, Yongxin Ge, Guo** Hu

Abstract: Intent recognition is critical for task-oriented dialogue systems. However, for emerging domains and new services, it is difficult to accurately identify the key intent of a conversation due to time-consuming data annotation and comparatively poor model transferability. Therefore, the automatic induction of dialogue intention is very important for intelligent dialogue systems. This paper presents… ▽ More Intent recognition is critical for task-oriented dialogue systems. However, for emerging domains and new services, it is difficult to accurately identify the key intent of a conversation due to time-consuming data annotation and comparatively poor model transferability. Therefore, the automatic induction of dialogue intention is very important for intelligent dialogue systems. This paper presents our solution to Track 2 of Intent Induction from Conversations for Task-Oriented Dialogue at the Eleventh Dialogue System Technology Challenge (DSTC11). The essence of intention clustering lies in distinguishing the representation of different dialogue utterances. The key to automatic intention induction is that, for any given set of new data, the sentence representation obtained by the model can be well distinguished from different labels. Therefore, we propose a multi-stage coarse-to-fine contrastive learning model training scheme including unsupervised contrastive learning pre-training, supervised contrastive learning pre-training, and fine-tuning with joint contrastive learning and clustering to obtain a better dialogue utterance representation model for the clustering task. In the released DSTC11 Track 2 evaluation results, our proposed system ranked first on both of the two subtasks of this Track. △ Less

Submitted 8 March, 2023; originally announced March 2023.

Comments: Ranked 1st on Track 2 at DSTC 11, Accepted by DSTC 11 Workshop

arXiv:2302.08090 [pdf, other]

QTrojan: A Circuit Backdoor Against Quantum Neural Networks

Authors: Cheng Chu, Lei Jiang, Martin Swany, Fan Chen

Abstract: We propose a circuit-level backdoor attack, \textit{QTrojan}, against Quantum Neural Networks (QNNs) in this paper. QTrojan is implemented by few quantum gates inserted into the variational quantum circuit of the victim QNN. QTrojan is much stealthier than a prior Data-Poisoning-based Backdoor Attack (DPBA), since it does not embed any trigger in the inputs of the victim QNN or require the access… ▽ More We propose a circuit-level backdoor attack, \textit{QTrojan}, against Quantum Neural Networks (QNNs) in this paper. QTrojan is implemented by few quantum gates inserted into the variational quantum circuit of the victim QNN. QTrojan is much stealthier than a prior Data-Poisoning-based Backdoor Attack (DPBA), since it does not embed any trigger in the inputs of the victim QNN or require the access to original training datasets. Compared to a DPBA, QTrojan improves the clean data accuracy by 21\% and the attack success rate by 19.9\%. △ Less

Submitted 16 February, 2023; originally announced February 2023.

Journal ref: ICASSP2023

arXiv:2212.01618 [pdf]

An Overview of Trust Standards for Communication Networks and Future Digital World

Authors: Huilin Wang, Xin Kang, Tieyan Li, Zhongding Lei, Cheng-Kang Chu, Haiguang Wang

Abstract: With the development of Information and Communication Technologies, trust has been applied more and more in various scenarios. At the same time, different organizations have published a series of trust frameworks to support the implementation of trust. There are also academic paper discussing about these trust standards, however, most of them only focus on a specific application. Unlike existing w… ▽ More With the development of Information and Communication Technologies, trust has been applied more and more in various scenarios. At the same time, different organizations have published a series of trust frameworks to support the implementation of trust. There are also academic paper discussing about these trust standards, however, most of them only focus on a specific application. Unlike existing works, this paper provides an overview of all current available trust standards related to communication networks and future digital world from several main organizations. To be specific, this paper summarizes and organizes all these trust standards into three layers: trust foundation, trust elements, and trust applications. We then analysis these trust standards and discuss their contribution in a systematic way. We discuss the motivations behind each current in forced standards, analyzes their frameworks and solutions, and presents their role and impact on communication works and future digital world. Finally, we give our suggestions on the trust work that needs to be standardized in future. △ Less

Submitted 3 December, 2022; originally announced December 2022.

Comments: 7 pages, 3 figures, Magazine paper under review

arXiv:2212.00305 [pdf, other]

Multilingual Communication System with Deaf Individuals Utilizing Natural and Visual Languages

Authors: Tuan-Luc Huynh, Khoi-Nguyen Nguyen-Ngoc, Chi-Bien Chu, Minh-Triet Tran, Trung-Nghia Le

Abstract: According to the World Federation of the Deaf, more than two hundred sign languages exist. Therefore, it is challenging to understand deaf individuals, even proficient sign language users, resulting in a barrier between the deaf community and the rest of society. To bridge this language barrier, we propose a novel multilingual communication system, namely MUGCAT, to improve the communication effic… ▽ More According to the World Federation of the Deaf, more than two hundred sign languages exist. Therefore, it is challenging to understand deaf individuals, even proficient sign language users, resulting in a barrier between the deaf community and the rest of society. To bridge this language barrier, we propose a novel multilingual communication system, namely MUGCAT, to improve the communication efficiency of sign language users. By converting recognized specific hand gestures into expressive pictures, which is universal usage and language independence, our MUGCAT system significantly helps deaf people convey their thoughts. To overcome the limitation of sign language usage, which is mostly impossible to translate into complete sentences for ordinary people, we propose to reconstruct meaningful sentences from the incomplete translation of sign language. We also measure the semantic similarity of generated sentences with fragmented recognized hand gestures to keep the original meaning. Experimental results show that the proposed system can work in a real-time manner and synthesize exquisite stunning illustrations and meaningful sentences from a few hand gestures of sign language. This proves that our MUGCAT has promising potential in assisting deaf communication. △ Less

Submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.03615 [pdf, other]

MAISON -- Multimodal AI-based Sensor platform for Older Individuals

Authors: Ali Abedi, Faranak Dayyani, Charlene Chu, Shehroz S. Khan

Abstract: There is a global aging population requiring the need for the right tools that can enable older adults' greater independence and the ability to age at home, as well as assist healthcare workers. It is feasible to achieve this objective by building predictive models that assist healthcare workers in monitoring and analyzing older adults' behavioral, functional, and psychological data. To develop su… ▽ More There is a global aging population requiring the need for the right tools that can enable older adults' greater independence and the ability to age at home, as well as assist healthcare workers. It is feasible to achieve this objective by building predictive models that assist healthcare workers in monitoring and analyzing older adults' behavioral, functional, and psychological data. To develop such models, a large amount of multimodal sensor data is typically required. In this paper, we propose MAISON, a scalable cloud-based platform of commercially available smart devices capable of collecting desired multimodal sensor data from older adults and patients living in their own homes. The MAISON platform is novel due to its ability to collect a greater variety of data modalities than the existing platforms, as well as its new features that result in seamless data collection and ease of use for older adults who may not be digitally literate. We demonstrated the feasibility of the MAISON platform with two older adults discharged home from a large rehabilitation center. The results indicate that the MAISON platform was able to collect and store sensor data in a cloud without functional glitches or performance degradation. This paper will also discuss the challenges faced during the development of the platform and data collection in the homes of older adults. MAISON is a novel platform designed to collect multimodal data and facilitate the development of predictive models for detecting key health indicators, including social isolation, depression, and functional decline, and is feasible to use with older adults in the community. △ Less

Submitted 7 November, 2022; originally announced November 2022.

arXiv:2211.01205 [pdf, other]

No-reference Point Cloud Geometry Quality Assessment Based on Pairwise Rank Learning

Authors: Zhiyong Su, Chao Chu, Long Chen, Yong Li, Weiqing Li

Abstract: Objective geometry quality assessment of point clouds is essential to evaluate the performance of a wide range of point cloud-based solutions, such as denoising, simplification, reconstruction, and watermarking. Existing point cloud quality assessment (PCQA) methods dedicate to assigning absolute quality scores to distorted point clouds. Their performance is strongly reliant on the quality and qua… ▽ More Objective geometry quality assessment of point clouds is essential to evaluate the performance of a wide range of point cloud-based solutions, such as denoising, simplification, reconstruction, and watermarking. Existing point cloud quality assessment (PCQA) methods dedicate to assigning absolute quality scores to distorted point clouds. Their performance is strongly reliant on the quality and quantity of subjective ground-truth scores for training, which are challenging to gather and have been shown to be imprecise, biased, and inconsistent. Furthermore, the majority of existing objective geometry quality assessment approaches are carried out by full-reference traditional metrics. So far, point-based no-reference geometry-only quality assessment techniques have not yet been investigated. This paper presents PRL-GQA, the first pairwise learning framework for no-reference geometry-only quality assessment of point clouds, to the best of our knowledge. The proposed PRL-GQA framework employs a siamese deep architecture, which takes as input a pair of point clouds and outputs their rank order. Each siamese architecture branch is a geometry quality assessment network (GQANet), which is designed to extract multi-scale quality-aware geometric features and output a quality index for the input point cloud. Then, based on the predicted quality indexes, a pairwise rank learning module is introduced to rank the relative quality of a pair of degraded point clouds.Extensive experiments demonstrate the effectiveness of the proposed PRL-GQA framework. Furthermore, the results also show that the fine-tuned no-reference GQANet performs competitively when compared to existing full-reference geometry quality assessment metrics. △ Less

Submitted 2 November, 2022; originally announced November 2022.

arXiv:2210.17291 [pdf]

SIX-Trust for 6G: Towards a Secure and Trustworthy 6G Network

Authors: Yiying Wang, Xin Kang, Tieyan Li, Haiguang Wang, Cheng-Kang Chu, Zhongding Lei

Abstract: Recent years have witnessed a digital explosion with the deployment of 5G and proliferation of 5G-enabled innovations. Compared with 5G, 6G is envisioned to achieve much higher performance in terms of latency, data rate, connectivity, energy efficiency, coverage and mobility. To fulfil these expectations, 6G will experience a number of paradigm shifts, such as exploiting new spectrum, applying ubi… ▽ More Recent years have witnessed a digital explosion with the deployment of 5G and proliferation of 5G-enabled innovations. Compared with 5G, 6G is envisioned to achieve much higher performance in terms of latency, data rate, connectivity, energy efficiency, coverage and mobility. To fulfil these expectations, 6G will experience a number of paradigm shifts, such as exploiting new spectrum, applying ubiquitous ML/AI technologies and building a space-air-ground-sea integrated network. However, these paradigm shifts may lead to numerous new security and privacy issues, which traditional security measures may not be able to deal with. To tackle these issues and build a trustworthy 6G network, we introduce a novel trust framework named as SIX-Trust, which composes of 3 layers: sustainable trust (S-Trust), infrastructure trust (I-Trust) and xenogenesis trust (X-Trust). Each layer plays a different role, and the importance of each layer varies for different application scenarios of 6G. For each layer, we briefly introduce its related enabling technologies, and demonstrate how these technologies can be applied to enhance trust and security of the 6G network. In general, SIX-Trust provides a holistic framework for defining and modeling trust of 6G, which can facilitate establishing a trustworthy 6G network. △ Less

Submitted 31 October, 2022; originally announced October 2022.

Comments: 7 pages, 3 figures, under review

arXiv:2208.10758 [pdf, other]

Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks

Authors: Tianwei Chen, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Hajime Nagahara

Abstract: Is more data always better to train vision-and-language models? We study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that by joining multiple datasets from different tasks their overall performance will improve. However, we show that not all the knowledge transfers well or has a positive impact on related tasks, even when they share a commo… ▽ More Is more data always better to train vision-and-language models? We study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that by joining multiple datasets from different tasks their overall performance will improve. However, we show that not all the knowledge transfers well or has a positive impact on related tasks, even when they share a common goal. We conduct an exhaustive analysis based on hundreds of cross-experiments on 12 vision-and-language tasks categorized in 4 groups. Whereas tasks in the same group are prone to improve each other, results show that this is not always the case. Other factors such as dataset size or pre-training stage have also a great impact on how well the knowledge is transferred. △ Less

Submitted 23 August, 2022; originally announced August 2022.

arXiv:2208.04910 [pdf]

Deep Learning-Based Objective and Reproducible Osteosarcoma Chemotherapy Response Assessment and Outcome Prediction

Authors: David Joon Ho, Narasimhan P. Agaram, Marc-Henri Jean, Stephanie D. Suser, Cynthia Chu, Chad M. Vanderbilt, Paul A. Meyers, Leonard H. Wexler, John H. Healey, Thomas J. Fuchs, Meera R. Hameed

Abstract: Osteosarcoma is the most common primary bone cancer whose standard treatment includes pre-operative chemotherapy followed by resection. Chemotherapy response is used for predicting prognosis and further management of patients. Necrosis is routinely assessed post-chemotherapy from histology slides on resection specimens where necrosis ratio is defined as the ratio of necrotic tumor to overall tumor… ▽ More Osteosarcoma is the most common primary bone cancer whose standard treatment includes pre-operative chemotherapy followed by resection. Chemotherapy response is used for predicting prognosis and further management of patients. Necrosis is routinely assessed post-chemotherapy from histology slides on resection specimens where necrosis ratio is defined as the ratio of necrotic tumor to overall tumor. Patients with necrosis ratio >=90% are known to have better outcome. Manual microscopic review of necrosis ratio from multiple glass slides is semi-quantitative and can have intra- and inter-observer variability. We propose an objective and reproducible deep learning-based approach to estimate necrosis ratio with outcome prediction from scanned hematoxylin and eosin whole slide images. We collected 103 osteosarcoma cases with 3134 WSIs to train our deep learning model, to validate necrosis ratio assessment, and to evaluate outcome prediction. We trained Deep Multi-Magnification Network to segment multiple tissue subtypes including viable tumor and necrotic tumor in pixel-level and to calculate case-level necrosis ratio from multiple WSIs. We showed necrosis ratio estimated by our segmentation model highly correlates with necrosis ratio from pathology reports manually assessed by experts where mean absolute differences for Grades IV (100%), III (>=90%), and II (>=50% and <90%) necrosis response are 4.4%, 4.5%, and 17.8%, respectively. We successfully stratified patients to predict overall survival with p=10^-6 and progression-free survival with p=0.012. Our reproducible approach without variability enabled us to tune cutoff thresholds, specifically for our model and our data set, to 80% for OS and 60% for PFS. Our study indicates deep learning can support pathologists as an objective tool to analyze osteosarcoma from histology for assessing treatment response and predicting patient outcome. △ Less

Submitted 9 August, 2022; originally announced August 2022.

Showing 1–50 of 131 results for author: Chu, C