-
Hybrid RAG-empowered Multi-modal LLM for Secure Healthcare Data Management: A Diffusion-based Contract Theory Approach
Authors:
Cheng Su,
**bo Wen,
Jiawen Kang,
Yonghua Wang,
Hudan Pan,
M. Shamim Hossain
Abstract:
Secure data management and effective data sharing have become paramount in the rapidly evolving healthcare landscape. The advancement of generative artificial intelligence has positioned Multi-modal Large Language Models (MLLMs) as crucial tools for managing healthcare data. MLLMs can support multi-modal inputs and generate diverse types of content by leveraging large-scale training on vast amount…
▽ More
Secure data management and effective data sharing have become paramount in the rapidly evolving healthcare landscape. The advancement of generative artificial intelligence has positioned Multi-modal Large Language Models (MLLMs) as crucial tools for managing healthcare data. MLLMs can support multi-modal inputs and generate diverse types of content by leveraging large-scale training on vast amounts of multi-modal data. However, critical challenges persist in develo** medical MLLMs, including healthcare data security and freshness issues, affecting the output quality of MLLMs. In this paper, we propose a hybrid Retrieval-Augmented Generation (RAG)-empowered medical MLLMs framework for healthcare data management. This framework leverages a hierarchical cross-chain architecture to facilitate secure data training. Moreover, it enhances the output quality of MLLMs through hybrid RAG, which employs multi-modal metrics to filter various unimodal RAG results and incorporates these retrieval results as additional inputs to MLLMs. Additionally, we employ age of information to indirectly evaluate the data freshness impact of MLLMs and utilize contract theory to incentivize healthcare data holders to share fresh data, mitigating information asymmetry in data sharing. Finally, we utilize a generative diffusion model-based reinforcement learning algorithm to identify the optimal contract for efficient data sharing. Numerical results demonstrate the effectiveness of the proposed schemes, which achieve secure and efficient healthcare data management.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Privacy-preserving Pseudonym Schemes for Personalized 3D Avatars in Mobile Social Metaverses
Authors:
Cheng Su,
Xiaofeng Luo,
Zhenmou Liu,
Jiawen Kang,
Min Hao,
Zehui Xiong,
Zhaohui Yang,
Chongwen Huang
Abstract:
The emergence of mobile social metaverses, a novel paradigm bridging physical and virtual realms, has led to the widespread adoption of avatars as digital representations for Social Metaverse Users (SMUs) within virtual spaces. Equipped with immersive devices, SMUs leverage Edge Servers (ESs) to deploy their avatars and engage with other SMUs in virtual spaces. To enhance immersion, SMUs incline t…
▽ More
The emergence of mobile social metaverses, a novel paradigm bridging physical and virtual realms, has led to the widespread adoption of avatars as digital representations for Social Metaverse Users (SMUs) within virtual spaces. Equipped with immersive devices, SMUs leverage Edge Servers (ESs) to deploy their avatars and engage with other SMUs in virtual spaces. To enhance immersion, SMUs incline to opt for 3D avatars for social interactions. However, existing 3D avatars are typically generated through scanning the real faces of SMUs, which can raise concerns regarding information privacy and security, such as profile identity leakages. To tackle this, we introduce a new framework for personalized 3D avatar construction, leveraging a two-layer network model that provides SMUs with the option to customize their personal avatars for privacy preservation. Specifically, our approach introduces avatar pseudonyms to jointly safeguard the profile and digital identity privacy of the generated avatars. Then, we design a novel metric named Privacy of Personalized Avatars (PoPA), to evaluate effectiveness of the avatar pseudonyms. To optimize pseudonym resource, we model the pseudonym distribution process as a Stackelberg game and employ Deep Reinforcement Learning (DRL) to learn equilibrium strategies under incomplete information. Simulation results validate the efficacy and feasibility of our proposed schemes for mobile social metaverses.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation
Authors:
Chun-Hung Wu,
Shih-Hong Chen,
Chih-Yao Hu,
Hsin-Yu Wu,
Kai-Hsin Chen,
Yu-You Chen,
Chih-Hai Su,
Chih-Kuo Lee,
Yu-Lun Liu
Abstract:
This paper presents Deformable Neural Vessel Representations (DeNVeR), an unsupervised approach for vessel segmentation in X-ray videos without annotated ground truth. DeNVeR uses optical flow and layer separation, enhancing segmentation accuracy and adaptability through test-time training. A key component of our research is the introduction of the XACV dataset, the first X-ray angiography coronar…
▽ More
This paper presents Deformable Neural Vessel Representations (DeNVeR), an unsupervised approach for vessel segmentation in X-ray videos without annotated ground truth. DeNVeR uses optical flow and layer separation, enhancing segmentation accuracy and adaptability through test-time training. A key component of our research is the introduction of the XACV dataset, the first X-ray angiography coronary video dataset with high-quality, manually labeled segmentation ground truth. Our evaluation demonstrates that DeNVeR outperforms current state-of-the-art methods in vessel segmentation. This paper marks an advance in medical imaging, providing a robust, data-efficient tool for disease diagnosis and treatment planning and setting a new standard for future research in video vessel segmentation. See our project page for video results at https://kirito878.github.io/DeNVeR/.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
Why Not Transform Chat Large Language Models to Non-English?
Authors:
Xiang Geng,
Ming Zhu,
Jiahuan Li,
Zhejian Lai,
Wei Zou,
Shuaijie She,
Jiaxin Guo,
Xiaofeng Zhao,
Yinglu Li,
Yuang Li,
Chang Su,
Yanqing Zhao,
Xinglin Lyu,
Min Zhang,
Jiajun Chen,
Hao Yang,
Shujian Huang
Abstract:
The scarcity of non-English data limits the development of non-English large language models (LLMs). Transforming English-centric LLMs to non-English has been identified as an effective and resource-efficient method. Previous works start from base LLMs and perform knowledge distillation (KD) with data generated by stronger LLMs, e.g. GPT-4. Compared to base LLMs, chat LLMs are further optimized fo…
▽ More
The scarcity of non-English data limits the development of non-English large language models (LLMs). Transforming English-centric LLMs to non-English has been identified as an effective and resource-efficient method. Previous works start from base LLMs and perform knowledge distillation (KD) with data generated by stronger LLMs, e.g. GPT-4. Compared to base LLMs, chat LLMs are further optimized for advanced abilities, e.g. multi-turn conversation and human preference alignment, and thus more powerful in both helpfulness and safety. However, transforming a chat LLM involves two critical issues: (1) How can we effectively transfer advanced abilities without their supervised data? (2) How can we prevent the original knowledge from catastrophic forgetting during transformation? We target these issues by introducing a simple framework called TransLLM. For the first issue, TransLLM divides the transfer problem into some common sub-tasks with the translation chain-of-thought, which uses the translation as the bridge between English and non-English step-by-step. We further enhance the performance of sub-tasks with publicly available data. For the second issue, we propose a method comprising two synergistic components: low-rank adaptation for training to maintain the original LLM parameters, and recovery KD, which utilizes data generated by the chat LLM itself to recover the original knowledge from the frozen parameters. In the experiments, we transform the LLaMA-2-chat-7B to the Thai language. Our method, using only single-turn data, outperforms strong baselines and ChatGPT on multi-turn benchmark MT-bench. Furthermore, our method, without safety data, rejects more harmful queries of safety benchmark AdvBench than both ChatGPT and GPT-4.
△ Less
Submitted 31 May, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
Joint Analysis of Single-Cell Data across Cohorts with Missing Modalities
Authors:
Marianne Arriola,
Weishen Pan,
Manqi Zhou,
Qiannan Zhang,
Chang Su,
Fei Wang
Abstract:
Joint analysis of multi-omic single-cell data across cohorts has significantly enhanced the comprehensive analysis of cellular processes. However, most of the existing approaches for this purpose require access to samples with complete modality availability, which is impractical in many real-world scenarios. In this paper, we propose (Single-Cell Cross-Cohort Cross-Category) integration, a novel f…
▽ More
Joint analysis of multi-omic single-cell data across cohorts has significantly enhanced the comprehensive analysis of cellular processes. However, most of the existing approaches for this purpose require access to samples with complete modality availability, which is impractical in many real-world scenarios. In this paper, we propose (Single-Cell Cross-Cohort Cross-Category) integration, a novel framework that learns unified cell representations under domain shift without requiring full-modality reference samples. Our generative approach learns rich cross-modal and cross-domain relationships that enable imputation of these missing modalities. Through experiments on real-world multi-omic datasets, we demonstrate that offers a robust solution to single-cell tasks such as cell type clustering, cell type classification, and feature imputation.
△ Less
Submitted 18 May, 2024;
originally announced May 2024.
-
Seasonality Patterns in 311-Reported Foodborne Illness Cases and Machine Learning-Identified Indications of Foodborne Illnesses from Yelp Reviews, New York City, 2022-2023
Authors:
Eden Shaveet,
Crystal Su,
Daniel Hsu,
Luis Gravano
Abstract:
Restaurants are critical venues at which to investigate foodborne illness outbreaks due to shared sourcing, preparation, and distribution of foods. Formal channels to report illness after food consumption, such as 311, New York City's non-emergency municipal service platform, are underutilized. Given this, online social media platforms serve as abundant sources of user-generated content that provi…
▽ More
Restaurants are critical venues at which to investigate foodborne illness outbreaks due to shared sourcing, preparation, and distribution of foods. Formal channels to report illness after food consumption, such as 311, New York City's non-emergency municipal service platform, are underutilized. Given this, online social media platforms serve as abundant sources of user-generated content that provide critical insights into the needs of individuals and populations. We extracted restaurant reviews and metadata from Yelp to identify potential outbreaks of foodborne illness in connection with consuming food from restaurants. Because the prevalence of foodborne illnesses may increase in warmer months as higher temperatures breed more favorable conditions for bacterial growth, we aimed to identify seasonal patterns in foodborne illness reports from 311 and identify seasonal patterns of foodborne illness from Yelp reviews for New York City restaurants using a Hierarchical Sigmoid Attention Network (HSAN). We found no evidence of significant bivariate associations between any variables of interest. Given the inherent limitations of relying solely on user-generated data for public health insights, it is imperative to complement these sources with other data streams and insights from subject matter experts. Future investigations should involve conducting these analyses at more granular spatial and temporal scales to explore the presence of such differences or associations.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
Rapid Mobile App Development for Generative AI Agents on MIT App Inventor
Authors:
Jaida Gao,
Calab Su,
Etai Miller,
Kevin Lu,
Yu Meng
Abstract:
The evolution of Artificial Intelligence (AI) stands as a pivotal force sha** our society, finding applications across diverse domains such as education, sustainability, and safety. Leveraging AI within mobile applications makes it easily accessible to the public, catalyzing its transformative potential. In this paper, we present a methodology for the rapid development of AI agent applications u…
▽ More
The evolution of Artificial Intelligence (AI) stands as a pivotal force sha** our society, finding applications across diverse domains such as education, sustainability, and safety. Leveraging AI within mobile applications makes it easily accessible to the public, catalyzing its transformative potential. In this paper, we present a methodology for the rapid development of AI agent applications using the development platform provided by MIT App Inventor. To demonstrate its efficacy, we share the development journey of three distinct mobile applications: SynchroNet for fostering sustainable communities; ProductiviTeams for addressing procrastination; and iHELP for enhancing community safety. All three applications seamlessly integrate a spectrum of generative AI features, leveraging OpenAI APIs. Furthermore, we offer insights gleaned from overcoming challenges in integrating diverse tools and AI functionalities, aiming to inspire young developers to join our efforts in building practical AI agent applications.
△ Less
Submitted 31 March, 2024;
originally announced May 2024.
-
SIGformer: Sign-aware Graph Transformer for Recommendation
Authors:
Sirui Chen,
Jiawei Chen,
Sheng Zhou,
Bohao Wang,
Shen Han,
Chanfei Su,
Yuqing Yuan,
Can Wang
Abstract:
In recommender systems, most graph-based methods focus on positive user feedback, while overlooking the valuable negative feedback. Integrating both positive and negative feedback to form a signed graph can lead to a more comprehensive understanding of user preferences. However, the existing efforts to incorporate both types of feedback are sparse and face two main limitations: 1) They process pos…
▽ More
In recommender systems, most graph-based methods focus on positive user feedback, while overlooking the valuable negative feedback. Integrating both positive and negative feedback to form a signed graph can lead to a more comprehensive understanding of user preferences. However, the existing efforts to incorporate both types of feedback are sparse and face two main limitations: 1) They process positive and negative feedback separately, which fails to holistically leverage the collaborative information within the signed graph; 2) They rely on MLPs or GNNs for information extraction from negative feedback, which may not be effective.
To overcome these limitations, we introduce SIGformer, a new method that employs the transformer architecture to sign-aware graph-based recommendation. SIGformer incorporates two innovative positional encodings that capture the spectral properties and path patterns of the signed graph, enabling the full exploitation of the entire graph. Our extensive experiments across five real-world datasets demonstrate the superiority of SIGformer over state-of-the-art methods. The code is available at https://github.com/StupidThree/SIGformer.
△ Less
Submitted 6 May, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought
Authors:
Jooyoung Lee,
Fan Yang,
Thanh Tran,
Qian Hu,
Emre Barut,
Kai-Wei Chang,
Chengwei Su
Abstract:
We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., <1B) language model (LM) for guiding a black-box large (i.e., >10B) LM in reasoning tasks. Specifically, the lightweight LM first generates a rationale for each input instance. The Frozen large LM is then prompted to predict a task output based on the rationale generated by the lightweight LM. Our approach is resour…
▽ More
We introduce a novel framework, LM-Guided CoT, that leverages a lightweight (i.e., <1B) language model (LM) for guiding a black-box large (i.e., >10B) LM in reasoning tasks. Specifically, the lightweight LM first generates a rationale for each input instance. The Frozen large LM is then prompted to predict a task output based on the rationale generated by the lightweight LM. Our approach is resource-efficient in the sense that it only requires training the lightweight LM. We optimize the model through 1) knowledge distillation and 2) reinforcement learning from rationale-oriented and task-oriented reward signals. We assess our method with multi-hop extractive question answering (QA) benchmarks, HotpotQA, and 2WikiMultiHopQA. Experimental results show that our approach outperforms all baselines regarding answer prediction accuracy. We also find that reinforcement learning helps the model to produce higher-quality rationales with improved QA performance.
△ Less
Submitted 4 April, 2024;
originally announced April 2024.
-
GDA: Generalized Diffusion for Robust Test-time Adaptation
Authors:
Yun-Yun Tsai,
Fu-Chen Chen,
Albert Y. C. Chen,
Junfeng Yang,
Che-Chun Su,
Min Sun,
Cheng-Hao Kuo
Abstract:
Machine learning models struggle with generalization when encountering out-of-distribution (OOD) samples with unexpected distribution shifts. For vision tasks, recent studies have shown that test-time adaptation employing diffusion models can achieve state-of-the-art accuracy improvements on OOD samples by generating new samples that align with the model's domain without the need to modify the mod…
▽ More
Machine learning models struggle with generalization when encountering out-of-distribution (OOD) samples with unexpected distribution shifts. For vision tasks, recent studies have shown that test-time adaptation employing diffusion models can achieve state-of-the-art accuracy improvements on OOD samples by generating new samples that align with the model's domain without the need to modify the model's weights. Unfortunately, those studies have primarily focused on pixel-level corruptions, thereby lacking the generalization to adapt to a broader range of OOD types. We introduce Generalized Diffusion Adaptation (GDA), a novel diffusion-based test-time adaptation method robust against diverse OOD types. Specifically, GDA iteratively guides the diffusion by applying a marginal entropy loss derived from the model, in conjunction with style and content preservation losses during the reverse sampling process. In other words, GDA considers the model's output behavior with the semantic information of the samples as a whole, which can reduce ambiguity in downstream tasks during the generation process. Evaluation across various popular model architectures and OOD benchmarks shows that GDA consistently outperforms prior work on diffusion-driven adaptation. Notably, it achieves the highest classification accuracy improvements, ranging from 4.4\% to 5.02\% on ImageNet-C and 2.5\% to 7.4\% on Rendition, Sketch, and Stylized benchmarks. This performance highlights GDA's generalization to a broader range of OOD benchmarks.
△ Less
Submitted 2 April, 2024; v1 submitted 29 March, 2024;
originally announced April 2024.
-
From Handcrafted Features to LLMs: A Brief Survey for Machine Translation Quality Estimation
Authors:
Haofei Zhao,
Yilun Liu,
Shimin Tao,
Weibin Meng,
Yimeng Chen,
Xiang Geng,
Chang Su,
Min Zhang,
Hao Yang
Abstract:
Machine Translation Quality Estimation (MTQE) is the task of estimating the quality of machine-translated text in real time without the need for reference translations, which is of great importance for the development of MT. After two decades of evolution, QE has yielded a wealth of results. This article provides a comprehensive overview of QE datasets, annotation methods, shared tasks, methodolog…
▽ More
Machine Translation Quality Estimation (MTQE) is the task of estimating the quality of machine-translated text in real time without the need for reference translations, which is of great importance for the development of MT. After two decades of evolution, QE has yielded a wealth of results. This article provides a comprehensive overview of QE datasets, annotation methods, shared tasks, methodologies, challenges, and future research directions. It begins with an introduction to the background and significance of QE, followed by an explanation of the concepts and evaluation metrics for word-level QE, sentence-level QE, document-level QE, and explainable QE. The paper categorizes the methods developed throughout the history of QE into those based on handcrafted features, deep learning, and Large Language Models (LLMs), with a further division of deep learning-based methods into classic deep learning and those incorporating pre-trained language models (LMs). Additionally, the article details the advantages and limitations of each method and offers a straightforward comparison of different approaches. Finally, the paper discusses the current challenges in QE research and provides an outlook on future research directions.
△ Less
Submitted 21 March, 2024;
originally announced March 2024.
-
XPose: eXplainable Human Pose Estimation
Authors:
Luyu Qiu,
Jianing Li,
Lei Wen,
Chi Su,
Fei Hao,
Chen Jason Zhang,
Lei Chen
Abstract:
Current approaches in pose estimation primarily concentrate on enhancing model architectures, often overlooking the importance of comprehensively understanding the rationale behind model decisions. In this paper, we propose XPose, a novel framework that incorporates Explainable AI (XAI) principles into pose estimation. This integration aims to elucidate the individual contribution of each keypoint…
▽ More
Current approaches in pose estimation primarily concentrate on enhancing model architectures, often overlooking the importance of comprehensively understanding the rationale behind model decisions. In this paper, we propose XPose, a novel framework that incorporates Explainable AI (XAI) principles into pose estimation. This integration aims to elucidate the individual contribution of each keypoint to final prediction, thereby elevating the model's transparency and interpretability. Conventional XAI techniques have predominantly addressed tasks with single-target tasks like classification. Additionally, the application of Shapley value, a common measure in XAI, to pose estimation has been hindered by prohibitive computational demands.
To address these challenges, this work introduces an innovative concept called Group Shapley Value (GSV). This approach strategically organizes keypoints into clusters based on their interdependencies. Within these clusters, GSV meticulously calculates Shapley value for keypoints, while for inter-cluster keypoints, it opts for a more holistic group-level valuation. This dual-level computation framework meticulously assesses keypoint contributions to the final outcome, optimizing computational efficiency. Building on the insights into keypoint interactions, we devise a novel data augmentation technique known as Group-based Keypoint Removal (GKR). This method ingeniously removes individual keypoints during training phases, deliberately preserving those with strong mutual connections, thereby refining the model's predictive prowess for non-visible keypoints. The empirical validation of GKR across a spectrum of standard approaches attests to its efficacy. GKR's success demonstrates how using Explainable AI (XAI) can directly enhance pose estimation models.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
The role of susceptible individuals in spreading dynamics
Authors:
Chang Su,
Fang Zhou,
Linyuan Lü
Abstract:
Exploring the internal mechanism of information spreading is critical for understanding and controlling the process. Traditional spreading models often assume individuals play the same role in the spreading process. In reality, however, individuals' diverse characteristics contribute differently to the spreading performance, leading to a heterogeneous infection rate across the system. To investiga…
▽ More
Exploring the internal mechanism of information spreading is critical for understanding and controlling the process. Traditional spreading models often assume individuals play the same role in the spreading process. In reality, however, individuals' diverse characteristics contribute differently to the spreading performance, leading to a heterogeneous infection rate across the system. To investigate network spreading dynamics under heterogeneous infection rates, we integrate two individual-level features -- influence (i.e., the ability to influence neighbors) and susceptibility (i.e., the extent to be influenced by neighbors) -- into the independent cascade model. Our findings reveal significant differences in spreading performance under heterogeneous and constant infection rates, with traditional structural centrality metrics proving more effective in the latter scenario. Additionally, we take the constant and heterogeneous infection rates into a state-of-the-art maximization algorithm, the well-known TIM algorithm, and find the seeds selected by heterogeneous infection rates are more dispersed compared to those under constant rates. Lastly, we find that both individuals' influence and susceptibility are vital to the spreading performance. Strikingly, susceptible individuals are particularly important to spreading when information is disseminated by social celebrities. By integrating influence and susceptibility into the spreading model, we gain a more profound understanding of the underlying mechanisms driving information spreading.
△ Less
Submitted 13 March, 2024;
originally announced March 2024.
-
DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training
Authors:
Zhongkai Hao,
Chang Su,
Songming Liu,
Julius Berner,
Chengyang Ying,
Hang Su,
Anima Anandkumar,
Jian Song,
Jun Zhu
Abstract:
Pre-training has been investigated to improve the efficiency and performance of training neural operators in data-scarce settings. However, it is largely in its infancy due to the inherent complexity and diversity, such as long trajectories, multiple scales and varying dimensions of partial differential equations (PDEs) data. In this paper, we present a new auto-regressive denoising pre-training s…
▽ More
Pre-training has been investigated to improve the efficiency and performance of training neural operators in data-scarce settings. However, it is largely in its infancy due to the inherent complexity and diversity, such as long trajectories, multiple scales and varying dimensions of partial differential equations (PDEs) data. In this paper, we present a new auto-regressive denoising pre-training strategy, which allows for more stable and efficient pre-training on PDE data and generalizes to various downstream tasks. Moreover, by designing a flexible and scalable model architecture based on Fourier attention, we can easily scale up the model for large-scale pre-training. We train our PDE foundation model with up to 0.5B parameters on 10+ PDE datasets with more than 100k trajectories. Extensive experiments show that we achieve SOTA on these benchmarks and validate the strong generalizability of our model to significantly enhance performance on diverse downstream PDE tasks like 3D data. Code is available at \url{https://github.com/thu-ml/DPOT}.
△ Less
Submitted 6 May, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Preconditioning for Physics-Informed Neural Networks
Authors:
Songming Liu,
Chang Su,
Jiachen Yao,
Zhongkai Hao,
Hang Su,
Youjia Wu,
Jun Zhu
Abstract:
Physics-informed neural networks (PINNs) have shown promise in solving various partial differential equations (PDEs). However, training pathologies have negatively affected the convergence and prediction accuracy of PINNs, which further limits their practical applications. In this paper, we propose to use condition number as a metric to diagnose and mitigate the pathologies in PINNs. Inspired by c…
▽ More
Physics-informed neural networks (PINNs) have shown promise in solving various partial differential equations (PDEs). However, training pathologies have negatively affected the convergence and prediction accuracy of PINNs, which further limits their practical applications. In this paper, we propose to use condition number as a metric to diagnose and mitigate the pathologies in PINNs. Inspired by classical numerical analysis, where the condition number measures sensitivity and stability, we highlight its pivotal role in the training dynamics of PINNs. We prove theorems to reveal how condition number is related to both the error control and convergence of PINNs. Subsequently, we present an algorithm that leverages preconditioning to improve the condition number. Evaluations of 18 PDE problems showcase the superior performance of our method. Significantly, in 7 of these problems, our method reduces errors by an order of magnitude. These empirical findings verify the critical role of the condition number in PINNs' training.
△ Less
Submitted 1 February, 2024;
originally announced February 2024.
-
UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction
Authors:
Jiaxin Guo,
Minghan Wang,
Xiaosong Qiao,
Daimeng Wei,
Hengchao Shang,
Zongyao Li,
Zhengzhe Yu,
Yinglu Li,
Chang Su,
Min Zhang,
Shimin Tao,
Hao Yang
Abstract:
Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER). Previous works usually adopt end-to-end models and has strong dependency on Pseudo Paired Data and Original Paired Data. But when only pre-training on Pseudo Paired Data, previous models have negative effect on correction. While fine-tu…
▽ More
Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER). Previous works usually adopt end-to-end models and has strong dependency on Pseudo Paired Data and Original Paired Data. But when only pre-training on Pseudo Paired Data, previous models have negative effect on correction. While fine-tuning on Original Paired Data, the source side data must be transcribed by a well-trained ASR model, which takes a lot of time and not universal. In this paper, we propose UCorrect, an unsupervised Detector-Generator-Selector framework for ASR Error Correction. UCorrect has no dependency on the training data mentioned before. The whole procedure is first to detect whether the character is erroneous, then to generate some candidate characters and finally to select the most confident one to replace the error character. Experiments on the public AISHELL-1 dataset and WenetSpeech dataset show the effectiveness of UCorrect for ASR error correction: 1) it achieves significant WER reduction, achieves 6.83\% even without fine-tuning and 14.29\% after fine-tuning; 2) it outperforms the popular NAR correction models by a large margin with a competitive low latency; and 3) it is an universal method, as it reduces all WERs of the ASR model with different decoding strategies and reduces all WERs of ASR models trained on different scale datasets.
△ Less
Submitted 11 January, 2024;
originally announced January 2024.
-
Multi-dimensional Fair Federated Learning
Authors:
Cong Su,
Guoxian Yu,
Jun Wang,
Hui Li,
Qingzhong Li,
Han Yu
Abstract:
Federated learning (FL) has emerged as a promising collaborative and secure paradigm for training a model from decentralized data without compromising privacy. Group fairness and client fairness are two dimensions of fairness that are important for FL. Standard FL can result in disproportionate disadvantages for certain clients, and it still faces the challenge of treating different groups equitab…
▽ More
Federated learning (FL) has emerged as a promising collaborative and secure paradigm for training a model from decentralized data without compromising privacy. Group fairness and client fairness are two dimensions of fairness that are important for FL. Standard FL can result in disproportionate disadvantages for certain clients, and it still faces the challenge of treating different groups equitably in a population. The problem of privately training fair FL models without compromising the generalization capability of disadvantaged clients remains open. In this paper, we propose a method, called mFairFL, to address this problem and achieve group fairness and client fairness simultaneously. mFairFL leverages differential multipliers to construct an optimization objective for empirical risk minimization with fairness constraints. Before aggregating locally trained models, it first detects conflicts among their gradients, and then iteratively curates the direction and magnitude of gradients to mitigate these conflicts. Theoretical analysis proves mFairFL facilitates the fairness in model development. The experimental evaluations based on three benchmark datasets show significant advantages of mFairFL compared to seven state-of-the-art baselines.
△ Less
Submitted 9 December, 2023;
originally announced December 2023.
-
TD-Net: A Tri-domain network for sparse-view CT reconstruction
Authors:
Xinyuan Wang,
Changqing Su,
Bo Xiong
Abstract:
Sparse-view CT reconstruction, aimed at reducing X-ray radiation risks, frequently suffers from image quality degradation, manifested as noise and artifacts. Existing post-processing and dual-domain techniques, although effective in radiation reduction, often lead to over-smoothed results, compromising diagnostic clarity. Addressing this, we introduce TD-Net, a pioneering tri-domain approach that…
▽ More
Sparse-view CT reconstruction, aimed at reducing X-ray radiation risks, frequently suffers from image quality degradation, manifested as noise and artifacts. Existing post-processing and dual-domain techniques, although effective in radiation reduction, often lead to over-smoothed results, compromising diagnostic clarity. Addressing this, we introduce TD-Net, a pioneering tri-domain approach that unifies sinogram, image, and frequency domain optimizations. By incorporating Frequency Supervision Module(FSM), TD-Net adeptly preserves intricate details, overcoming the prevalent over-smoothing issue. Extensive evaluations demonstrate TD-Net's superior performance in reconstructing high-quality CT images from sparse views, efficiently balancing radiation safety and image fidelity. The enhanced capabilities of TD-Net in varied noise scenarios highlight its potential as a breakthrough in medical imaging.
△ Less
Submitted 26 November, 2023;
originally announced November 2023.
-
Knowledge From the Dark Side: Entropy-Reweighted Knowledge Distillation for Balanced Knowledge Transfer
Authors:
Chi-** Su,
Ching-Hsun Tseng,
Shin-Jye Lee
Abstract:
Knowledge Distillation (KD) transfers knowledge from a larger "teacher" model to a compact "student" model, guiding the student with the "dark knowledge" $\unicode{x2014}$ the implicit insights present in the teacher's soft predictions. Although existing KDs have shown the potential of transferring knowledge, the gap between the two parties still exists. With a series of investigations, we argue t…
▽ More
Knowledge Distillation (KD) transfers knowledge from a larger "teacher" model to a compact "student" model, guiding the student with the "dark knowledge" $\unicode{x2014}$ the implicit insights present in the teacher's soft predictions. Although existing KDs have shown the potential of transferring knowledge, the gap between the two parties still exists. With a series of investigations, we argue the gap is the result of the student's overconfidence in prediction, signaling an imbalanced focus on pronounced features while overlooking the subtle yet crucial dark knowledge. To overcome this, we introduce the Entropy-Reweighted Knowledge Distillation (ER-KD), a novel approach that leverages the entropy in the teacher's predictions to reweight the KD loss on a sample-wise basis. ER-KD precisely refocuses the student on challenging instances rich in the teacher's nuanced insights while reducing the emphasis on simpler cases, enabling a more balanced knowledge transfer. Consequently, ER-KD not only demonstrates compatibility with various state-of-the-art KD methods but also further enhances their performance at negligible cost. This approach offers a streamlined and effective strategy to refine the knowledge transfer process in KD, setting a new paradigm in the meticulous handling of dark knowledge. Our code is available at https://github.com/cpsu00/ER-KD.
△ Less
Submitted 22 November, 2023;
originally announced November 2023.
-
CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning
Authors:
Yilun Liu,
Shimin Tao,
Xiaofeng Zhao,
Ming Zhu,
Wenbing Ma,
Junhao Zhu,
Chang Su,
Yutai Hou,
Miao Zhang,
Min Zhang,
Hongxia Ma,
Li Zhang,
Hao Yang,
Yanfei Jiang
Abstract:
Instruction tuning is crucial for enabling Language Learning Models (LLMs) in responding to human instructions. The quality of instruction pairs used for tuning greatly affects the performance of LLMs. However, the manual creation of high-quality instruction datasets is costly, leading to the adoption of automatic generation of instruction pairs by LLMs as a popular alternative. To ensure the high…
▽ More
Instruction tuning is crucial for enabling Language Learning Models (LLMs) in responding to human instructions. The quality of instruction pairs used for tuning greatly affects the performance of LLMs. However, the manual creation of high-quality instruction datasets is costly, leading to the adoption of automatic generation of instruction pairs by LLMs as a popular alternative. To ensure the high quality of LLM-generated instruction datasets, several approaches have been proposed. Nevertheless, existing methods either compromise dataset integrity by filtering a large proportion of samples, or are unsuitable for industrial applications. In this paper, instead of discarding low-quality samples, we propose CoachLM, a novel approach to enhance the quality of instruction datasets through automatic revisions on samples in the dataset. CoachLM is trained from the samples revised by human experts and significantly increases the proportion of high-quality samples in the dataset from 17.7% to 78.9%. The effectiveness of CoachLM is further assessed on various real-world instruction test sets. The results show that CoachLM improves the instruction-following capabilities of the instruction-tuned LLM by an average of 29.9%, which even surpasses larger LLMs with nearly twice the number of parameters. Furthermore, CoachLM is successfully deployed in a data management system for LLMs at Huawei, resulting in an efficiency improvement of up to 20% in the cleaning of 40k real-world instruction pairs. We release various assets of CoachLM, including the training data, code and test set (https://github.com/lunyiliu/CoachLM).
△ Less
Submitted 20 March, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
-
INeAT: Iterative Neural Adaptive Tomography
Authors:
Bo Xiong,
Changqing Su,
Zihan Lin,
You Zhou,
Zhaofei Yu
Abstract:
Computed Tomography (CT) with its remarkable capability for three-dimensional imaging from multiple projections, enjoys a broad range of applications in clinical diagnosis, scientific observation, and industrial detection. Neural Adaptive Tomography (NeAT) is a recently proposed 3D rendering method based on neural radiance field for CT, and it demonstrates superior performance compared to traditio…
▽ More
Computed Tomography (CT) with its remarkable capability for three-dimensional imaging from multiple projections, enjoys a broad range of applications in clinical diagnosis, scientific observation, and industrial detection. Neural Adaptive Tomography (NeAT) is a recently proposed 3D rendering method based on neural radiance field for CT, and it demonstrates superior performance compared to traditional methods. However, it still faces challenges when dealing with the substantial perturbations and pose shifts encountered in CT scanning processes. Here, we propose a neural rendering method for CT reconstruction, named Iterative Neural Adaptive Tomography (INeAT), which incorporates iterative posture optimization to effectively counteract the influence of posture perturbations in data, particularly in cases involving significant posture variations. Through the implementation of a posture feedback optimization strategy, INeAT iteratively refines the posture corresponding to the input images based on the reconstructed 3D volume. We demonstrate that INeAT achieves artifact-suppressed and resolution-enhanced reconstruction in scenarios with significant pose disturbances. Furthermore, we show that our INeAT maintains comparable reconstruction performance to stable-state acquisitions even using data from unstable-state acquisitions, which significantly reduces the time required for CT scanning and relaxes the stringent requirements on imaging hardware systems, underscoring its immense potential for applications in short-time and low-cost CT technology.
△ Less
Submitted 2 November, 2023;
originally announced November 2023.
-
Application-layer Characterization and Traffic Analysis for Encrypted QUIC Transport Protocol
Authors:
Qianqian Zhang,
Chi-Jiun Su
Abstract:
Quick UDP Internet Connection (QUIC) is an emerging end-to-end encrypted, transport-layer protocol, which has been increasingly adopted by popular web services to improve communication security and quality of experience (QoE) towards end-users. However, this tendency makes the traffic analysis more challenging, given the limited information in the QUIC packet header and full encryption on the payl…
▽ More
Quick UDP Internet Connection (QUIC) is an emerging end-to-end encrypted, transport-layer protocol, which has been increasingly adopted by popular web services to improve communication security and quality of experience (QoE) towards end-users. However, this tendency makes the traffic analysis more challenging, given the limited information in the QUIC packet header and full encryption on the payload. To address this challenge, a novel rule-based approach is proposed to estimate the application-level traffic attributes without decrypting QUIC packets. Based on the size, timing, and direction information, our proposed algorithm analyzes the associated network traffic to infer the identity of each HTTP request and response pair, as well as the multiplexing feature in each QUIC connection. The inferred HTTP attributes can be used to evaluate the QoE of application-layer services and identify the service categories for traffic classification in the encrypted QUIC connections.
△ Less
Submitted 10 October, 2023;
originally announced October 2023.
-
Innovative Digital Storytelling with AIGC: Exploration and Discussion of Recent Advances
Authors:
Rongzhang Gu,
Hui Li,
Changyue Su,
Wayne Wu
Abstract:
Digital storytelling, as an art form, has struggled with cost-quality balance. The emergence of AI-generated Content (AIGC) is considered as a potential solution for efficient digital storytelling production. However, the specific form, effects, and impacts of this fusion remain unclear, leaving the boundaries of AIGC combined with storytelling undefined. This work explores the current integration…
▽ More
Digital storytelling, as an art form, has struggled with cost-quality balance. The emergence of AI-generated Content (AIGC) is considered as a potential solution for efficient digital storytelling production. However, the specific form, effects, and impacts of this fusion remain unclear, leaving the boundaries of AIGC combined with storytelling undefined. This work explores the current integration state of AIGC and digital storytelling, investigates the artistic value of their fusion in a sample project, and addresses common issues through interviews. Through our study, we conclude that AIGC, while proficient in image creation, voiceover production, and music composition, falls short of replacing humans due to the irreplaceable elements of human creativity and aesthetic sensibilities at present, especially in complex character animations, facial expressions, and sound effects. The research objective is to increase public awareness of the current state, limitations, and challenges arising from combining AIGC and digital storytelling.
△ Less
Submitted 28 September, 2023; v1 submitted 25 September, 2023;
originally announced September 2023.
-
A Multitask Training Approach to Enhance Whisper with Contextual Biasing and Open-Vocabulary Keyword Spotting
Authors:
Yuang Li,
Min Zhang,
Chang Su,
Yinglu Li,
Xiaosong Qiao,
Mengxin Ren,
Miaomiao Ma,
Daimeng Wei,
Shimin Tao,
Hao Yang
Abstract:
The recognition of rare named entities, such as personal names and terminologies, is challenging for automatic speech recognition (ASR) systems, especially when they are not frequently observed in the training data. In this paper, we introduce keyword spotting enhanced Whisper (KWS-Whisper), a novel ASR system that leverages the Whisper model and performs open-vocabulary keyword spotting (OV-KWS)…
▽ More
The recognition of rare named entities, such as personal names and terminologies, is challenging for automatic speech recognition (ASR) systems, especially when they are not frequently observed in the training data. In this paper, we introduce keyword spotting enhanced Whisper (KWS-Whisper), a novel ASR system that leverages the Whisper model and performs open-vocabulary keyword spotting (OV-KWS) on the hidden states of the Whisper encoder to recognize user-defined named entities. These entities serve as prompts for the Whisper decoder. To optimize the model, we propose a multitask training approach that learns OV-KWS and contextual-ASR tasks. We evaluate our approach on Chinese Aishell hot word subsets and two internal code-switching test sets and show that it significantly improves the entity recall compared to the original Whisper model. Moreover, we demonstrate that the OV-KWS can be a plug-and-play module to enhance the ASR error correction methods and frozen Whisper models.
△ Less
Submitted 6 June, 2024; v1 submitted 18 September, 2023;
originally announced September 2023.
-
Revisiting File Context for Source Code Summarization
Authors:
Aakash Bansal,
Chia-Yi Su,
Collin McMillan
Abstract:
Source code summarization is the task of writing natural language descriptions of source code. A typical use case is generating short summaries of subroutines for use in API documentation. The heart of almost all current research into code summarization is the encoder-decoder neural architecture, and the encoder input is almost always a single subroutine or other short code snippet. The problem wi…
▽ More
Source code summarization is the task of writing natural language descriptions of source code. A typical use case is generating short summaries of subroutines for use in API documentation. The heart of almost all current research into code summarization is the encoder-decoder neural architecture, and the encoder input is almost always a single subroutine or other short code snippet. The problem with this setup is that the information needed to describe the code is often not present in the code itself -- that information often resides in other nearby code. In this paper, we revisit the idea of ``file context'' for code summarization. File context is the idea of encoding select information from other subroutines in the same file. We propose a novel modification of the Transformer architecture that is purpose-built to encode file context and demonstrate its improvement over several baselines. We find that file context helps on a subset of challenging examples where traditional approaches struggle.
△ Less
Submitted 5 September, 2023;
originally announced September 2023.
-
Distilled GPT for Source Code Summarization
Authors:
Chia-Yi Su,
Collin McMillan
Abstract:
A code summary is a brief natural language description of source code. Summaries are usually only a single sentence long, and yet form the backbone of developer documentation. A short descriptions such as "changes all visible polygons to the color blue" can give a programmer a high-level idea of what code does without the effort of reading the code itself. Recently, products based on Large Languag…
▽ More
A code summary is a brief natural language description of source code. Summaries are usually only a single sentence long, and yet form the backbone of developer documentation. A short descriptions such as "changes all visible polygons to the color blue" can give a programmer a high-level idea of what code does without the effort of reading the code itself. Recently, products based on Large Language Models such as ChatGPT have demonstrated a strong ability to write these descriptions automatically. However, to use these tools, programmers must send their code to untrusted third parties for processing (e.g., via an API call). This loss of custody is not acceptable to many organizations. In this paper, we present an alternative: we train an open source model using sample output generated by GPT-3.5 in a process related to knowledge distillation. Our model is small enough (350m parameters) to be run on a single 16gb GPU, yet we show in our evaluation that it is large enough to mimic GPT-3.5 on this task.
△ Less
Submitted 5 February, 2024; v1 submitted 28 August, 2023;
originally announced August 2023.
-
Modeling Programmer Attention as Scanpath Prediction
Authors:
Aakash Bansal,
Chia-Yi Su,
Zachary Karas,
Yifan Zhang,
Yu Huang,
Toby Jia-Jun Li,
Collin McMillan
Abstract:
This paper launches a new effort at modeling programmer attention by predicting eye movement scanpaths. Programmer attention refers to what information people intake when performing programming tasks. Models of programmer attention refer to machine prediction of what information is important to people. Models of programmer attention are important because they help researchers build better interfac…
▽ More
This paper launches a new effort at modeling programmer attention by predicting eye movement scanpaths. Programmer attention refers to what information people intake when performing programming tasks. Models of programmer attention refer to machine prediction of what information is important to people. Models of programmer attention are important because they help researchers build better interfaces, assistive technologies, and more human-like AI. For many years, researchers in SE have built these models based on features such as mouse clicks, key logging, and IDE interactions. Yet the holy grail in this area is scanpath prediction -- the prediction of the sequence of eye fixations a person would take over a visual stimulus. A person's eye movements are considered the most concrete evidence that a person is taking in a piece of information. Scanpath prediction is a notoriously difficult problem, but we believe that the emergence of lower-cost, higher-accuracy eye tracking equipment and better large language models of source code brings a solution within grasp. We present an eye tracking experiment with 27 programmers and a prototype scanpath predictor to present preliminary results and obtain early community feedback.
△ Less
Submitted 26 August, 2023;
originally announced August 2023.
-
Semantic Similarity Loss for Neural Source Code Summarization
Authors:
Chia-Yi Su,
Collin McMillan
Abstract:
This paper presents a procedure for and evaluation of using a semantic similarity metric as a loss function for neural source code summarization. Code summarization is the task of writing natural language descriptions of source code. Neural code summarization refers to automated techniques for generating these descriptions using neural networks. Almost all current approaches involve neural network…
▽ More
This paper presents a procedure for and evaluation of using a semantic similarity metric as a loss function for neural source code summarization. Code summarization is the task of writing natural language descriptions of source code. Neural code summarization refers to automated techniques for generating these descriptions using neural networks. Almost all current approaches involve neural networks as either standalone models or as part of a pretrained large language models e.g., GPT, Codex, LLaMA. Yet almost all also use a categorical cross-entropy (CCE) loss function for network optimization. Two problems with CCE are that 1) it computes loss over each word prediction one-at-a-time, rather than evaluating a whole sentence, and 2) it requires a perfect prediction, leaving no room for partial credit for synonyms. In this paper, we extend our previous work on semantic similarity metrics to show a procedure for using semantic similarity as a loss function to alleviate this problem, and we evaluate this procedure in several settings in both metrics-driven and human studies. In essence, we propose to use a semantic similarity metric to calculate loss over the whole output sentence prediction per training batch, rather than just loss for each word. We also propose to combine our loss with CCE for each word, which streamlines the training process compared to baselines. We evaluate our approach over several baselines and report improvement in the vast majority of conditions.
△ Less
Submitted 11 June, 2024; v1 submitted 14 August, 2023;
originally announced August 2023.
-
Vision-Based Reactive Planning and Control of Quadruped Robots in Unstructured Dynamic Environments
Authors:
Tangyu Qian,
Zhangli Zhou,
Shaocheng Wang,
Zhijun Li,
Chun-Yi Su,
Zhen Kan
Abstract:
Quadruped robots have received increasing attention for the past few years. However, existing works primarily focus on static environments or assume the robot has full observations of the environment. This limits their practical applications since real-world environments are often dynamic and partially observable. To tackle these issues, vision-based reactive planning and control (V-RPC) is develo…
▽ More
Quadruped robots have received increasing attention for the past few years. However, existing works primarily focus on static environments or assume the robot has full observations of the environment. This limits their practical applications since real-world environments are often dynamic and partially observable. To tackle these issues, vision-based reactive planning and control (V-RPC) is developed in this work. The V-RPC comprises two modules: offline pre-planning and online reactive planning. The pre-planning phase generates a reference trajectory over continuous workspace via sampling-based methods using prior environmental knowledge, given an LTL specification. The online reactive module dynamically adjusts the reference trajectory and control based on the robot's real-time visual perception to adapt to environmental changes.
△ Less
Submitted 16 July, 2023;
originally announced July 2023.
-
TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
Authors:
Liangyu Zha,
Junlin Zhou,
Liyao Li,
Rui Wang,
Qingyi Huang,
Saisai Yang,
**g Yuan,
Changbao Su,
Xiang Li,
Aofeng Su,
Tao Zhang,
Chen Zhou,
Kaizhe Shou,
Miao Wang,
Wufang Zhu,
Guoshan Lu,
Chao Ye,
Yali Ye,
Wentao Ye,
Yiming Zhang,
Xinglong Deng,
Jie Xu,
Haobo Wang,
Gang Chen,
Junbo Zhao
Abstract:
Tables are prevalent in real-world databases, requiring significant time and effort for humans to analyze and manipulate. The advancements in large language models (LLMs) have made it possible to interact with tables using natural language input, bringing this capability closer to reality. In this paper, we present TableGPT, a unified fine-tuned framework that enables LLMs to understand and operat…
▽ More
Tables are prevalent in real-world databases, requiring significant time and effort for humans to analyze and manipulate. The advancements in large language models (LLMs) have made it possible to interact with tables using natural language input, bringing this capability closer to reality. In this paper, we present TableGPT, a unified fine-tuned framework that enables LLMs to understand and operate on tables using external functional commands. It introduces the capability to seamlessly interact with tables, enabling a wide range of functionalities such as question answering, data manipulation (e.g., insert, delete, query, and modify operations), data visualization, analysis report generation, and automated prediction. TableGPT aims to provide convenience and accessibility to users by empowering them to effortlessly leverage tabular data. At the core of TableGPT lies the novel concept of global tabular representations, which empowers LLMs to gain a comprehensive understanding of the entire table beyond meta-information. By jointly training LLMs on both table and text modalities, TableGPT achieves a deep understanding of tabular data and the ability to perform complex operations on tables through chain-of-command instructions. Importantly, TableGPT offers the advantage of being a self-contained system rather than relying on external API interfaces. Moreover, it supports efficient data process flow, query rejection (when appropriate) and private deployment, enabling faster domain data fine-tuning and ensuring data privacy, which enhances the framework's adaptability to specific use cases.
△ Less
Submitted 7 August, 2023; v1 submitted 17 July, 2023;
originally announced July 2023.
-
PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs
Authors:
Zhongkai Hao,
Jiachen Yao,
Chang Su,
Hang Su,
Ziao Wang,
Fanzhi Lu,
Zeyu Xia,
Yichi Zhang,
Songming Liu,
Lu Lu,
Jun Zhu
Abstract:
While significant progress has been made on Physics-Informed Neural Networks (PINNs), a comprehensive comparison of these methods across a wide range of Partial Differential Equations (PDEs) is still lacking. This study introduces PINNacle, a benchmarking tool designed to fill this gap. PINNacle provides a diverse dataset, comprising over 20 distinct PDEs from various domains, including heat condu…
▽ More
While significant progress has been made on Physics-Informed Neural Networks (PINNs), a comprehensive comparison of these methods across a wide range of Partial Differential Equations (PDEs) is still lacking. This study introduces PINNacle, a benchmarking tool designed to fill this gap. PINNacle provides a diverse dataset, comprising over 20 distinct PDEs from various domains, including heat conduction, fluid dynamics, biology, and electromagnetics. These PDEs encapsulate key challenges inherent to real-world problems, such as complex geometry, multi-scale phenomena, nonlinearity, and high dimensionality. PINNacle also offers a user-friendly toolbox, incorporating about 10 state-of-the-art PINN methods for systematic evaluation and comparison. We have conducted extensive experiments with these methods, offering insights into their strengths and weaknesses. In addition to providing a standardized means of assessing performance, PINNacle also offers an in-depth analysis to guide future research, particularly in areas such as domain decomposition methods and loss reweighting for handling multi-scale problems and complex geometry. To the best of our knowledge, it is the largest benchmark with a diverse and comprehensive evaluation that will undoubtedly foster further research in PINNs.
△ Less
Submitted 5 October, 2023; v1 submitted 14 June, 2023;
originally announced June 2023.
-
Creating Emordle: Animating Word Cloud for Emotion Expression
Authors:
Liwenhan Xie,
Xinhuan Shu,
Jeon Cheol Su,
Yun Wang,
Siming Chen,
Huamin Qu
Abstract:
We propose emordle, a conceptual design that animates wordles (compact word clouds) to deliver their emotional context to the audiences. To inform the design, we first reviewed online examples of animated texts and animated wordles, and summarized strategies for injecting emotion into the animations. We introduced a composite approach that extends an existing animation scheme for one word to multi…
▽ More
We propose emordle, a conceptual design that animates wordles (compact word clouds) to deliver their emotional context to the audiences. To inform the design, we first reviewed online examples of animated texts and animated wordles, and summarized strategies for injecting emotion into the animations. We introduced a composite approach that extends an existing animation scheme for one word to multiple words in a wordle with two global factors: the randomness of text animation (entropy) and the animation speed (speed). To create an emordle, general users can choose one predefined animated scheme that matches the intended emotion class and fine-tune the emotion intensity with the two parameters. We designed proof-of-concept emordle examples for four basic emotion classes, namely happiness, sadness, anger, and fear. We conducted two controlled crowdsourcing studies to evaluate our approach. The first study confirmed that people generally agreed on the conveyed emotions from well-crafted animations, and the second one demonstrated that our identified factors helped fine-tune the delivered emotion extent. We also invited general users to create emordles on their own based on our proposed framework. Through this user study, we confirmed the effectiveness of the approach. We concluded with implications for future research opportunities of supporting emotion expression in visualizations.
△ Less
Submitted 14 June, 2023; v1 submitted 13 June, 2023;
originally announced June 2023.
-
MultiAdam: Parameter-wise Scale-invariant Optimizer for Multiscale Training of Physics-informed Neural Networks
Authors:
Jiachen Yao,
Chang Su,
Zhongkai Hao,
Songming Liu,
Hang Su,
Jun Zhu
Abstract:
Physics-informed Neural Networks (PINNs) have recently achieved remarkable progress in solving Partial Differential Equations (PDEs) in various fields by minimizing a weighted sum of PDE loss and boundary loss. However, there are several critical challenges in the training of PINNs, including the lack of theoretical frameworks and the imbalance between PDE loss and boundary loss. In this paper, we…
▽ More
Physics-informed Neural Networks (PINNs) have recently achieved remarkable progress in solving Partial Differential Equations (PDEs) in various fields by minimizing a weighted sum of PDE loss and boundary loss. However, there are several critical challenges in the training of PINNs, including the lack of theoretical frameworks and the imbalance between PDE loss and boundary loss. In this paper, we present an analysis of second-order non-homogeneous PDEs, which are classified into three categories and applicable to various common problems. We also characterize the connections between the training loss and actual error, guaranteeing convergence under mild conditions. The theoretical analysis inspires us to further propose MultiAdam, a scale-invariant optimizer that leverages gradient momentum to parameter-wisely balance the loss terms. Extensive experiment results on multiple problems from different physical domains demonstrate that our MultiAdam solver can improve the predictive accuracy by 1-2 orders of magnitude compared with strong baselines.
△ Less
Submitted 5 June, 2023;
originally announced June 2023.
-
Trend-Based SAC Beam Control Method with Zero-Shot in Superconducting Linear Accelerator
Authors:
Xiaolong Chen,
Xin Qi,
Chunguang Su,
Yuan He,
Zhijun Wang,
Kunxiang Sun,
Chao **,
Weilong Chen,
Shuhui Liu,
Xiaoying Zhao,
Duanyang Jia,
Man Yi
Abstract:
The superconducting linear accelerator is a highly flexiable facility for modern scientific discoveries, necessitating weekly reconfiguration and tuning. Accordingly, minimizing setup time proves essential in affording users with ample experimental time. We propose a trend-based soft actor-critic(TBSAC) beam control method with strong robustness, allowing the agents to be trained in a simulated en…
▽ More
The superconducting linear accelerator is a highly flexiable facility for modern scientific discoveries, necessitating weekly reconfiguration and tuning. Accordingly, minimizing setup time proves essential in affording users with ample experimental time. We propose a trend-based soft actor-critic(TBSAC) beam control method with strong robustness, allowing the agents to be trained in a simulated environment and applied to the real accelerator directly with zero-shot. To validate the effectiveness of our method, two different typical beam control tasks were performed on China Accelerator Facility for Superheavy Elements (CAFe II) and a light particle injector(LPI) respectively. The orbit correction tasks were performed in three cryomodules in CAFe II seperately, the time required for tuning has been reduced to one-tenth of that needed by human experts, and the RMS values of the corrected orbit were all less than 1mm. The other transmission efficiency optimization task was conducted in the LPI, our agent successfully optimized the transmission efficiency of radio-frequency quadrupole(RFQ) to over $85\%$ within 2 minutes. The outcomes of these two experiments offer substantiation that our proposed TBSAC approach can efficiently and effectively accomplish beam commissioning tasks while upholding the same standard as skilled human experts. As such, our method exhibits potential for future applications in other accelerator commissioning fields.
△ Less
Submitted 25 May, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
A Language Model of Java Methods with Train/Test Deduplication
Authors:
Chia-Yi Su,
Aakash Bansal,
Vijayanta Jain,
Sepideh Ghanavati,
Collin McMillan
Abstract:
This tool demonstration presents a research toolkit for a language model of Java source code. The target audience includes researchers studying problems at the granularity level of subroutines, statements, or variables in Java. In contrast to many existing language models, we prioritize features for researchers including an open and easily-searchable training set, a held out test set with differen…
▽ More
This tool demonstration presents a research toolkit for a language model of Java source code. The target audience includes researchers studying problems at the granularity level of subroutines, statements, or variables in Java. In contrast to many existing language models, we prioritize features for researchers including an open and easily-searchable training set, a held out test set with different levels of deduplication from the training set, infrastructure for deduplicating new examples, and an implementation platform suitable for execution on equipment accessible to a relatively modest budget. Our model is a GPT2-like architecture with 350m parameters. Our training set includes 52m Java methods (9b tokens) and 13m StackOverflow threads (10.5b tokens). To improve accessibility of research to more members of the community, we limit local resource requirements to GPUs with 16GB video memory. We provide a test set of held out Java methods that include descriptive comments, including the entire Java projects for those methods. We also provide deduplication tools using precomputed hash tables at various similarity thresholds to help researchers ensure that their own test examples are not in the training set. We make all our tools and data open source and available via Huggingface and Github.
△ Less
Submitted 14 May, 2023;
originally announced May 2023.
-
Fairness-aware Differentially Private Collaborative Filtering
Authors:
Zhenhuan Yang,
Yingqiang Ge,
Congzhe Su,
Dingxian Wang,
Xiaoting Zhao,
Yiming Ying
Abstract:
Recently, there has been an increasing adoption of differential privacy guided algorithms for privacy-preserving machine learning tasks. However, the use of such algorithms comes with trade-offs in terms of algorithmic fairness, which has been widely acknowledged. Specifically, we have empirically observed that the classical collaborative filtering method, trained by differentially private stochas…
▽ More
Recently, there has been an increasing adoption of differential privacy guided algorithms for privacy-preserving machine learning tasks. However, the use of such algorithms comes with trade-offs in terms of algorithmic fairness, which has been widely acknowledged. Specifically, we have empirically observed that the classical collaborative filtering method, trained by differentially private stochastic gradient descent (DP-SGD), results in a disparate impact on user groups with respect to different user engagement levels. This, in turn, causes the original unfair model to become even more biased against inactive users. To address the above issues, we propose \textbf{DP-Fair}, a two-stage framework for collaborative filtering based algorithms. Specifically, it combines differential privacy mechanisms with fairness constraints to protect user privacy while ensuring fair recommendations. The experimental results, based on Amazon datasets, and user history logs collected from Etsy, one of the largest e-commerce platforms, demonstrate that our proposed method exhibits superior performance in terms of both overall accuracy and user group fairness on both shallow and deep recommendation models compared to vanilla DP-SGD.
△ Less
Submitted 16 March, 2023;
originally announced March 2023.
-
Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning
Authors:
Xinyue Hu,
Lin Gu,
Kazuma Kobayashi,
Qiyuan An,
Qingyu Chen,
Zhiyong Lu,
Chang Su,
Tatsuya Harada,
Yingying Zhu
Abstract:
Medical visual question answering (VQA) aims to answer clinically relevant questions regarding input medical images. This technique has the potential to improve the efficiency of medical professionals while relieving the burden on the public health system, particularly in resource-poor countries. Existing medical VQA methods tend to encode medical images and learn the correspondence between visual…
▽ More
Medical visual question answering (VQA) aims to answer clinically relevant questions regarding input medical images. This technique has the potential to improve the efficiency of medical professionals while relieving the burden on the public health system, particularly in resource-poor countries. Existing medical VQA methods tend to encode medical images and learn the correspondence between visual features and questions without exploiting the spatial, semantic, or medical knowledge behind them. This is partially because of the small size of the current medical VQA dataset, which often includes simple questions. Therefore, we first collected a comprehensive and large-scale medical VQA dataset, focusing on chest X-ray images. The questions involved detailed relationships, such as disease names, locations, levels, and types in our dataset. Based on this dataset, we also propose a novel baseline method by constructing three different relationship graphs: spatial relationship, semantic relationship, and implicit relationship graphs on the image regions, questions, and semantic labels. The answer and graph reasoning paths are learned for different questions.
△ Less
Submitted 19 February, 2023;
originally announced February 2023.
-
Certified Robustness of Quantum Classifiers against Adversarial Examples through Quantum Noise
Authors:
Jhih-Cing Huang,
Yu-Lin Tsai,
Chao-Han Huck Yang,
Cheng-Fang Su,
Chia-Mu Yu,
Pin-Yu Chen,
Sy-Yen Kuo
Abstract:
Recently, quantum classifiers have been found to be vulnerable to adversarial attacks, in which quantum classifiers are deceived by imperceptible noises, leading to misclassification. In this paper, we propose the first theoretical study demonstrating that adding quantum random rotation noise can improve robustness in quantum classifiers against adversarial attacks. We link the definition of diffe…
▽ More
Recently, quantum classifiers have been found to be vulnerable to adversarial attacks, in which quantum classifiers are deceived by imperceptible noises, leading to misclassification. In this paper, we propose the first theoretical study demonstrating that adding quantum random rotation noise can improve robustness in quantum classifiers against adversarial attacks. We link the definition of differential privacy and show that the quantum classifier trained with the natural presence of additive noise is differentially private. Finally, we derive a certified robustness bound to enable quantum classifiers to defend against adversarial examples, supported by experimental results simulated with noises from IBM's 7-qubits device.
△ Less
Submitted 28 April, 2023; v1 submitted 2 November, 2022;
originally announced November 2022.
-
Application of Deep Learning on Single-Cell RNA-sequencing Data Analysis: A Review
Authors:
Matthew Brendel,
Chang Su,
Zilong Bai,
Hao Zhang,
Olivier Elemento,
Fei Wang
Abstract:
Single-cell RNA-sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously. Analysis of scRNA-seq data plays an important role in the study of cell states and phenotypes, and has helped elucidate biological processes, such as those occurring during development of complex organisms and improved our understanding o…
▽ More
Single-cell RNA-sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously. Analysis of scRNA-seq data plays an important role in the study of cell states and phenotypes, and has helped elucidate biological processes, such as those occurring during development of complex organisms and improved our understanding of disease states, such as cancer, diabetes, and COVID, among others. Deep learning, a recent advance of artificial intelligence that has been used to address many problems involving large datasets, has also emerged as a promising tool for scRNA-seq data analysis, as it has a capacity to extract informative, compact features from noisy, heterogeneous, and high-dimensional scRNA-seq data to improve downstream analysis. The present review aims at surveying recently developed deep learning techniques in scRNA-seq data analysis, identifying key steps within the scRNA-seq data analysis pipeline that have been advanced by deep learning, and explaining the benefits of deep learning over more conventional analysis tools. Finally, we summarize the challenges in current deep learning approaches faced within scRNA-seq data and discuss potential directions for improvements in deep algorithms for scRNA-seq data analysis.
△ Less
Submitted 11 October, 2022;
originally announced October 2022.
-
AutoDeconJ: a GPU accelerated ImageJ plugin for 3D light field deconvolution with optimal iteration numbers predicting
Authors:
C. Q. Su,
Y. H Gao,
Y Zhou,
Y. Q Sun,
C. G Yan,
H. B Yin,
B Xiong
Abstract:
Light field microscopy is a compact solution to high-speed 3D fluorescence imaging. Usually, we need to do 3D deconvolution to the captured raw data. Although there are deep neural network methods that can accelerate the reconstruction process, the model is not universally applicable for all system parameters. Here, we develop AutoDeconJ, a GPU accelerated ImageJ plugin for 4.4x faster and accurat…
▽ More
Light field microscopy is a compact solution to high-speed 3D fluorescence imaging. Usually, we need to do 3D deconvolution to the captured raw data. Although there are deep neural network methods that can accelerate the reconstruction process, the model is not universally applicable for all system parameters. Here, we develop AutoDeconJ, a GPU accelerated ImageJ plugin for 4.4x faster and accurate deconvolution of light field microscopy data. We further propose an image quality metric for the deconvolution process, aiding in automatically determining the optimal number of iterations with higher reconstruction accuracy and fewer artifacts
△ Less
Submitted 24 August, 2022;
originally announced August 2022.
-
Memory Efficient Temporal & Visual Graph Model for Unsupervised Video Domain Adaptation
Authors:
Xinyue Hu,
Lin Gu,
Liangchen Liu,
Ruijiang Li,
Chang Su,
Tatsuya Harada,
Yingying Zhu
Abstract:
Existing video domain adaption (DA) methods need to store all temporal combinations of video frames or pair the source and target videos, which are memory cost expensive and can't scale up to long videos. To address these limitations, we propose a memory-efficient graph-based video DA approach as follows. At first our method models each source or target video by a graph: nodes represent video fram…
▽ More
Existing video domain adaption (DA) methods need to store all temporal combinations of video frames or pair the source and target videos, which are memory cost expensive and can't scale up to long videos. To address these limitations, we propose a memory-efficient graph-based video DA approach as follows. At first our method models each source or target video by a graph: nodes represent video frames and edges represent the temporal or visual similarity relationship between frames. We use a graph attention network to learn the weight of individual frames and simultaneously align the source and target video into a domain-invariant graph feature space. Instead of storing a large number of sub-videos, our method only constructs one graph with a graph attention mechanism for one video, reducing the memory cost substantially. The extensive experiments show that, compared with the state-of-art methods, we achieved superior performance while reducing the memory cost significantly.
△ Less
Submitted 12 August, 2022;
originally announced August 2022.
-
Probing Simile Knowledge from Pre-trained Language Models
Authors:
Weijie Chen,
Yongzhu Chang,
Rongsheng Zhang,
Jiashu Pu,
Guandan Chen,
Le Zhang,
Yadong Xi,
Yijiang Chen,
Chang Su
Abstract:
Simile interpretation (SI) and simile generation (SG) are challenging tasks for NLP because models require adequate world knowledge to produce predictions. Previous works have employed many hand-crafted resources to bring knowledge-related into models, which is time-consuming and labor-intensive. In recent years, pre-trained language models (PLMs) based approaches have become the de-facto standard…
▽ More
Simile interpretation (SI) and simile generation (SG) are challenging tasks for NLP because models require adequate world knowledge to produce predictions. Previous works have employed many hand-crafted resources to bring knowledge-related into models, which is time-consuming and labor-intensive. In recent years, pre-trained language models (PLMs) based approaches have become the de-facto standard in NLP since they learn generic knowledge from a large corpus. The knowledge embedded in PLMs may be useful for SI and SG tasks. Nevertheless, there are few works to explore it. In this paper, we probe simile knowledge from PLMs to solve the SI and SG tasks in the unified framework of simile triple completion for the first time. The backbone of our framework is to construct masked sentences with manual patterns and then predict the candidate words in the masked position. In this framework, we adopt a secondary training process (Adjective-Noun mask Training) with the masked language model (MLM) loss to enhance the prediction diversity of candidate words in the masked position. Moreover, pattern ensemble (PE) and pattern search (PS) are applied to improve the quality of predicted words. Finally, automatic and human evaluations demonstrate the effectiveness of our framework in both SI and SG tasks.
△ Less
Submitted 27 April, 2022;
originally announced April 2022.
-
Mechanism, measurement, and quantification of stress in decision process: a model based systematic-review protocol
Authors:
Chang Su,
Xiaoyuan Li,
Lin Yang,
Yong Zeng
Abstract:
Every human action begins with decision-making. Stress is a significant source of biases that can influence human decision-making. In order to understand the relationship between stress and decision-making, stress quantification is fundamental. Different methods of measuring and quantifying stress in decision-making have been described in the literature while an up-to-date systematic review of the…
▽ More
Every human action begins with decision-making. Stress is a significant source of biases that can influence human decision-making. In order to understand the relationship between stress and decision-making, stress quantification is fundamental. Different methods of measuring and quantifying stress in decision-making have been described in the literature while an up-to-date systematic review of the existing methods is lacking. Moreover, mental stress, mental effort, cognitive workload, and workload are often used interchangeably but should be distinguished to enable in-depth investigations of decision-mechanisms. Our objectives are to clarify stress related concepts and review the measurement, quantification, and application of stress during decision making activities.
△ Less
Submitted 19 March, 2022;
originally announced March 2022.
-
Joint-training on Symbiosis Networks for Deep Nueral Machine Translation models
Authors:
Zhengzhe Yu,
Jiaxin Guo,
Minghan Wang,
Daimeng Wei,
Hengchao Shang,
Zongyao Li,
Zhanglin Wu,
Yuxia Wang,
Yimeng Chen,
Chang Su,
Min Zhang,
Lizhi Lei,
shimin tao,
Hao Yang
Abstract:
Deep encoders have been proven to be effective in improving neural machine translation (NMT) systems, but it reaches the upper bound of translation quality when the number of encoder layers exceeds 18. Worse still, deeper networks consume a lot of memory, making it impossible to train efficiently. In this paper, we present Symbiosis Networks, which include a full network as the Symbiosis Main Netw…
▽ More
Deep encoders have been proven to be effective in improving neural machine translation (NMT) systems, but it reaches the upper bound of translation quality when the number of encoder layers exceeds 18. Worse still, deeper networks consume a lot of memory, making it impossible to train efficiently. In this paper, we present Symbiosis Networks, which include a full network as the Symbiosis Main Network (M-Net) and another shared sub-network with the same structure but less layers as the Symbiotic Sub Network (S-Net). We adopt Symbiosis Networks on Transformer-deep (m-n) architecture and define a particular regularization loss $\mathcal{L}_τ$ between the M-Net and S-Net in NMT. We apply joint-training on the Symbiosis Networks and aim to improve the M-Net performance. Our proposed training strategy improves Transformer-deep (12-6) by 0.61, 0.49 and 0.69 BLEU over the baselines under classic training on WMT'14 EN->DE, DE->EN and EN->FR tasks. Furthermore, our Transformer-deep (12-6) even outperforms classic Transformer-deep (18-6).
△ Less
Submitted 21 December, 2021;
originally announced December 2021.
-
Self-Distillation Mixup Training for Non-autoregressive Neural Machine Translation
Authors:
Jiaxin Guo,
Minghan Wang,
Daimeng Wei,
Hengchao Shang,
Yuxia Wang,
Zongyao Li,
Zhengzhe Yu,
Zhanglin Wu,
Yimeng Chen,
Chang Su,
Min Zhang,
Lizhi Lei,
shimin tao,
Hao Yang
Abstract:
Recently, non-autoregressive (NAT) models predict outputs in parallel, achieving substantial improvements in generation speed compared to autoregressive (AT) models. While performing worse on raw data, most NAT models are trained as student models on distilled data generated by AT teacher models, which is known as sequence-level Knowledge Distillation. An effective training strategy to improve the…
▽ More
Recently, non-autoregressive (NAT) models predict outputs in parallel, achieving substantial improvements in generation speed compared to autoregressive (AT) models. While performing worse on raw data, most NAT models are trained as student models on distilled data generated by AT teacher models, which is known as sequence-level Knowledge Distillation. An effective training strategy to improve the performance of AT models is Self-Distillation Mixup (SDM) Training, which pre-trains a model on raw data, generates distilled data by the pre-trained model itself and finally re-trains a model on the combination of raw data and distilled data. In this work, we aim to view SDM for NAT models, but find directly adopting SDM to NAT models gains no improvements in terms of translation quality. Through careful analysis, we observe the invalidation is correlated to Modeling Diversity and Confirmation Bias between the AT teacher model and the NAT student models. Based on these findings, we propose an enhanced strategy named SDMRT by adding two stages to classic SDM: one is Pre-Rerank on self-distilled data, the other is Fine-Tune on Filtered teacher-distilled data. Our results outperform baselines by 0.6 to 1.2 BLEU on multiple NAT models. As another bonus, for Iterative Refinement NAT models, our methods can outperform baselines within half iteration number, which means 2X acceleration.
△ Less
Submitted 21 December, 2021;
originally announced December 2021.
-
Diformer: Directional Transformer for Neural Machine Translation
Authors:
Minghan Wang,
Jiaxin Guo,
Yuxia Wang,
Daimeng Wei,
Hengchao Shang,
Chang Su,
Yimeng Chen,
Yinglu Li,
Min Zhang,
Shimin Tao,
Hao Yang
Abstract:
Autoregressive (AR) and Non-autoregressive (NAR) models have their own superiority on the performance and latency, combining them into one model may take advantage of both. Current combination frameworks focus more on the integration of multiple decoding paradigms with a unified generative model, e.g. Masked Language Model. However, the generalization can be harmful to the performance due to the g…
▽ More
Autoregressive (AR) and Non-autoregressive (NAR) models have their own superiority on the performance and latency, combining them into one model may take advantage of both. Current combination frameworks focus more on the integration of multiple decoding paradigms with a unified generative model, e.g. Masked Language Model. However, the generalization can be harmful to the performance due to the gap between training objective and inference. In this paper, we aim to close the gap by preserving the original objective of AR and NAR under a unified framework. Specifically, we propose the Directional Transformer (Diformer) by jointly modelling AR and NAR into three generation directions (left-to-right, right-to-left and straight) with a newly introduced direction variable, which works by controlling the prediction of each token to have specific dependencies under that direction. The unification achieved by direction successfully preserves the original dependency assumption used in AR and NAR, retaining both generalization and performance. Experiments on 4 WMT benchmarks demonstrate that Diformer outperforms current united-modelling works with more than 1.5 BLEU points for both AR and NAR decoding, and is also competitive to the state-of-the-art independent AR and NAR models.
△ Less
Submitted 30 December, 2021; v1 submitted 21 December, 2021;
originally announced December 2021.
-
General Greedy De-bias Learning
Authors:
Xinzhe Han,
Shuhui Wang,
Chi Su,
Qingming Huang,
Qi Tian
Abstract:
Neural networks often make predictions relying on the spurious correlations from the datasets rather than the intrinsic properties of the task of interest, facing sharp degradation on out-of-distribution (OOD) test data. Existing de-bias learning frameworks try to capture specific dataset bias by annotations but they fail to handle complicated OOD scenarios. Others implicitly identify the dataset…
▽ More
Neural networks often make predictions relying on the spurious correlations from the datasets rather than the intrinsic properties of the task of interest, facing sharp degradation on out-of-distribution (OOD) test data. Existing de-bias learning frameworks try to capture specific dataset bias by annotations but they fail to handle complicated OOD scenarios. Others implicitly identify the dataset bias by special design low capability biased models or losses, but they degrade when the training and testing data are from the same distribution. In this paper, we propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model. The base model is encouraged to focus on examples that are hard to solve with biased models, thus remaining robust against spurious correlations in the test stage. GGD largely improves models' OOD generalization ability on various tasks, but sometimes over-estimates the bias level and degrades on the in-distribution test. We further re-analyze the ensemble process of GGD and introduce the Curriculum Regularization inspired by curriculum learning, which achieves a good trade-off between in-distribution and out-of-distribution performance. Extensive experiments on image classification, adversarial question answering, and visual question answering demonstrate the effectiveness of our method. GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
△ Less
Submitted 18 January, 2023; v1 submitted 20 December, 2021;
originally announced December 2021.
-
Roof-Transformer: Divided and Joined Understanding with Knowledge Enhancement
Authors:
Wei-Lin Liao,
Cheng-En Su,
Wei-Yun Ma
Abstract:
Recent work on enhancing BERT-based language representation models with knowledge graphs (KGs) and knowledge bases (KBs) has yielded promising results on multiple NLP tasks. State-of-the-art approaches typically integrate the original input sentences with KG triples and feed the combined representation into a BERT model. However, as the sequence length of a BERT model is limited, such a framework…
▽ More
Recent work on enhancing BERT-based language representation models with knowledge graphs (KGs) and knowledge bases (KBs) has yielded promising results on multiple NLP tasks. State-of-the-art approaches typically integrate the original input sentences with KG triples and feed the combined representation into a BERT model. However, as the sequence length of a BERT model is limited, such a framework supports little knowledge other than the original input sentences and is thus forced to discard some knowledge. This problem is especially severe for downstream tasks for which the input is a long paragraph or even a document, such as QA or reading comprehension tasks. We address this problem with Roof-Transformer, a model with two underlying BERTs and a fusion layer on top. One underlying BERT encodes the knowledge resources and the other one encodes the original input sentences, and the fusion layer integrates the two resultant encodings. Experimental results on a QA task and the GLUE benchmark attest the effectiveness of the proposed model.
△ Less
Submitted 20 October, 2022; v1 submitted 13 December, 2021;
originally announced December 2021.
-
CoMP Enhanced Subcarrier and Power Allocation for Multi-Numerology based 5G-NR Networks
Authors:
Li-Hsiang Shen,
Chia-Yu Su,
Kai-Ten Feng
Abstract:
With proliferation of fifth generation (5G) new radio (NR) technology, it is expected to meet the requirement of diverse traffic demands. We have designed a coordinated multi-point (CoMP) enhanced flexible multi-numerology (MN) for 5G-NR networks to improve the network performance in terms of throughput and latency. We have proposed a CoMP enhanced joint subcarrier and power allocation (CESP) sche…
▽ More
With proliferation of fifth generation (5G) new radio (NR) technology, it is expected to meet the requirement of diverse traffic demands. We have designed a coordinated multi-point (CoMP) enhanced flexible multi-numerology (MN) for 5G-NR networks to improve the network performance in terms of throughput and latency. We have proposed a CoMP enhanced joint subcarrier and power allocation (CESP) scheme which aims at maximizing sum rate under the considerations of transmit power limitation and guaranteed quality-of-service (QoS) including throughput and latency restrictions. By employing difference of two concave functions (D.C.) approximation and abstract Lagrangian duality method, we theoretically transform the original non-convex nonlinear problem into a solvable maximization problem. Moreover, the convergence of our proposed CESP algorithm with D.C. approximation is analytically derived with proofs, and is further validated via numerical results. Simulation results demonstrated that our proposed CESP algorithm outperforms the conventional non-CoMP and single numerology mechanisms along with other existing benchmarks in terms of lower latency and higher throughput under the scenarios of uniform and edge users.
△ Less
Submitted 7 December, 2021;
originally announced December 2021.
-
Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis
Authors:
Zhaobo Qi,
Shuhui Wang,
Chi Su,
Li Su,
Weigang Zhang,
Qingming Huang
Abstract:
Event analysis in untrimmed videos has attracted increasing attention due to the application of cutting-edge techniques such as CNN. As a well studied property for CNN-based models, the receptive field is a measurement for measuring the spatial range covered by a single feature response, which is crucial in improving the image categorization accuracy. In video domain, video event semantics are act…
▽ More
Event analysis in untrimmed videos has attracted increasing attention due to the application of cutting-edge techniques such as CNN. As a well studied property for CNN-based models, the receptive field is a measurement for measuring the spatial range covered by a single feature response, which is crucial in improving the image categorization accuracy. In video domain, video event semantics are actually described by complex interaction among different concepts, while their behaviors vary drastically from one video to another, leading to the difficulty in concept-based analytics for accurate event categorization. To model the concept behavior, we study temporal concept receptive field of concept-based event representation, which encodes the temporal occurrence pattern of different mid-level concepts. Accordingly, we introduce temporal dynamic convolution (TDC) to give stronger flexibility to concept-based event analytics. TDC can adjust the temporal concept receptive field size dynamically according to different inputs. Notably, a set of coefficients are learned to fuse the results of multiple convolutions with different kernel widths that provide various temporal concept receptive field sizes. Different coefficients can generate appropriate and accurate temporal concept receptive field size according to input videos and highlight crucial concepts. Based on TDC, we propose the temporal dynamic concept modeling network (TDCMN) to learn an accurate and complete concept representation for efficient untrimmed video analysis. Experiment results on FCVID and ActivityNet show that TDCMN demonstrates adaptive event recognition ability conditioned on different inputs, and improve the event recognition performance of Concept-based methods by a large margin. Code is available at https://github.com/qzhb/TDCMN.
△ Less
Submitted 22 November, 2021;
originally announced November 2021.