Search | arXiv e-print repository

Reward Steering with Evolutionary Heuristics for Decoding-time Alignment

Authors: Chia-Yu Hung, Navonil Majumder, Ambuj Mehrish, Soujanya Poria

Abstract: The widespread applicability and increasing omnipresence of LLMs have instigated a need to align LLM responses to user and stakeholder preferences. Many preference optimization approaches have been proposed that fine-tune LLM parameters to achieve good alignment. However, such parameter tuning is known to interfere with model performance on many tasks. Moreover, kee** up with shifting user prefe… ▽ More The widespread applicability and increasing omnipresence of LLMs have instigated a need to align LLM responses to user and stakeholder preferences. Many preference optimization approaches have been proposed that fine-tune LLM parameters to achieve good alignment. However, such parameter tuning is known to interfere with model performance on many tasks. Moreover, kee** up with shifting user preferences is tricky in such a situation. Decoding-time alignment with reward model guidance solves these issues at the cost of increased inference time. However, most of such methods fail to strike the right balance between exploration and exploitation of reward -- often due to the conflated formulation of these two aspects - to give well-aligned responses. To remedy this we decouple these two aspects and implement them in an evolutionary fashion: exploration is enforced by decoding from mutated instructions and exploitation is represented as the periodic replacement of poorly-rewarded generations with well-rewarded ones. Empirical evidences indicate that this strategy outperforms many preference optimization and decode-time alignment approaches on two widely accepted alignment benchmarks AlpacaEval 2 and MT-Bench. Our implementation will be available at: https://darwin-alignment.github.io. △ Less

Submitted 25 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

arXiv:2405.16450 [pdf, other]

Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search

Authors: Max Liu, Chan-Hung Yu, Wei-Hsu Lee, Cheng-Wei Hung, Yen-Chun Chen, Shao-Hua Sun

Abstract: Programmatic reinforcement learning (PRL) has been explored for representing policies through programs as a means to achieve interpretability and generalization. Despite promising outcomes, current state-of-the-art PRL methods are hindered by sample inefficiency, necessitating tens of millions of program-environment interactions. To tackle this challenge, we introduce a novel LLM-guided search fra… ▽ More Programmatic reinforcement learning (PRL) has been explored for representing policies through programs as a means to achieve interpretability and generalization. Despite promising outcomes, current state-of-the-art PRL methods are hindered by sample inefficiency, necessitating tens of millions of program-environment interactions. To tackle this challenge, we introduce a novel LLM-guided search framework (LLM-GS). Our key insight is to leverage the programming expertise and common sense reasoning of LLMs to enhance the efficiency of assumption-free, random-guessing search methods. We address the challenge of LLMs' inability to generate precise and grammatically correct programs in domain-specific languages (DSLs) by proposing a Pythonic-DSL strategy - an LLM is instructed to initially generate Python codes and then convert them into DSL programs. To further optimize the LLM-generated programs, we develop a search algorithm named Scheduled Hill Climbing, designed to efficiently explore the programmatic search space to consistently improve the programs. Experimental results in the Karel domain demonstrate the superior effectiveness and efficiency of our LLM-GS framework. Extensive ablation studies further verify the critical role of our Pythonic-DSL strategy and Scheduled Hill Climbing algorithm. △ Less

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2404.09956 [pdf, other]

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Authors: Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria

Abstract: Generative multimodal content is increasingly prevalent in much of the content creation arena, as it has the potential to allow artists and media personnel to create pre-production mockups by quickly bringing their ideas to life. The generation of audio from text prompts is an important aspect of such processes in the music and film industry. Many of the recent diffusion-based text-to-audio models… ▽ More Generative multimodal content is increasingly prevalent in much of the content creation arena, as it has the potential to allow artists and media personnel to create pre-production mockups by quickly bringing their ideas to life. The generation of audio from text prompts is an important aspect of such processes in the music and film industry. Many of the recent diffusion-based text-to-audio models focus on training increasingly sophisticated diffusion models on a large set of datasets of prompt-audio pairs. These models do not explicitly focus on the presence of concepts or events and their temporal ordering in the output audio with respect to the input prompt. Our hypothesis is focusing on how these aspects of audio generation could improve audio generation performance in the presence of limited data. As such, in this work, using an existing text-to-audio model Tango, we synthetically create a preference dataset where each prompt has a winner audio output and some loser audio outputs for the diffusion model to learn from. The loser outputs, in theory, have some concepts from the prompt missing or in an incorrect order. We fine-tune the publicly available Tango text-to-audio model using diffusion-DPO (direct preference optimization) loss on our preference dataset and show that it leads to improved audio output over Tango and AudioLDM2, in terms of both automatic- and manual-evaluation metrics. △ Less

Submitted 16 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: https://github.com/declare-lab/tango

arXiv:2404.08820 [pdf]

Single-image driven 3d viewpoint training data augmentation for effective wine label recognition

Authors: Yueh-Cheng Huang, Hsin-Yi Chen, Cheng-Jui Hung, Jen-Hui Chuang, Jenq-Neng Hwang

Abstract: Confronting the critical challenge of insufficient training data in the field of complex image recognition, this paper introduces a novel 3D viewpoint augmentation technique specifically tailored for wine label recognition. This method enhances deep learning model performance by generating visually realistic training samples from a single real-world wine label image, overcoming the challenges pose… ▽ More Confronting the critical challenge of insufficient training data in the field of complex image recognition, this paper introduces a novel 3D viewpoint augmentation technique specifically tailored for wine label recognition. This method enhances deep learning model performance by generating visually realistic training samples from a single real-world wine label image, overcoming the challenges posed by the intricate combinations of text and logos. Classical Generative Adversarial Network (GAN) methods fall short in synthesizing such intricate content combination. Our proposed solution leverages time-tested computer vision and image processing strategies to expand our training dataset, thereby broadening the range of training samples for deep learning applications. This innovative approach to data augmentation circumvents the constraints of limited training resources. Using the augmented training images through batch-all triplet metric learning on a Vision Transformer (ViT) architecture, we can get the most discriminative embedding features for every wine label, enabling us to perform one-shot recognition of existing wine labels in the training classes or future newly collected wine labels unavailable in the training. Experimental results show a significant increase in recognition accuracy over conventional 2D data augmentation techniques. △ Less

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2403.15759 [pdf]

Deep Learning Approach to Forecasting COVID-19 Cases in Residential Buildings of Hong Kong Public Housing Estates: The Role of Environment and Sociodemographics

Authors: E. Leung, J. Guan, KO. Kwok, CT. Hung, CC. Ching, KC. Chong, CHK. Yam, T. Sun, WH. Tsang, EK. Yeoh, A. Lee

Abstract: Introduction: The current study investigates the complex association between COVID-19 and the studied districts' socioecology (e.g. internal and external built environment, sociodemographic profiles, etc.) to quantify their contributions to the early outbreaks and epidemic resurgence of COVID-19. Methods: We aligned the analytic model's architecture with the hierarchical structure of the resident'… ▽ More Introduction: The current study investigates the complex association between COVID-19 and the studied districts' socioecology (e.g. internal and external built environment, sociodemographic profiles, etc.) to quantify their contributions to the early outbreaks and epidemic resurgence of COVID-19. Methods: We aligned the analytic model's architecture with the hierarchical structure of the resident's socioecology using a multi-headed hierarchical convolutional neural network to structure the vast array of hierarchically related predictive features representing buildings' internal and external built environments and residents' sociodemographic profiles as model input. COVID-19 cases accumulated in buildings across three adjacent districts in HK, both before and during HK's epidemic resurgence, were modeled. A forward-chaining validation was performed to examine the model's performance in forecasting COVID-19 cases over the 3-, 7-, and 14-day horizons during the two months subsequent to when the model for COVID-19 resurgence was built to align with the forecasting needs in an evolving pandemic. Results: Different sets of factors were found to be linked to the earlier waves of COVID-19 outbreaks compared to the epidemic resurgence of the pandemic. Sociodemographic factors such as work hours, monthly household income, employment types, and the number of non-working adults or children in household populations were of high importance to the studied buildings' COVID-19 case counts during the early waves of COVID-19. Factors constituting one's internal built environment, such as the number of distinct households in the buildings, the number of distinct households per floor, and the number of floors, corridors, and lifts, had the greatest unique contributions to the building-level COVID-19 case counts during epidemic resurgence. △ Less

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.13842 [pdf]

Analyzing the Variations in Emergency Department Boarding and Testing the Transferability of Forecasting Models across COVID-19 Pandemic Waves in Hong Kong: Hybrid CNN-LSTM approach to quantifying building-level socioecological risk

Authors: Eman Leung, **g**g Guan, Kin On Kwok, CT Hung, CC. Ching, CK. Chung, Hector Tsang, EK Yeoh, Albert Lee

Abstract: Emergency department's (ED) boarding (defined as ED waiting time greater than four hours) has been linked to poor patient outcomes and health system performance. Yet, effective forecasting models is rare before COVID-19, lacking during the peri-COVID era. Here, a hybrid convolutional neural network (CNN)-Long short-term memory (LSTM) model was applied to public-domain data sourced from Hong Kong's… ▽ More Emergency department's (ED) boarding (defined as ED waiting time greater than four hours) has been linked to poor patient outcomes and health system performance. Yet, effective forecasting models is rare before COVID-19, lacking during the peri-COVID era. Here, a hybrid convolutional neural network (CNN)-Long short-term memory (LSTM) model was applied to public-domain data sourced from Hong Kong's Hospital Authority, Department of Health, and Housing Authority. In addition, we sought to identify the phase of the COVID-19 pandemic that most significantly perturbed our complex adaptive healthcare system, thereby revealing a stable pattern of interconnectedness among its components, using deep transfer learning methodology. Our result shows that 1) the greatest proportion of days with ED boarding was found between waves four and five; 2) the best-performing model for forecasting ED boarding was observed between waves four and five, which was based on features representing time-invariant residential buildings' built environment and sociodemographic profiles and the historical time series of ED boarding and case counts, compared to during the waves when best-performing forecasting is based on time-series features alone; and 3) when the model built from the period between waves four and five was applied to data from other waves via deep transfer learning, the transferred model enhanced the performance of indigenous models. △ Less

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2402.18883 [pdf, other]

Efficient Processing of Subsequent Densest Subgraph Query

Authors: Chia-Yang Hung, Chih-Ya Shen

Abstract: Dense subgraph extraction is a fundamental problem in graph analysis and data mining, aimed at identifying cohesive and densely connected substructures within a given graph. It plays a crucial role in various domains, including social network analysis, biological network analysis, recommendation systems, and community detection. However, extracting a subgraph with the highest node similarity is a… ▽ More Dense subgraph extraction is a fundamental problem in graph analysis and data mining, aimed at identifying cohesive and densely connected substructures within a given graph. It plays a crucial role in various domains, including social network analysis, biological network analysis, recommendation systems, and community detection. However, extracting a subgraph with the highest node similarity is a lack of exploration. To address this problem, we studied the Member Selection Problem and extended it with a dynamic constraint variant. By incorporating dynamic constraints, our algorithm can adapt to changing conditions or requirements, allowing for more flexible and personalized subgraph extraction. This approach enables the algorithm to provide tailored solutions that meet specific needs, even in scenarios where constraints may vary over time. We also provide the theoretical analysis to show that our algorithm is 1/3-approximation. Eventually, the experiments show that our algorithm is effective and efficient in tackling the member selection problem with dynamic constraints. △ Less

Submitted 29 February, 2024; originally announced February 2024.

Comments: 11 pages

MSC Class: 68W27

arXiv:2401.15554 [pdf]

Pericoronary adipose tissue feature analysis in CT calcium score images with comparison to coronary CTA

Authors: Yingnan Song, Hao Wu, Juhwan Lee, Justin Kim, Ammar Hoori, Tao Hu, Vladislav Zimin, Mohamed Makhlouf, Sadeer Al-Kindi, Sanjay Rajagopalan, Chun-Ho Yun, Chung-Lieh Hung, David L. Wilson

Abstract: We investigated the feasibility and advantages of using non-contrast CT calcium score (CTCS) images to assess pericoronary adipose tissue (PCAT) and its association with major adverse cardiovascular events (MACE). PCAT features from coronary CTA (CCTA) have been shown to be associated with cardiovascular risk but are potentially confounded by iodine. If PCAT in CTCS images can be similarly analyze… ▽ More We investigated the feasibility and advantages of using non-contrast CT calcium score (CTCS) images to assess pericoronary adipose tissue (PCAT) and its association with major adverse cardiovascular events (MACE). PCAT features from coronary CTA (CCTA) have been shown to be associated with cardiovascular risk but are potentially confounded by iodine. If PCAT in CTCS images can be similarly analyzed, it would avoid this issue and enable its inclusion in formal risk assessment from readily available, low-cost CTCS images. To identify coronaries in CTCS images that have subtle visual evidence of vessels, we registered CTCS with paired CCTA images having coronary labels. We developed a novel axial-disk method giving regions for analyzing PCAT features in three main coronary arteries. We analyzed novel hand-crafted and radiomic features using univariate and multivariate logistic regression prediction of MACE and compared results against those from CCTA. Registration accuracy was sufficient to enable the identification of PCAT regions in CTCS images. Motion or beam hardening artifacts were often present in high-contrast CCTA but not CTCS. Mean HU and volume were increased in both CTCS and CCTA for MACE group. There were significant positive correlations between some CTCS and CCTA features, suggesting that similar characteristics were obtained. Using hand-crafted/radiomics from CTCS and CCTA, AUCs were 0.82/0.79 and 0.83/0.77 respectively, while Agatston gave AUC=0.73. Preliminarily, PCAT features can be assessed from three main coronary arteries in non-contrast CTCS images with performance characteristics that are at the very least comparable to CCTA. △ Less

Submitted 27 January, 2024; originally announced January 2024.

Comments: 24 pages,10 figures

arXiv:2401.11095 [pdf, other]

doi 10.1145/3643834.3661556

SoundShift: Exploring Sound Manipulations for Accessible Mixed-Reality Awareness

Authors: Ruei-Che Chang, Chia-Sheng Hung, Bing-Yu Chen, Dhruv Jain, Anhong Guo

Abstract: Mixed-reality (MR) soundscapes blend real-world sound with virtual audio from hearing devices, presenting intricate auditory information that is hard to discern and differentiate. This is particularly challenging for blind or visually impaired individuals, who rely on sounds and descriptions in their everyday lives. To understand how complex audio information is consumed, we analyzed online forum… ▽ More Mixed-reality (MR) soundscapes blend real-world sound with virtual audio from hearing devices, presenting intricate auditory information that is hard to discern and differentiate. This is particularly challenging for blind or visually impaired individuals, who rely on sounds and descriptions in their everyday lives. To understand how complex audio information is consumed, we analyzed online forum posts within the blind community, identifying prevailing challenges, needs, and desired solutions. We synthesized the results and propose SoundShift for increasing MR sound awareness, which includes six sound manipulations: Transparency Shift, Envelope Shift, Position Shift, Style Shift, Time Shift, and Sound Append. To evaluate the effectiveness of SoundShift, we conducted a user study with 18 blind participants across three simulated MR scenarios, where participants identified specific sounds within intricate soundscapes. We found that SoundShift increased MR sound awareness and minimized cognitive load. Finally, we developed three real-world example applications to demonstrate the practicality of SoundShift. △ Less

Submitted 26 May, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

Comments: DIS 2024

arXiv:2311.14966 [pdf, other]

Walking a Tightrope -- Evaluating Large Language Models in High-Risk Domains

Authors: Chia-Chien Hung, Wiem Ben Rim, Lindsay Frost, Lars Bruckner, Carolin Lawrence

Abstract: High-risk domains pose unique challenges that require language models to provide accurate and safe responses. Despite the great success of large language models (LLMs), such as ChatGPT and its variants, their performance in high-risk domains remains unclear. Our study delves into an in-depth analysis of the performance of instruction-tuned LLMs, focusing on factual accuracy and safety adherence. T… ▽ More High-risk domains pose unique challenges that require language models to provide accurate and safe responses. Despite the great success of large language models (LLMs), such as ChatGPT and its variants, their performance in high-risk domains remains unclear. Our study delves into an in-depth analysis of the performance of instruction-tuned LLMs, focusing on factual accuracy and safety adherence. To comprehensively assess the capabilities of LLMs, we conduct experiments on six NLP datasets including question answering and summarization tasks within two high-risk domains: legal and medical. Further qualitative analysis highlights the existing limitations inherent in current LLMs when evaluating in high-risk domains. This underscores the essential nature of not only improving LLM capabilities but also prioritizing the refinement of domain-specific metrics, and embracing a more human-centric approach to enhance safety and factual reliability. Our findings advance the field toward the concerns of properly evaluating LLMs in high-risk domains, aiming to steer the adaptability of LLMs in fulfilling societal obligations and aligning with forthcoming regulations, such as the EU AI Act. △ Less

Submitted 25 November, 2023; originally announced November 2023.

Comments: EMNLP 2023 Workshop on Benchmarking Generalisation in NLP (GenBench)

arXiv:2311.07993 [pdf, other]

Explicit Change Relation Learning for Change Detection in VHR Remote Sensing Images

Authors: Dalong Zheng, Zebin Wu, Jia Liu, Chih-Cheng Hung, Zhihui Wei

Abstract: Change detection has always been a concerned task in the interpretation of remote sensing images. It is essentially a unique binary classification task with two inputs, and there is a change relationship between these two inputs. At present, the mining of change relationship features is usually implicit in the network architectures that contain single-branch or two-branch encoders. However, due to… ▽ More Change detection has always been a concerned task in the interpretation of remote sensing images. It is essentially a unique binary classification task with two inputs, and there is a change relationship between these two inputs. At present, the mining of change relationship features is usually implicit in the network architectures that contain single-branch or two-branch encoders. However, due to the lack of artificial prior design for change relationship features, these networks cannot learn enough change semantic information and lose more accurate change detection performance. So we propose a network architecture NAME for the explicit mining of change relation features. In our opinion, the change features of change detection should be divided into pre-changed image features, post-changed image features and change relation features. In order to fully mine these three kinds of change features, we propose the triple branch network combining the transformer and convolutional neural network (CNN) to extract and fuse these change features from two perspectives of global information and local information, respectively. In addition, we design the continuous change relation (CCR) branch to further obtain the continuous and detail change relation features to improve the change discrimination capability of the model. The experimental results show that our network performs better, in terms of F1, IoU, and OA, than those of the existing advanced networks for change detection on four public very high-resolution (VHR) remote sensing datasets. Our source code is available at https://github.com/DalongZ/NAME. △ Less

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2310.14909 [pdf, other]

Linking Surface Facts to Large-Scale Knowledge Graphs

Authors: Gorjan Radevski, Kiril Gashteovski, Chia-Chien Hung, Carolin Lawrence, Goran Glavaš

Abstract: Open Information Extraction (OIE) methods extract facts from natural language text in the form of ("subject"; "relation"; "object") triples. These facts are, however, merely surface forms, the ambiguity of which impedes their downstream usage; e.g., the surface phrase "Michael Jordan" may refer to either the former basketball player or the university professor. Knowledge Graphs (KGs), on the other… ▽ More Open Information Extraction (OIE) methods extract facts from natural language text in the form of ("subject"; "relation"; "object") triples. These facts are, however, merely surface forms, the ambiguity of which impedes their downstream usage; e.g., the surface phrase "Michael Jordan" may refer to either the former basketball player or the university professor. Knowledge Graphs (KGs), on the other hand, contain facts in a canonical (i.e., unambiguous) form, but their coverage is limited by a static schema (i.e., a fixed set of entities and predicates). To bridge this gap, we need the best of both worlds: (i) high coverage of free-text OIEs, and (ii) semantic precision (i.e., monosemy) of KGs. In order to achieve this goal, we propose a new benchmark with novel evaluation protocols that can, for example, measure fact linking performance on a granular triple slot level, while also measuring if a system has the ability to recognize that a surface form has no match in the existing KG. Our extensive evaluation of several baselines show that detection of out-of-KG entities and predicates is more difficult than accurate linking to existing ones, thus calling for more research efforts on this difficult task. We publicly release all resources (data, benchmark and code) on https://github.com/nec-research/fact-linking. △ Less

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.08123 [pdf, other]

Who Wrote it and Why? Prompting Large-Language Models for Authorship Verification

Authors: Chia-Yu Hung, Zhiqiang Hu, Yujia Hu, Roy Ka-Wei Lee

Abstract: Authorship verification (AV) is a fundamental task in natural language processing (NLP) and computational linguistics, with applications in forensic analysis, plagiarism detection, and identification of deceptive content. Existing AV techniques, including traditional stylometric and deep learning approaches, face limitations in terms of data requirements and lack of explainability. To address thes… ▽ More Authorship verification (AV) is a fundamental task in natural language processing (NLP) and computational linguistics, with applications in forensic analysis, plagiarism detection, and identification of deceptive content. Existing AV techniques, including traditional stylometric and deep learning approaches, face limitations in terms of data requirements and lack of explainability. To address these limitations, this paper proposes PromptAV, a novel technique that leverages Large-Language Models (LLMs) for AV by providing step-by-step stylometric explanation prompts. PromptAV outperforms state-of-the-art baselines, operates effectively with limited training data, and enhances interpretability through intuitive explanations, showcasing its potential as an effective and interpretable solution for the AV task. △ Less

Submitted 12 October, 2023; originally announced October 2023.

Comments: 7 pages,1 figure

arXiv:2309.09658 [pdf]

A Novel Method of Fuzzy Topic Modeling based on Transformer Processing

Authors: Ching-Hsun Tseng, Shin-Jye Lee, Po-Wei Cheng, Chien Lee, Chih-Chieh Hung

Abstract: Topic modeling is admittedly a convenient way to monitor markets trend. Conventionally, Latent Dirichlet Allocation, LDA, is considered a must-do model to gain this type of information. By given the merit of deducing keyword with token conditional probability in LDA, we can know the most possible or essential topic. However, the results are not intuitive because the given topics cannot wholly fit… ▽ More Topic modeling is admittedly a convenient way to monitor markets trend. Conventionally, Latent Dirichlet Allocation, LDA, is considered a must-do model to gain this type of information. By given the merit of deducing keyword with token conditional probability in LDA, we can know the most possible or essential topic. However, the results are not intuitive because the given topics cannot wholly fit human knowledge. LDA offers the first possible relevant keywords, which also brings out another problem of whether the connection is reliable based on the statistic possibility. It is also hard to decide the topic number manually in advance. As the booming trend of using fuzzy membership to cluster and using transformers to embed words, this work presents the fuzzy topic modeling based on soft clustering and document embedding from state-of-the-art transformer-based model. In our practical application in a press release monitoring, the fuzzy topic modeling gives a more natural result than the traditional output from LDA. △ Less

Submitted 18 September, 2023; originally announced September 2023.

Comments: Asian Journal of Information and Communications, Vol.12, No. 1, 125-140

arXiv:2306.15593 [pdf]

Cardiac CT perfusion imaging of pericoronary adipose tissue (PCAT) highlights potential confounds in coronary CTA

Authors: Hao Wu, Yingnan Song, Ammar Hoori, Ananya Subramaniam, Juhwan Lee, Justin Kim, Tao Hu, Sadeer Al-Kindi, Wei-Ming Huang, Chun-Ho Yun, Chung-Lieh Hung, Sanjay Rajagopalan, David L. Wilson

Abstract: Features of pericoronary adipose tissue (PCAT) assessed from coronary computed tomography angiography (CCTA) are associated with inflammation and cardiovascular risk. As PCAT is vascularly connected with coronary vasculature, the presence of iodine is a potential confounding factor on PCAT HU and textures that has not been adequately investigated. Use dynamic cardiac CT perfusion (CCTP) to inform… ▽ More Features of pericoronary adipose tissue (PCAT) assessed from coronary computed tomography angiography (CCTA) are associated with inflammation and cardiovascular risk. As PCAT is vascularly connected with coronary vasculature, the presence of iodine is a potential confounding factor on PCAT HU and textures that has not been adequately investigated. Use dynamic cardiac CT perfusion (CCTP) to inform contrast determinants of PCAT assessment. From CCTP, we analyzed HU dynamics of territory-specific PCAT, myocardium, and other adipose depots in patients with coronary artery disease. HU, blood flow, and radiomics were assessed over time. Changes from peak aorta time, Pa, chosen to model the time of CCTA, were obtained. HU in PCAT increased more than in other adipose depots. The estimated blood flow in PCAT was ~23% of that in the contiguous myocardium. Comparing PCAT distal and proximal to a significant stenosis, we found less enhancement and longer time-to-peak distally. Two-second offsets [before, after] Pa resulted in [ 4-HU, 3-HU] differences in PCAT. Due to changes in HU, the apparent PCAT volume reduced ~15% from the first scan (P1) to Pa using a conventional fat window. Comparing radiomic features over time, 78% of features changed >10% relative to P1. CCTP elucidates blood flow in PCAT and enables analysis of PCAT features over time. PCAT assessments (HU, apparent volume, and radiomics) are sensitive to acquisition timing and the presence of obstructive stenosis, which may confound the interpretation of PCAT in CCTA images. Data normalization may be in order. △ Less

Submitted 27 June, 2023; originally announced June 2023.

Comments: 13 pages, 8 figures

arXiv:2305.12717 [pdf, other]

TADA: Efficient Task-Agnostic Domain Adaptation for Transformers

Authors: Chia-Chien Hung, Lukas Lange, Jannik Strötgen

Abstract: Intermediate training of pre-trained transformer-based language models on domain-specific data leads to substantial gains for downstream tasks. To increase efficiency and prevent catastrophic forgetting alleviated from full domain-adaptive pre-training, approaches such as adapters have been developed. However, these require additional parameters for each layer, and are criticized for their limited… ▽ More Intermediate training of pre-trained transformer-based language models on domain-specific data leads to substantial gains for downstream tasks. To increase efficiency and prevent catastrophic forgetting alleviated from full domain-adaptive pre-training, approaches such as adapters have been developed. However, these require additional parameters for each layer, and are criticized for their limited expressiveness. In this work, we introduce TADA, a novel task-agnostic domain adaptation method which is modular, parameter-efficient, and thus, data-efficient. Within TADA, we retrain the embeddings to learn domain-aware input representations and tokenizers for the transformer encoder, while freezing all other parameters of the model. Then, task-specific fine-tuning is performed. We further conduct experiments with meta-embeddings and newly introduced meta-tokenizers, resulting in one model per task in multi-domain use cases. Our broad evaluation in 4 downstream tasks for 14 domains across single- and multi-domain setups and high- and low-resource scenarios reveals that TADA is an effective and efficient alternative to full domain-adaptive pre-training and adapters for domain adaptation, while not introducing additional parameters or complex training steps. △ Less

Submitted 22 May, 2023; originally announced May 2023.

Comments: ACL-Findings 2023

arXiv:2305.05139

Temporal Convolution Network Based Onset Detection and Query by Humming System Design

Authors: Yu Cheng Hung, Jian-Jiun Ding

Abstract: Onsets are a key factor to split audio into several notes. In this paper, we ensemble multiple temporal convolution network (TCN) based model and utilize a restricted frequency range spectrogram to achieve more robust onset detection. Different from the present onset detection of QBH system which is only available in a clean scenario, our proposal of onset detection and speech enhancement can prev… ▽ More Onsets are a key factor to split audio into several notes. In this paper, we ensemble multiple temporal convolution network (TCN) based model and utilize a restricted frequency range spectrogram to achieve more robust onset detection. Different from the present onset detection of QBH system which is only available in a clean scenario, our proposal of onset detection and speech enhancement can prevent noise from affecting onset detection function (ODF). Compared to the CNN model which exploits spatial features of the spectrogram, the TCN model exploits both spatial and temporal features of the spectrogram. As the usage of QBH in noisy scenarios, we apply the TCN-based speech enhancement as a preprocessor of QBH. With the combinations of TCN-based speech enhancement and onset detection, simulations show that the proposal can enable the QBH system in both noisy and clean circumstances with short response time. △ Less

Submitted 7 June, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

Comments: This paper has been withdrawn by the author due to a crucial definition of probability threshold and several grammer and vocabulary mistakes

arXiv:2305.03982 [pdf]

Pitch Estimation by Denoising Preprocessor and Hybrid Estimation Model

Authors: Yu Cheng Hung, ** Hung Chen, Jian Jiun Ding

Abstract: Pitch estimation is to estimate the fundamental frequency and the midi number and plays a critical role in music signal analysis and vocal signal processing. In this work, we proposed a new architecture based on a learning-based enhancement preprocessor and a combination of several traditional and deep learning pitch estimation methods to achieve better pitch estimation performance in both noisy a… ▽ More Pitch estimation is to estimate the fundamental frequency and the midi number and plays a critical role in music signal analysis and vocal signal processing. In this work, we proposed a new architecture based on a learning-based enhancement preprocessor and a combination of several traditional and deep learning pitch estimation methods to achieve better pitch estimation performance in both noisy and clean scenarios. We test 17 different types of noise and 4 SNRdb noise levels. The results show that the proposed pitch estimation can perform better in both noisy and clean scenarios with short response time. △ Less

Submitted 6 May, 2023; originally announced May 2023.

Comments: From ICCE-Taiwan

arXiv:2303.17388 [pdf, other]

BPCE: A Prototype for Co-Evolution between Business Process Variants through Configurable Process Model

Authors: Linyue Liu, Xi Guo, Chun Ouyang, Patrick C. K. Hung, Hong-Yu Zhang, Keqing He, Chen Mo, Zaiwen Feng

Abstract: With the continuous development of business process management technology, the increasing business process models are usually owned by large enterprises. In large enterprises, different stakeholders may modify the same business process model. In order to better manage the changeability of processes, they adopt configurable business process models to manage process variants. However, the process va… ▽ More With the continuous development of business process management technology, the increasing business process models are usually owned by large enterprises. In large enterprises, different stakeholders may modify the same business process model. In order to better manage the changeability of processes, they adopt configurable business process models to manage process variants. However, the process variants will vary with the change in enterprise business demands. Therefore, it is necessary to explore the co-evolution of the process variants so as to effectively manage the business process family. To this end, a novel framework for co-evolution between business process variants through a configurable process model is proposed in this work. First, the map** relationship between process variants and configurable models is standardized in this study. A series of change operations and change propagation operations between process variants and configurable models are further defined for achieving propagation. Then, an overall algorithm is proposed for achieving co-evolution of process variants. Next, a prototype is developed for managing change synchronization between process variants and configurable process models. Finally, the effectiveness and efficiency of our proposed process change propagation method are verified based on experiments on two business process datasets. The experimental results show that our approach implements the co-evolution of process variants with high accuracy and efficiency. △ Less

Submitted 30 March, 2023; originally announced March 2023.

Comments: 18 pages , 11 figures

MSC Class: 68N99 ACM Class: D.2.2

arXiv:2303.03364 [pdf, other]

Leveraging Scene Embeddings for Gradient-Based Motion Planning in Latent Space

Authors: Jun Yamada, Chia-Man Hung, Jack Collins, Ioannis Havoutis, Ingmar Posner

Abstract: Motion planning framed as optimisation in structured latent spaces has recently emerged as competitive with traditional methods in terms of planning success while significantly outperforming them in terms of computational speed. However, the real-world applicability of recent work in this domain remains limited by the need to express obstacle information directly in state-space, involving simple g… ▽ More Motion planning framed as optimisation in structured latent spaces has recently emerged as competitive with traditional methods in terms of planning success while significantly outperforming them in terms of computational speed. However, the real-world applicability of recent work in this domain remains limited by the need to express obstacle information directly in state-space, involving simple geometric primitives. In this work we address this challenge by leveraging learned scene embeddings together with a generative model of the robot manipulator to drive the optimisation process. In addition, we introduce an approach for efficient collision checking which directly regularises the optimisation undertaken for planning. Using simulated as well as real-world experiments, we demonstrate that our approach, AMP-LS, is able to successfully plan in novel, complex scenes while outperforming traditional planning baselines in terms of computation speed by an order of magnitude. We show that the resulting system is fast enough to enable closed-loop planning in real-world dynamic scenes. △ Less

Submitted 6 March, 2023; originally announced March 2023.

Comments: Project website: https://amp-ls.github.io/

Journal ref: IEEE International Conference on Robotics and Automation (ICRA), 2023

arXiv:2211.14986 [pdf]

An Unpaired Cross-modality Segmentation Framework Using Data Augmentation and Hybrid Convolutional Networks for Segmenting Vestibular Schwannoma and Cochlea

Authors: Yuzhou Zhuang, Hong Liu, Enmin Song, Coskun Cetinkaya, Chih-Cheng Hung

Abstract: The crossMoDA challenge aims to automatically segment the vestibular schwannoma (VS) tumor and cochlea regions of unlabeled high-resolution T2 scans by leveraging labeled contrast-enhanced T1 scans. The 2022 edition extends the segmentation task by including multi-institutional scans. In this work, we proposed an unpaired cross-modality segmentation framework using data augmentation and hybrid con… ▽ More The crossMoDA challenge aims to automatically segment the vestibular schwannoma (VS) tumor and cochlea regions of unlabeled high-resolution T2 scans by leveraging labeled contrast-enhanced T1 scans. The 2022 edition extends the segmentation task by including multi-institutional scans. In this work, we proposed an unpaired cross-modality segmentation framework using data augmentation and hybrid convolutional networks. Considering heterogeneous distributions and various image sizes for multi-institutional scans, we apply the min-max normalization for scaling the intensities of all scans between -1 and 1, and use the voxel size resampling and center crop** to obtain fixed-size sub-volumes for training. We adopt two data augmentation methods for effectively learning the semantic information and generating realistic target domain scans: generative and online data augmentation. For generative data augmentation, we use CUT and CycleGAN to generate two groups of realistic T2 volumes with different details and appearances for supervised segmentation training. For online data augmentation, we design a random tumor signal reducing method for simulating the heterogeneity of VS tumor signals. Furthermore, we utilize an advanced hybrid convolutional network with multi-dimensional convolutions to adaptively learn sparse inter-slice information and dense intra-slice information for accurate volumetric segmentation of VS tumor and cochlea regions in anisotropic scans. On the crossMoDA2022 validation dataset, our method produces promising results and achieves the mean DSC values of 72.47% and 76.48% and ASSD values of 3.42 mm and 0.53 mm for VS tumor and cochlea regions, respectively. △ Less

Submitted 27 November, 2022; originally announced November 2022.

Comments: Accepted by BrainLes MICCAI proceedings

arXiv:2210.11779 [pdf, other]

doi 10.1109/LRA.2022.3152697

Reaching Through Latent Space: From Joint Statistics to Path Planning in Manipulation

Authors: Chia-Man Hung, Shaohong Zhong, Walter Goodwin, Oiwi Parker Jones, Martin Engelcke, Ioannis Havoutis, Ingmar Posner

Abstract: We present a novel approach to path planning for robotic manipulators, in which paths are produced via iterative optimisation in the latent space of a generative model of robot poses. Constraints are incorporated through the use of constraint satisfaction classifiers operating on the same space. Optimisation leverages gradients through our learned models that provide a simple way to combine goal r… ▽ More We present a novel approach to path planning for robotic manipulators, in which paths are produced via iterative optimisation in the latent space of a generative model of robot poses. Constraints are incorporated through the use of constraint satisfaction classifiers operating on the same space. Optimisation leverages gradients through our learned models that provide a simple way to combine goal reaching objectives with constraint satisfaction, even in the presence of otherwise non-differentiable constraints. Our models are trained in a task-agnostic manner on randomly sampled robot poses. In baseline comparisons against a number of widely used planners, we achieve commensurate performance in terms of task success, planning time and path length, performing successful path planning with obstacle avoidance on a real 7-DoF robot arm. △ Less

Submitted 21 October, 2022; originally announced October 2022.

Comments: 10 pages, 6 figures, 4 tables

ACM Class: I.2.6; I.2.9; I.2.10

Journal ref: IEEE Robotics and Automation Letters 7.2 (2022): 5334-5341

arXiv:2210.07362 [pdf, other]

Can Demographic Factors Improve Text Classification? Revisiting Demographic Adaptation in the Age of Transformers

Authors: Chia-Chien Hung, Anne Lauscher, Dirk Hovy, Simone Paolo Ponzetto, Goran Glavaš

Abstract: Demographic factors (e.g., gender or age) shape our language. Previous work showed that incorporating demographic factors can consistently improve performance for various NLP tasks with traditional NLP models. In this work, we investigate whether these previous findings still hold with state-of-the-art pretrained Transformer-based language models (PLMs). We use three common specialization methods… ▽ More Demographic factors (e.g., gender or age) shape our language. Previous work showed that incorporating demographic factors can consistently improve performance for various NLP tasks with traditional NLP models. In this work, we investigate whether these previous findings still hold with state-of-the-art pretrained Transformer-based language models (PLMs). We use three common specialization methods proven effective for incorporating external knowledge into pretrained Transformers (e.g., domain-specific or geographic knowledge). We adapt the language representations for the demographic dimensions of gender and age, using continuous language modeling and dynamic multi-task learning for adaptation, where we couple language modeling objectives with the prediction of demographic classes. Our results, when employing a multilingual PLM, show substantial gains in task performance across four languages (English, German, French, and Danish), which is consistent with the results of previous work. However, controlling for confounding factors - primarily domain and language proficiency of Transformer-based PLMs - shows that downstream performance gains from our demographic adaptation do not actually stem from demographic knowledge. Our results indicate that demographic specialization of PLMs, while holding promise for positive societal impact, still represents an unsolved problem for (modern) NLP. △ Less

Submitted 9 May, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: Findings of EACL 2023. arXiv admin note: text overlap with arXiv:2208.01029

arXiv:2208.01029 [pdf, other]

On the Limitations of Sociodemographic Adaptation with Transformers

Authors: Chia-Chien Hung, Anne Lauscher, Dirk Hovy, Simone Paolo Ponzetto, Goran Glavaš

Abstract: Sociodemographic factors (e.g., gender or age) shape our language. Previous work showed that incorporating specific sociodemographic factors can consistently improve performance for various NLP tasks in traditional NLP models. We investigate whether these previous findings still hold with state-of-the-art pretrained Transformers. We use three common specialization methods proven effective for inco… ▽ More Sociodemographic factors (e.g., gender or age) shape our language. Previous work showed that incorporating specific sociodemographic factors can consistently improve performance for various NLP tasks in traditional NLP models. We investigate whether these previous findings still hold with state-of-the-art pretrained Transformers. We use three common specialization methods proven effective for incorporating external knowledge into pretrained Transformers (e.g., domain-specific or geographic knowledge). We adapt the language representations for the sociodemographic dimensions of gender and age, using continuous language modeling and dynamic multi-task learning for adaptation, where we couple language modeling with the prediction of a sociodemographic class. Our results when employing a multilingual model show substantial performance gains across four languages (English, German, French, and Danish). These findings are in line with the results of previous work and hold promise for successful sociodemographic specialization. However, controlling for confounding factors like domain and language shows that, while sociodemographic adaptation does improve downstream performance, the gains do not always solely stem from sociodemographic knowledge. Our results indicate that sociodemographic specialization, while very important, is still an unresolved problem in NLP. △ Less

Submitted 1 August, 2022; originally announced August 2022.

arXiv:2206.03139 [pdf, other]

Intra-agent speech permits zero-shot task acquisition

Authors: Chen Yan, Federico Carnevale, Petko Georgiev, Adam Santoro, Aurelia Guy, Alistair Muldal, Chia-Chun Hung, Josh Abramson, Timothy Lillicrap, Gregory Wayne

Abstract: Human language learners are exposed to a trickle of informative, context-sensitive language, but a flood of raw sensory data. Through both social language use and internal processes of rehearsal and practice, language learners are able to build high-level, semantic representations that explain their perceptions. Here, we take inspiration from such processes of "inner speech" in humans (Vygotsky, 1… ▽ More Human language learners are exposed to a trickle of informative, context-sensitive language, but a flood of raw sensory data. Through both social language use and internal processes of rehearsal and practice, language learners are able to build high-level, semantic representations that explain their perceptions. Here, we take inspiration from such processes of "inner speech" in humans (Vygotsky, 1934) to better understand the role of intra-agent speech in embodied behavior. First, we formally pose intra-agent speech as a semi-supervised problem and develop two algorithms that enable visually grounded captioning with little labeled language data. We then experimentally compute scaling curves over different amounts of labeled data and compare the data efficiency against a supervised learning baseline. Finally, we incorporate intra-agent speech into an embodied, mobile manipulator agent operating in a 3D virtual world, and show that with as few as 150 additional image captions, intra-agent speech endows the agent with the ability to manipulate and answer questions about a new object without any related task-directed experience (zero-shot). Taken together, our experiments suggest that modelling intra-agent speech is effective in enabling embodied agents to learn new tasks efficiently and without direct interaction experience. △ Less

Submitted 7 June, 2022; originally announced June 2022.

arXiv:2205.14981 [pdf, other]

ZusammenQA: Data Augmentation with Specialized Models for Cross-lingual Open-retrieval Question Answering System

Authors: Chia-Chien Hung, Tommaso Green, Robert Litschko, Tornike Tsereteli, Sotaro Takeshita, Marco Bombieri, Goran Glavaš, Simone Paolo Ponzetto

Abstract: This paper introduces our proposed system for the MIA Shared Task on Cross-lingual Open-retrieval Question Answering (COQA). In this challenging scenario, given an input question the system has to gather evidence documents from a multilingual pool and generate from them an answer in the language of the question. We devised several approaches combining different model variants for three main compon… ▽ More This paper introduces our proposed system for the MIA Shared Task on Cross-lingual Open-retrieval Question Answering (COQA). In this challenging scenario, given an input question the system has to gather evidence documents from a multilingual pool and generate from them an answer in the language of the question. We devised several approaches combining different model variants for three main components: Data Augmentation, Passage Retrieval, and Answer Generation. For passage retrieval, we evaluated the monolingual BM25 ranker against the ensemble of re-rankers based on multilingual pretrained language models (PLMs) and also variants of the shared task baseline, re-training it from scratch using a recently introduced contrastive loss that maintains a strong gradient signal throughout training by means of mixed negative samples. For answer generation, we focused on language- and domain-specialization by means of continued language model (LM) pretraining of existing multilingual encoders. Additionally, for both passage retrieval and answer generation, we augmented the training data provided by the task organizers with automatically generated question-answer pairs created from Wikipedia passages to mitigate the issue of data scarcity, particularly for the low-resource languages for which no training data were provided. Our results show that language- and domain-specialization as well as data augmentation help, especially for low-resource languages. △ Less

Submitted 30 May, 2022; originally announced May 2022.

arXiv:2205.10400 [pdf, other]

Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog

Authors: Chia-Chien Hung, Anne Lauscher, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš

Abstract: Research on (multi-domain) task-oriented dialog (TOD) has predominantly focused on the English language, primarily due to the shortage of robust TOD datasets in other languages, preventing the systematic investigation of cross-lingual transfer for this crucial NLP application area. In this work, we introduce Multi2WOZ, a new multilingual multi-domain TOD dataset, derived from the well-established… ▽ More Research on (multi-domain) task-oriented dialog (TOD) has predominantly focused on the English language, primarily due to the shortage of robust TOD datasets in other languages, preventing the systematic investigation of cross-lingual transfer for this crucial NLP application area. In this work, we introduce Multi2WOZ, a new multilingual multi-domain TOD dataset, derived from the well-established English dataset MultiWOZ, that spans four typologically diverse languages: Chinese, German, Arabic, and Russian. In contrast to concurrent efforts, Multi2WOZ contains gold-standard dialogs in target languages that are directly comparable with development and test portions of the English dataset, enabling reliable and comparative estimates of cross-lingual transfer performance for TOD. We then introduce a new framework for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks. Using such conversational PrLMs specialized for concrete target languages, we systematically benchmark a number of zero-shot and few-shot cross-lingual transfer approaches on two standard TOD tasks: Dialog State Tracking and Response Retrieval. Our experiments show that, in most setups, the best performance entails the combination of (I) conversational specialization in the target language and (ii) few-shot transfer for the concrete TOD task. Most importantly, we show that our conversational specialization in the target language allows for an exceptionally sample-efficient few-shot transfer for downstream TOD tasks. △ Less

Submitted 20 May, 2022; originally announced May 2022.

Comments: NAACL 2022

arXiv:2205.07446 [pdf, other]

Miutsu: NTU's TaskBot for the Alexa Prize

Authors: Yen-Ting Lin, Hui-Chi Kuo, Ze-Song Xu, Ssu Chiu, Chieh-Chi Hung, Yi-Cheng Chen, Chao-Wei Huang, Yun-Nung Chen

Abstract: This paper introduces Miutsu, National Taiwan University's Alexa Prize TaskBot, which is designed to assist users in completing tasks requiring multiple steps and decisions in two different domains -- home improvement and cooking. We overview our system design and architectural goals, and detail the proposed core elements, including question answering, task retrieval, social chatting, and various… ▽ More This paper introduces Miutsu, National Taiwan University's Alexa Prize TaskBot, which is designed to assist users in completing tasks requiring multiple steps and decisions in two different domains -- home improvement and cooking. We overview our system design and architectural goals, and detail the proposed core elements, including question answering, task retrieval, social chatting, and various conversational modules. A dialogue flow is proposed to provide a robust and engaging conversation when handling complex tasks. We discuss the faced challenges during the competition and potential future work. △ Less

Submitted 16 May, 2022; originally announced May 2022.

arXiv:2201.00008 [pdf, other]

A Lightweight and Accurate Spatial-Temporal Transformer for Traffic Forecasting

Authors: Guanyao Li, Shuhan Zhong, S. -H. Gary Chan, Ruiyuan Li, Chih-Chieh Hung, Wen-Chih Peng

Abstract: We study the forecasting problem for traffic with dynamic, possibly periodical, and joint spatial-temporal dependency between regions. Given the aggregated inflow and outflow traffic of regions in a city from time slots 0 to t-1, we predict the traffic at time t at any region. Prior arts in the area often consider the spatial and temporal dependencies in a decoupled manner or are rather computatio… ▽ More We study the forecasting problem for traffic with dynamic, possibly periodical, and joint spatial-temporal dependency between regions. Given the aggregated inflow and outflow traffic of regions in a city from time slots 0 to t-1, we predict the traffic at time t at any region. Prior arts in the area often consider the spatial and temporal dependencies in a decoupled manner or are rather computationally intensive in training with a large number of hyper-parameters to tune. We propose ST-TIS, a novel, lightweight, and accurate Spatial-Temporal Transformer with information fusion and region sampling for traffic forecasting. ST-TIS extends the canonical Transformer with information fusion and region sampling. The information fusion module captures the complex spatial-temporal dependency between regions. The region sampling module is to improve the efficiency and prediction accuracy, cutting the computation complexity for dependency learning from $O(n^2)$ to $O(n\sqrt{n})$, where n is the number of regions. With far fewer parameters than state-of-the-art models, the offline training of our model is significantly faster in terms of tuning and computation (with a reduction of up to $90\%$ on training time and network parameters). Notwithstanding such training efficiency, extensive experiments show that ST-TIS is substantially more accurate in online prediction than state-of-the-art approaches (with an average improvement of up to $9.5\%$ on RMSE, and $12.4\%$ on MAPE). △ Less

Submitted 3 May, 2022; v1 submitted 30 December, 2021; originally announced January 2022.

arXiv:2110.08395 [pdf, other]

DS-TOD: Efficient Domain Specialization for Task Oriented Dialog

Authors: Chia-Chien Hung, Anne Lauscher, Simone Paolo Ponzetto, Goran Glavaš

Abstract: Recent work has shown that self-supervised dialog-specific pretraining on large conversational datasets yields substantial gains over traditional language modeling (LM) pretraining in downstream task-oriented dialog (TOD). These approaches, however, exploit general dialogic corpora (e.g., Reddit) and thus presumably fail to reliably embed domain-specific knowledge useful for concrete downstream TO… ▽ More Recent work has shown that self-supervised dialog-specific pretraining on large conversational datasets yields substantial gains over traditional language modeling (LM) pretraining in downstream task-oriented dialog (TOD). These approaches, however, exploit general dialogic corpora (e.g., Reddit) and thus presumably fail to reliably embed domain-specific knowledge useful for concrete downstream TOD domains. In this work, we investigate the effects of domain specialization of pretrained language models (PLMs) for TOD. Within our DS-TOD framework, we first automatically extract salient domain-specific terms, and then use them to construct DomainCC and DomainReddit -- resources that we leverage for domain-specific pretraining, based on (i) masked language modeling (MLM) and (ii) response selection (RS) objectives, respectively. We further propose a resource-efficient and modular domain specialization by means of domain adapters -- additional parameter-light layers in which we encode the domain knowledge. Our experiments with prominent TOD tasks -- dialog state tracking (DST) and response retrieval (RR) -- encompassing five domains from the MultiWOZ benchmark demonstrate the effectiveness of DS-TOD. Moreover, we show that the light-weight adapter-based specialization (1) performs comparably to full fine-tuning in single domain setups and (2) is particularly suitable for multi-domain specialization, where besides advantageous computational footprint, it can offer better TOD performance. △ Less

Submitted 20 May, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: Findings of ACL 2022

arXiv:2107.05455 [pdf, ps, other]

A Local Diagnosis Algorithm for Hypercube-like Networks under the BGM Diagnosis Model

Authors: Cheng-Kuan Lin, Tzu-Liang Kung, Chun-Nan Hung, Yuan-Hsiang Teng

Abstract: System diagnosis is process of identifying faulty nodes in a system. An efficient diagnosis is crucial for a multiprocessor system. The BGM diagnosis model is a modification of the PMC diagnosis model, which is a test-based diagnosis. In this paper, we present a specific structure and propose an algorithm for diagnosing a node in a system under the BGM model. We also give a polynomial-time algorit… ▽ More System diagnosis is process of identifying faulty nodes in a system. An efficient diagnosis is crucial for a multiprocessor system. The BGM diagnosis model is a modification of the PMC diagnosis model, which is a test-based diagnosis. In this paper, we present a specific structure and propose an algorithm for diagnosing a node in a system under the BGM model. We also give a polynomial-time algorithm that a node in a hypercube-like network can be diagnosed correctly in three test rounds under the BGM diagnosis model. △ Less

Submitted 8 June, 2022; v1 submitted 30 June, 2021; originally announced July 2021.

Journal ref: Fundamenta Informaticae, Volume 185, Issue 4 (July 7, 2022) fi:7674

arXiv:2104.06274 [pdf, other]

Optimal Data Placement for Data-Sharing Scientific Workflows in Heterogeneous Edge-Cloud Computing Environments

Authors: Xin Du, Songtao Tang, Zhihui Lu, Keke Gai, Jie Wu, Patrick C. K. Hung

Abstract: The heterogeneous edge-cloud computing paradigm can provide a more optimal direction to deploy scientific workflows than traditional distributed computing or cloud computing environments. Due to the different sizes of scientific datasets and some of these datasets must keep private, it is still a difficult problem to finding an data placement strategy that can minimize data transmission as well as… ▽ More The heterogeneous edge-cloud computing paradigm can provide a more optimal direction to deploy scientific workflows than traditional distributed computing or cloud computing environments. Due to the different sizes of scientific datasets and some of these datasets must keep private, it is still a difficult problem to finding an data placement strategy that can minimize data transmission as well as placement cost. To address this issue, this paper combines advantages of both edge and cloud computing to construct a data placement model, which can balance data transfer time and data placement cost using intelligent computation. The most difficult research challenge the model solved is to consider many constrain in this hybrid computing environments, which including shared datasets within individual and among multiple workflows across various geographical regions. According to the constructed model, the study propose a new data placement strategy named DE-DPSO-DPS, which using a discrete particle swarm optimization algorithm with differential evolution (DE-DPSO-DPA) to distribute these scientific datasets. The strategy also not only consider the characteristics such as the number and storage capacity of edge micro-datacenters, the bandwidth between different datacenters and the proportion of private datasets, but also analysis the performance of algorithm during the workflows execution. Comprehensive experiments are designed in simulated heterogeneous edge-cloud computing environments demonstrate that the data placement strategy can effectively reduce the data transmission time and placement cost as compared to traditional strategies for data-sharing scientific workflows. △ Less

Submitted 13 April, 2021; originally announced April 2021.

arXiv:2103.11881 [pdf, other]

Introspective Visuomotor Control: Exploiting Uncertainty in Deep Visuomotor Control for Failure Recovery

Authors: Chia-Man Hung, Li Sun, Yizhe Wu, Ioannis Havoutis, Ingmar Posner

Abstract: End-to-end visuomotor control is emerging as a compelling solution for robot manipulation tasks. However, imitation learning-based visuomotor control approaches tend to suffer from a common limitation, lacking the ability to recover from an out-of-distribution state caused by compounding errors. In this paper, instead of using tactile feedback or explicitly detecting the failure through vision, we… ▽ More End-to-end visuomotor control is emerging as a compelling solution for robot manipulation tasks. However, imitation learning-based visuomotor control approaches tend to suffer from a common limitation, lacking the ability to recover from an out-of-distribution state caused by compounding errors. In this paper, instead of using tactile feedback or explicitly detecting the failure through vision, we investigate using the uncertainty of a policy neural network. We propose a novel uncertainty-based approach to detect and recover from failure cases. Our hypothesis is that policy uncertainties can implicitly indicate the potential failures in the visuomotor control task and that robot states with minimum uncertainty are more likely to lead to task success. To recover from high uncertainty cases, the robot monitors its uncertainty along a trajectory and explores possible actions in the state-action space to bring itself to a more certain state. Our experiments verify this hypothesis and show a significant improvement on task success rate: 12% in pushing, 15% in pick-and-reach and 22% in pick-and-place. △ Less

Submitted 22 March, 2021; originally announced March 2021.

Comments: 7 pages, 5 figures, 1 table

ACM Class: I.2.9; I.2.10

arXiv:2102.02446 [pdf, other]

The Analysis from Nonlinear Distance Metric to Kernel-based Drug Prescription Prediction System

Authors: Der-Chen Chang, Ophir Frieder, Chi-Feng Hung, Hao-Ren Yao

Abstract: Distance metrics and their nonlinear variant play a crucial role in machine learning based real-world problem solving. We demonstrated how Euclidean and cosine distance measures differ not only theoretically but also in real-world medical application, namely, outcome prediction of drug prescription. Euclidean distance exhibits favorable properties in the local geometry problem. To this regard, Euc… ▽ More Distance metrics and their nonlinear variant play a crucial role in machine learning based real-world problem solving. We demonstrated how Euclidean and cosine distance measures differ not only theoretically but also in real-world medical application, namely, outcome prediction of drug prescription. Euclidean distance exhibits favorable properties in the local geometry problem. To this regard, Euclidean distance can be applied under short-term disease with low-variation outcome observation. Moreover, when presenting to highly variant chronic disease, it is preferable to use cosine distance. These different geometric properties lead to different submanifolds in the original embedded space, and hence, to different optimizing nonlinear kernel embedding frameworks. We first established the geometric properties that we needed in these frameworks. From these properties interpreted their differences in certain perspectives. Our evaluation on real-world, large-scale electronic health records and embedding space visualization empirically validated our approach. △ Less

Submitted 23 February, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

Comments: Accepted to Journal of Nonlinear and Variational Analysis, JNVA 2021

arXiv:2011.05755 [pdf, other]

Cryo-RALib -- a modular library for accelerating alignment in cryo-EM

Authors: Szu-Chi Chung, Cheng-Yu Hung, Huei-Lun Siao, Hung-Yi Wu, Wei-Hau Chang, I-** Tu

Abstract: Thanks to automated cryo-EM and GPU-accelerated processing, single-particle cryo-EM has become a rapid structure determination method that permits capture of dynamical structures of molecules in solution, which has been recently demonstrated by the determination of COVID-19 spike protein in March, shortly after its breakout in late January 2020. This rapidity is critical for vaccine development in… ▽ More Thanks to automated cryo-EM and GPU-accelerated processing, single-particle cryo-EM has become a rapid structure determination method that permits capture of dynamical structures of molecules in solution, which has been recently demonstrated by the determination of COVID-19 spike protein in March, shortly after its breakout in late January 2020. This rapidity is critical for vaccine development in response to emerging pandemic. This explains why a 2D classification approach based on multi-reference alignment (MRA) is not as popular as the Bayesian-based approach despite that the former has advantage in differentiating structural variations under low signal-to-noise ratio. This is perhaps because that MRA is a time-consuming process and a modular GPU-acceleration library for MRA is lacking. Here, we introduce a library called Cryo-RALib that expands the functionality of CUDA library used by GPU ISAC. It contains a GPU-accelerated MRA routine for accelerating MRA-based classification algorithms. In addition, we connect the cryo-EM image analysis with the python data science stack so as to make it easier for users to perform data analysis and visualization. Benchmarking on the TaiWan Computing Cloud (TWCC) container shows that our implementation can accelerate the computation by one order of magnitude. The library is available at https://github.com/phonchi/Cryo-RAlib. △ Less

Submitted 25 February, 2021; v1 submitted 11 November, 2020; originally announced November 2020.

arXiv:2008.01868 [pdf, other]

doi 10.1145/3388440.3412459

Cross-Global Attention Graph Kernel Network Prediction of Drug Prescription

Authors: Hao-Ren Yao, Der-Chen Chang, Ophir Frieder, Wendy Huang, I-Chia Liang, Chi-Feng Hung

Abstract: We present an end-to-end, interpretable, deep-learning architecture to learn a graph kernel that predicts the outcome of chronic disease drug prescription. This is achieved through a deep metric learning collaborative with a Support Vector Machine objective using a graphical representation of Electronic Health Records. We formulate the predictive model as a binary graph classification problem with… ▽ More We present an end-to-end, interpretable, deep-learning architecture to learn a graph kernel that predicts the outcome of chronic disease drug prescription. This is achieved through a deep metric learning collaborative with a Support Vector Machine objective using a graphical representation of Electronic Health Records. We formulate the predictive model as a binary graph classification problem with an adaptive learned graph kernel through novel cross-global attention node matching between patient graphs, simultaneously computing on multiple graphs without training pair or triplet generation. Results using the Taiwanese National Health Insurance Research Database demonstrate that our approach outperforms current start-of-the-art models both in terms of accuracy and interpretability. △ Less

Submitted 4 August, 2020; originally announced August 2020.

Comments: ACM-BCB 2020 (Full paper)

Journal ref: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (BCB '20), September 21-24, 2020, Virtual Event, USA

arXiv:2005.02220 [pdf]

Learning of Art Style Using AI and Its Evaluation Based on Psychological Experiments

Authors: Mai Cong Hung, Ryohei Nakatsu, Naoko Tosa, Takashi Kusumi, Koji Koyamada

Abstract: GANs (Generative adversarial networks) is a new AI technology that can perform deep learning with less training data and has the capability of achieving transformation between two image sets. Using GAN we have carried out a comparison between several art sets with different art style. We have prepared several image sets; a flower photo set (A), an art image set (B1) of Impressionism drawings, an a… ▽ More GANs (Generative adversarial networks) is a new AI technology that can perform deep learning with less training data and has the capability of achieving transformation between two image sets. Using GAN we have carried out a comparison between several art sets with different art style. We have prepared several image sets; a flower photo set (A), an art image set (B1) of Impressionism drawings, an art image set of abstract paintings (B2), an art image set of Chinese figurative paintings, (B3), and an art image set of abstract images (B4) created by Naoko Tosa, one of the authors. Transformation between set A to each of B was carried out using GAN and four image sets (B1, B2, B3, B4) was obtained. Using these four image sets we have carried out psychological experiment by asking subjects consisting of 23 students to fill in questionnaires. By analyzing the obtained questionnaires, we have found the followings. Abstract drawings and figurative drawings are clearly judged to be different. Figurative drawings in West and East were judged to be similar. Abstract images by Naoko Tosa were judged as similar to Western abstract images. These results show that AI could be used as an analysis tool to reveal differences between art genres. △ Less

Submitted 4 May, 2020; originally announced May 2020.

arXiv:2005.01492 [pdf, other]

TRIPDECODER: Study Travel Time Attributes and Route Preferences of Metro Systems from Smart Card Data

Authors: Xiancai Tian, Baihua Zheng, Yazhe Wang, Hsiao-Ting Huang, Chih-Chieh Hung

Abstract: In this paper, we target at recovering the exact routes taken by commuters inside a metro system that arenot captured by an Automated Fare Collection (AFC) system and hence remain unknown. We strategicallypropose two inference tasks to handle the recovering, one to infer the travel time of each travel link thatcontributes to the total duration of any trip inside a metro network and the other to in… ▽ More In this paper, we target at recovering the exact routes taken by commuters inside a metro system that arenot captured by an Automated Fare Collection (AFC) system and hence remain unknown. We strategicallypropose two inference tasks to handle the recovering, one to infer the travel time of each travel link thatcontributes to the total duration of any trip inside a metro network and the other to infer the route preferencesbased on historical trip records and the travel time of each travel link inferred in the previous inferencetask. As these two inference tasks have interrelationship, most of existing works perform these two taskssimultaneously. However, our solutionTripDecoderadopts a totally different approach. To the best of ourknowledge,TripDecoderis the first model that points out and fully utilizes the fact that there are some tripsinside a metro system with only one practical route available. It strategically decouples these two inferencetasks by only taking those trip records with only one practical route as the input for the first inference taskof travel time and feeding the inferred travel time to the second inference task as an additional input whichnot only improves the accuracy but also effectively reduces the complexity of both inference tasks. Twocase studies have been performed based on the city-scale real trip records captured by the AFC systems inSingapore and Taipei to compare the accuracy and efficiency ofTripDecoderand its competitors. As expected,TripDecoderhas achieved the best accuracy in both datasets, and it also demonstrates its superior efficiencyand scalability. △ Less

Submitted 1 May, 2020; originally announced May 2020.

ACM Class: I.5.1

arXiv:2003.12175 [pdf, other]

Incremental Learning Algorithm for Sound Event Detection

Authors: Eunjeong Koh, Fatemeh Saki, Yinyi Guo, Cheng-Yu Hung, Erik Visser

Abstract: This paper presents a new learning strategy for the Sound Event Detection (SED) system to tackle the issues of i) knowledge migration from a pre-trained model to a new target model and ii) learning new sound events without forgetting the previously learned ones without re-training from scratch. In order to migrate the previously learned knowledge from the source model to the target one, a neural a… ▽ More This paper presents a new learning strategy for the Sound Event Detection (SED) system to tackle the issues of i) knowledge migration from a pre-trained model to a new target model and ii) learning new sound events without forgetting the previously learned ones without re-training from scratch. In order to migrate the previously learned knowledge from the source model to the target one, a neural adapter is employed on the top of the source model. The source model and the target model are merged via this neural adapter layer. The neural adapter layer facilitates the target model to learn new sound events with minimal training data and maintaining the performance of the previously learned sound events similar to the source model. Our extensive analysis on the DCASE16 and US-SED dataset reveals the effectiveness of the proposed method in transferring knowledge between source and target models without introducing any performance degradation on the previously learned sound events while obtaining a competitive detection performance on the newly learned sound events. △ Less

Submitted 26 March, 2020; originally announced March 2020.

Comments: IEEE ICME 2020 Camera Ready Version

Journal ref: IEEE ICME 2020

arXiv:2003.08854 [pdf, other]

Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives

Authors: Oliver Groth, Chia-Man Hung, Andrea Vedaldi, Ingmar Posner

Abstract: Visuomotor control (VMC) is an effective means of achieving basic manipulation tasks such as pushing or pick-and-place from raw images. Conditioning VMC on desired goal states is a promising way of achieving versatile skill primitives. However, common conditioning schemes either rely on task-specific fine tuning - e.g. using one-shot imitation learning (IL) - or on sampling approaches using a forw… ▽ More Visuomotor control (VMC) is an effective means of achieving basic manipulation tasks such as pushing or pick-and-place from raw images. Conditioning VMC on desired goal states is a promising way of achieving versatile skill primitives. However, common conditioning schemes either rely on task-specific fine tuning - e.g. using one-shot imitation learning (IL) - or on sampling approaches using a forward model of scene dynamics i.e. model-predictive control (MPC), leaving deployability and planning horizon severely limited. In this paper we propose a conditioning scheme which avoids these pitfalls by learning the controller and its conditioning in an end-to-end manner. Our model predicts complex action sequences based directly on a dynamic image representation of the robot motion and the distance to a given target observation. In contrast to related works, this enables our approach to efficiently perform complex manipulation tasks from raw image observations without predefined control primitives or test time demonstrations. We report significant improvements in task success over representative MPC and IL baselines. We also demonstrate our model's generalisation capabilities in challenging, unseen tasks featuring visual noise, cluttered scenes and unseen object geometries. △ Less

Submitted 24 September, 2021; v1 submitted 19 March, 2020; originally announced March 2020.

Comments: revised manuscript with additional baselines and generalisation experiments; 11 pages, 8 figures, 7 tables

ACM Class: I.2.9; I.2.10

arXiv:1910.14540 [pdf, other]

Team NCTU: Toward AI-Driving for Autonomous Surface Vehicles -- From Duckietown to RobotX

Authors: Yi-Wei Huang, Tzu-Kuan Chuang, Ni-Ching Lin, Yu-Chieh Hsiao, Pin-Wei Chen, Ching-Tang Hung, Shih-Hsing Liu, Hsiao-Sheng Chen, Ya-Hsiu Hsieh, Ching-Tang Hung, Yen-Hsiang Huang, Yu-Xuan Chen, Kuan-Lin Chen, Ya-Jou Lan, Chao-Chun Hsu, Chun-Yi Lin, Jhih-Ying Li, Jui-Te Huang, Yu-Jen Menn, Sin-Kiat Lim, Kim-Boon Lua, Chia-Hung Dylan Tsai, Chi-Fang Chen, Hsueh-Cheng Wang

Abstract: Robotic software and hardware systems of autonomous surface vehicles have been developed in transportation, military, and ocean researches for decades. Previous efforts in RobotX Challenges 2014 and 2016 facilitates the developments for important tasks such as obstacle avoidance and docking. Team NCTU is motivated by the AI Driving Olympics (AI-DO) developed by the Duckietown community, and adopts… ▽ More Robotic software and hardware systems of autonomous surface vehicles have been developed in transportation, military, and ocean researches for decades. Previous efforts in RobotX Challenges 2014 and 2016 facilitates the developments for important tasks such as obstacle avoidance and docking. Team NCTU is motivated by the AI Driving Olympics (AI-DO) developed by the Duckietown community, and adopts the principles to RobotX challenge. With the containerization (Docker) and uniformed AI agent (with observations and actions), we could better 1) integrate solutions developed in different middlewares (ROS and MOOS), 2) develop essential functionalities of from simulation (Gazebo) to real robots (either miniaturized or full-sized WAM-V), and 3) compare different approaches either from classic model-based or learning-based. Finally, we setup an outdoor on-surface platform with localization services for evaluation. Some of the preliminary results will be presented for the Team NCTU participations of the RobotX competition in Hawaii in 2018. △ Less

Submitted 31 October, 2019; originally announced October 2019.

arXiv:1910.06562 [pdf, other]

Compacting, Picking and Growing for Unforgetting Continual Learning

Authors: Steven C. Y. Hung, Cheng-Hao Tu, Cheng-En Wu, Chien-Hung Chen, Yi-Ming Chan, Chu-Song Chen

Abstract: Continual lifelong learning is essential to many applications. In this paper, we propose a simple but effective approach to continual deep learning. Our approach leverages the principles of deep model compression, critical weights selection, and progressive networks expansion. By enforcing their integration in an iterative manner, we introduce an incremental learning method that is scalable to the… ▽ More Continual lifelong learning is essential to many applications. In this paper, we propose a simple but effective approach to continual deep learning. Our approach leverages the principles of deep model compression, critical weights selection, and progressive networks expansion. By enforcing their integration in an iterative manner, we introduce an incremental learning method that is scalable to the number of sequential tasks in a continual learning process. Our approach is easy to implement and owns several favorable characteristics. First, it can avoid forgetting (i.e., learn new tasks while remembering all previous tasks). Second, it allows model expansion but can maintain the model compactness when handling sequential tasks. Besides, through our compaction and selection/expansion mechanism, we show that the knowledge accumulated through learning previous tasks is helpful to build a better model for the new tasks compared to training the models independently with tasks. Experimental results show that our approach can incrementally learn a deep model tackling multiple tasks without forgetting, while the model compactness is maintained with the performance more satisfiable than individual task training. △ Less

Submitted 30 October, 2019; v1 submitted 15 October, 2019; originally announced October 2019.

Comments: To appear in NeurIPS 2019

arXiv:1905.08413 [pdf]

Dual-branch residual network for lung nodule segmentation

Authors: Haichao Cao, Hong Liu, Enmin Song, Chih-Cheng Hung, Guangzhi Ma, Xiangyang Xu, Renchao **, Jianguo Lu

Abstract: An accurate segmentation of lung nodules in computed tomography (CT) images is critical to lung cancer analysis and diagnosis. However, due to the variety of lung nodules and the similarity of visual characteristics between nodules and their surroundings, a robust segmentation of nodules becomes a challenging problem. In this study, we propose the Dual-branch Residual Network (DB-ResNet) which is… ▽ More An accurate segmentation of lung nodules in computed tomography (CT) images is critical to lung cancer analysis and diagnosis. However, due to the variety of lung nodules and the similarity of visual characteristics between nodules and their surroundings, a robust segmentation of nodules becomes a challenging problem. In this study, we propose the Dual-branch Residual Network (DB-ResNet) which is a data-driven model. Our approach integrates two new schemes to improve the generalization capability of the model: 1) the proposed model can simultaneously capture multi-view and multi-scale features of different nodules in CT images; 2) we combine the features of the intensity and the convolution neural networks (CNN). We propose a pooling method, called the central intensity-pooling layer (CIP), to extract the intensity features of the center voxel of the block, and then use the CNN to obtain the convolutional features of the center voxel of the block. In addition, we designed a weighted sampling strategy based on the boundary of nodules for the selection of those voxels using the weighting score, to increase the accuracy of the model. The proposed method has been extensively evaluated on the LIDC dataset containing 986 nodules. Experimental results show that the DB-ResNet achieves superior segmentation performance with an average dice score of 82.74% on the dataset. Moreover, we compared our results with those of four radiologists on the same dataset. The comparison showed that our average dice score was 0.49% higher than that of human experts. This proves that our proposed method is as good as the experienced radiologist. △ Less

Submitted 20 May, 2019; originally announced May 2019.

Comments: 24 pages, 6 figures

arXiv:1905.03445 [pdf]

Two-Stage Convolutional Neural Network Architecture for Lung Nodule Detection

Authors: Haichao Cao, Hong Liu, Enmin Song, Guangzhi Ma, Xiangyang Xu, Renchao **, Tengying Liu, Chih-Cheng Hung

Abstract: Early detection of lung cancer is an effective way to improve the survival rate of patients. It is a critical step to have accurate detection of lung nodules in computed tomography (CT) images for the diagnosis of lung cancer. However, due to the heterogeneity of the lung nodules and the complexity of the surrounding environment, robust nodule detection has been a challenging task. In this study,… ▽ More Early detection of lung cancer is an effective way to improve the survival rate of patients. It is a critical step to have accurate detection of lung nodules in computed tomography (CT) images for the diagnosis of lung cancer. However, due to the heterogeneity of the lung nodules and the complexity of the surrounding environment, robust nodule detection has been a challenging task. In this study, we propose a two-stage convolutional neural network (TSCNN) architecture for lung nodule detection. The CNN architecture in the first stage is based on the improved UNet segmentation network to establish an initial detection of lung nodules. Simultaneously, in order to obtain a high recall rate without introducing excessive false positive nodules, we propose a novel sampling strategy, and use the offline hard mining idea for training and prediction according to the proposed cascaded prediction method. The CNN architecture in the second stage is based on the proposed dual pooling structure, which is built into three 3D CNN classification networks for false positive reduction. Since the network training requires a significant amount of training data, we adopt a data augmentation method based on random mask. Furthermore, we have improved the generalization ability of the false positive reduction model by means of ensemble learning. The proposed method has been experimentally verified on the LUNA dataset. Experimental results show that the proposed TSCNN architecture can obtain competitive detection performance. △ Less

Submitted 9 May, 2019; originally announced May 2019.

Comments: 29 pages, 10 figures

arXiv:1903.07164 [pdf, ps, other]

Linearly Constrained Smoothing Group Sparsity Solvers in Off-grid Model

Authors: Cheng-Yu Hung, Mostafa Kaveh

Abstract: In compressed sensing, the sensing matrix is assumed perfectly known. However, there exists perturbation in the sensing matrix in reality due to sensor offsets or noise disturbance. Directions-of-arrival (DoA) estimation with off-grid effect satisfies this situation, and can be formulated into a (non)convex optimization problem with linear inequalities constraints, which can be solved by the inter… ▽ More In compressed sensing, the sensing matrix is assumed perfectly known. However, there exists perturbation in the sensing matrix in reality due to sensor offsets or noise disturbance. Directions-of-arrival (DoA) estimation with off-grid effect satisfies this situation, and can be formulated into a (non)convex optimization problem with linear inequalities constraints, which can be solved by the interior point method (using the CVX tools), but at a large computational cost. In this work, in order to design efficient algorithms, we consider various alternative formulations, such as unconstrained formulation, primal-dual formulation, or conic formulation to develop group-sparsity promoted solvers. First, the consensus alternating direction method of multipliers (C-ADMM) is applied. Then, iterative algorithms for the BPDN formulation is proposed by combining the Nesterov smoothing technique with accelerated proximal gradient method, and the convergence analysis of the method is conducted as well. We also developed a variant of EGT (Excessive Gap Technique)-based primal-dual method to systematically reduce the smoothing parameter sequentially. Finally, we propose algorithms for quadratically constrained L2-L1 mixed norm minimization problem by using the smoothed dual conic optimization (SDCO) and continuation technique. The performance of accuracy and convergence for all the proposed methods are demonstrated in the numerical simulations. △ Less

Submitted 3 June, 2019; v1 submitted 17 March, 2019; originally announced March 2019.

arXiv:1903.07158 [pdf, ps, other]

Joint Block Low Rank and Sparse Matrix Recovery in Array Self-Calibration Off-Grid DoA Estimation

Authors: Cheng-Yu Hung, Mostafa Kaveh

Abstract: This letter addresses the estimation of directions-of-arrival (DoA) by a sensor array using a sparse model in the presence of array calibration errors and off-grid directions. The received signal utilizes previously used models for unknown errors in calibration and structured linear representation of the off-grid effect. A convex optimization problem is formulated with an objective function to pro… ▽ More This letter addresses the estimation of directions-of-arrival (DoA) by a sensor array using a sparse model in the presence of array calibration errors and off-grid directions. The received signal utilizes previously used models for unknown errors in calibration and structured linear representation of the off-grid effect. A convex optimization problem is formulated with an objective function to promote two-layer joint block-sparsity with its second-order cone programming (SOCP) representation. The performance of the proposed method is demonstrated by numerical simulations and compared with the Cramer-Rao Bound (CRB), and several previously proposed methods. △ Less

Submitted 3 June, 2019; v1 submitted 17 March, 2019; originally announced March 2019.

arXiv:1902.04043 [pdf, other]

The StarCraft Multi-Agent Challenge

Authors: Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, Shimon Whiteson

Abstract: In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behaviour while conditioning only on their private observations. This is an attractive research area since such p… ▽ More In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behaviour while conditioning only on their private observations. This is an attractive research area since such problems are relevant to a large number of real-world systems and are also more amenable to evaluation than general-sum problems. Standardised environments such as the ALE and MuJoCo have allowed single-agent RL to move beyond toy domains, such as grid worlds. However, there is no comparable benchmark for cooperative multi-agent RL. As a result, most papers in this field use one-off toy problems, making it difficult to measure real progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap. SMAC is based on the popular real-time strategy game StarCraft II and focuses on micromanagement challenges where each unit is controlled by an independent agent that must act based on local observations. We offer a diverse set of challenge maps and recommendations for best practices in benchmarking and evaluations. We also open-source a deep multi-agent RL learning framework including state-of-the-art algorithms. We believe that SMAC can provide a standard benchmark environment for years to come. Videos of our best agents for several SMAC scenarios are available at: https://youtu.be/VZ7zmQ_obZ0. △ Less

Submitted 9 December, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

arXiv:1810.06721 [pdf, other]

Optimizing Agent Behavior over Long Time Scales by Transporting Value

Authors: Chia-Chun Hung, Timothy Lillicrap, Josh Abramson, Yan Wu, Mehdi Mirza, Federico Carnevale, Arun Ahuja, Greg Wayne

Abstract: Humans spend a remarkable fraction of waking life engaged in acts of "mental time travel". We dwell on our actions in the past and experience satisfaction or regret. More than merely autobiographical storytelling, we use these event recollections to change how we will act in similar scenarios in the future. This process endows us with a computationally important ability to link actions and consequ… ▽ More Humans spend a remarkable fraction of waking life engaged in acts of "mental time travel". We dwell on our actions in the past and experience satisfaction or regret. More than merely autobiographical storytelling, we use these event recollections to change how we will act in similar scenarios in the future. This process endows us with a computationally important ability to link actions and consequences across long spans of time, which figures prominently in addressing the problem of long-term temporal credit assignment; in artificial intelligence (AI) this is the question of how to evaluate the utility of the actions within a long-duration behavioral sequence leading to success or failure in a task. Existing approaches to shorter-term credit assignment in AI cannot solve tasks with long delays between actions and consequences. Here, we introduce a new paradigm for reinforcement learning where agents use recall of specific memories to credit actions from the past, allowing them to solve problems that are intractable for existing algorithms. This paradigm broadens the scope of problems that can be investigated in AI and offers a mechanistic account of behaviors that may inspire computational models in neuroscience, psychology, and behavioral economics. △ Less

Submitted 21 December, 2018; v1 submitted 15 October, 2018; originally announced October 2018.

arXiv:1804.01128 [pdf, other]

Probing Physics Knowledge Using Tools from Developmental Psychology

Authors: Luis Piloto, Ari Weinstein, Dhruva TB, Arun Ahuja, Mehdi Mirza, Greg Wayne, David Amos, Chia-chun Hung, Matt Botvinick

Abstract: In order to build agents with a rich understanding of their environment, one key objective is to endow them with a grasp of intuitive physics; an ability to reason about three-dimensional objects, their dynamic interactions, and responses to forces. While some work on this problem has taken the approach of building in components such as ready-made physics engines, other research aims to extract ge… ▽ More In order to build agents with a rich understanding of their environment, one key objective is to endow them with a grasp of intuitive physics; an ability to reason about three-dimensional objects, their dynamic interactions, and responses to forces. While some work on this problem has taken the approach of building in components such as ready-made physics engines, other research aims to extract general physical concepts directly from sensory data. In the latter case, one challenge that arises is evaluating the learning system. Research on intuitive physics knowledge in children has long employed a violation of expectations (VOE) method to assess children's mastery of specific physical concepts. We take the novel step of applying this method to artificial learning systems. In addition to introducing the VOE technique, we describe a set of probe datasets inspired by classic test stimuli from developmental psychology. We test a baseline deep learning system on this battery, as well as on a physics learning dataset ("IntPhys") recently posed by another research group. Our results show how the VOE technique may provide a useful tool for tracking physics knowledge in future research. △ Less

Submitted 3 April, 2018; originally announced April 2018.

arXiv:1803.10760 [pdf, other]

Unsupervised Predictive Memory in a Goal-Directed Agent

Authors: Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap

Abstract: Animals execute goal-directed behaviours despite the limited range and scope of their sensors. To cope, they explore environments and store memories maintaining estimates of important information that is not presently available. Recently, progress has been made with artificial intelligence (AI) agents that learn to perform tasks from sensory input, even at a human level, by merging reinforcement l… ▽ More Animals execute goal-directed behaviours despite the limited range and scope of their sensors. To cope, they explore environments and store memories maintaining estimates of important information that is not presently available. Recently, progress has been made with artificial intelligence (AI) agents that learn to perform tasks from sensory input, even at a human level, by merging reinforcement learning (RL) algorithms with deep neural networks, and the excitement surrounding these results has led to the pursuit of related ideas as explanations of non-human animal learning. However, we demonstrate that contemporary RL algorithms struggle to solve simple tasks when enough information is concealed from the sensors of the agent, a property called "partial observability". An obvious requirement for handling partially observed tasks is access to extensive memory, but we show memory is not enough; it is critical that the right information be stored in the right format. We develop a model, the Memory, RL, and Inference Network (MERLIN), in which memory formation is guided by a process of predictive modeling. MERLIN facilitates the solution of tasks in 3D virtual reality environments for which partial observability is severe and memories must be maintained over long durations. Our model demonstrates a single learning agent architecture that can solve canonical behavioural tasks in psychology and neurobiology without strong simplifying assumptions about the dimensionality of sensory input or the duration of experiences. △ Less

Submitted 28 March, 2018; originally announced March 2018.

Showing 1–50 of 54 results for author: Hung, C