Search | arXiv e-print repository

EHR-Based Mobile and Web Platform for Chronic Disease Risk Prediction Using Large Language Multimodal Models

Authors: Chun-Chieh Liao, Wei-Ting Kuo, I-Hsuan Hu, Yen-Chen Shih, Jun-En Ding, Feng Liu, Fang-Ming Hung

Abstract: Traditional diagnosis of chronic diseases involves in-person consultations with physicians to identify the disease. However, there is a lack of research focused on predicting and developing application systems using clinical notes and blood test values. We collected five years of Electronic Health Records (EHRs) from Taiwan's hospital database between 2017 and 2021 as an AI database. Furthermore, we developed an EHR-based chronic disease prediction platform utilizing Large Language Multimodal Models (LLMMs), successfully integrating with frontend web and mobile applications for prediction. This prediction platform can also connect to the hospital's backend database, providing physicians with real-time risk assessment diagnostics. The demonstration link can be found at https://www.youtube.com/watch?v=oqmL9DEDFgA.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.05982 [pdf]

Artificial Intelligence for Neuro MRI Acquisition: A Review

Authors: Hongjia Yang, Guanhua Wang, Ziyu Li, Haoxiang Li, Jialan Zheng, Yuxin Hu, Xiaozhi Cao, Congyu Liao, Huihui Ye, Qiyuan Tian

Abstract: Magnetic resonance imaging (MRI) has significantly benefited from the resurgence of artificial intelligence (AI). By leveraging AI's capabilities in large-scale optimization and pattern recognition, innovative methods are transforming the MRI acquisition workflow, including planning, sequence design, and correction of acquisition artifacts. These emerging algorithms demonstrate substantial potential in enhancing the efficiency and throughput of acquisition steps. This review discusses several pivotal AI-based methods in neuro MRI acquisition, focusing on their technological advances, impact on clinical practice, and potential risks.

Submitted 9 June, 2024; originally announced June 2024.

Comments: Submitted to MAGMA for review

arXiv:2406.04151 [pdf, other]

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

Authors: Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Dingwen Yang, Chenyang Liao, Xin Guo, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang

Abstract: Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community. Large language models (LLMs) are considered a promising foundation to build such agents due to their generalized capabilities. Current approaches either have LLM-based agents imitate expert-provided trajectories step-by-step, requiring human supervision, which is hard to scale and limits environmental exploration; or they let agents explore and learn in isolated environments, resulting in specialist agents with limited generalization. In this paper, we take the first step towards building generally-capable LLM-based agents with self-evolution ability. We identify a trinity of ingredients: 1) diverse environments for agent exploration and learning, 2) a trajectory set to equip agents with basic capabilities and prior knowledge, and 3) an effective and scalable evolution method. We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration. AgentGym also includes a database with expanded instructions, a benchmark suite, and high-quality trajectories across environments. Next, we propose a novel method, AgentEvol, to investigate the potential of agent self-evolution beyond previously seen data across tasks and environments. Experimental results show that the evolved agents can achieve results comparable to SOTA models.
We release the AgentGym suite, including the platform, dataset, benchmark, checkpoints, and algorithm implementations. The AgentGym suite is available on https://github.com/WooooDyy/AgentGym.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Project site: https://agentgym.github.io

arXiv:2406.01436 [pdf, other]

Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of Knowledge Editing in Large Language Models

Authors: Cheng-Hsun Hsueh, Paul Kuo-Ming Huang, Tzu-Han Lin, Che-Wei Liao, Hung-Chieh Fang, Chao-Wei Huang, Yun-Nung Chen

Abstract: Knowledge editing is a rising technique for efficiently updating factual knowledge in Large Language Models (LLMs) with minimal alteration of parameters. However, recent studies have identified concerning side effects, such as knowledge distortion and the deterioration of general abilities, that have emerged after editing. This survey presents a comprehensive study of these side effects, providing a unified view of the challenges associated with knowledge editing in LLMs. We discuss related works and summarize potential research directions to overcome these limitations. Our work highlights the limitations of current knowledge editing methods, emphasizing the need for deeper understanding of inner knowledge structures of LLMs and improved knowledge editing methods. To foster future research, we have publicly released complementary materials, such as a paper collection, at https://github.com/MiuLab/EditLLM-Survey

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00247 [pdf, other]

Large Language Models for Relevance Judgment in Product Search

Authors: Navid Mehrdad, Hrushikesh Mohapatra, Mossaab Bagdouri, Prijith Chandran, Alessandro Magnani, Xunfan Cai, Ajit Puthenputhussery, Sachin Yadav, Tony Lee, ChengXiang Zhai, Ciya Liao

Abstract: High relevance of retrieved and re-ranked items to the search query is the cornerstone of successful product search, yet measuring relevance of items to queries is one of the most challenging tasks in product information retrieval, and quality of product search is highly influenced by the precision and scale of available relevance-labelled data. In this paper, we present an array of techniques for leveraging Large Language Models (LLMs) for automating the relevance judgment of query-item pairs (QIPs) at scale. Using a unique dataset of multi-million QIPs, annotated by human evaluators, we test and optimize hyperparameters for finetuning billion-parameter LLMs with and without Low-Rank Adaptation (LoRA), as well as various modes of item attribute concatenation and prompting in LLM finetuning, and consider trade-offs in item attribute inclusion for quality of relevance predictions. We demonstrate considerable improvement over baselines of prior generations of LLMs, as well as off-the-shelf models, towards relevance annotations on par with the human relevance evaluators. Our findings have immediate implications for the growing field of relevance judgment automation in product search.

Submitted 31 May, 2024; originally announced June 2024.

Comments: 10 pages, 1 figure, 11 tables - SIGIR 2024, LLM4Eval

ACM Class: H.3.3; I.2.7

arXiv:2403.06504 [pdf, other]

Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU

Authors: Changyue Liao, Mo Sun, Zihan Yang, Kaiqi Chen, Binhang Yuan, Fei Wu, Zeke Wang

Abstract: Recent advances in large language models have brought immense value to the world, with their superior capabilities stemming from the massive number of parameters they utilize. However, even the GPUs with the highest memory capacities, currently peaking at 80GB, are far from sufficient to accommodate these vast parameters and their associated optimizer states when conducting stochastic gradient descent-based optimization. One approach to hosting such huge models is to aggregate device memory from many GPUs. However, this approach introduces prohibitive costs for most academic researchers, who always have a limited budget for many high-end GPU servers. In this paper, we focus on huge model fine-tuning on a single, even low-end, GPU in a commodity server, which is accessible to most AI researchers. In such a scenario, the state-of-the-art work ZeRO-Infinity suffers from two severe issues when running in a commodity server: 1) low GPU utilization due to inefficient swapping, and 2) limited trainable model size due to CPU memory capacity. The underlying reason is that ZeRO-Infinity is optimized for running on high-end GPU servers. To this end, we present Fuyou, a low-cost training framework that enables efficient 100B huge model fine-tuning on a low-end server with a low-end GPU and limited CPU memory capacity. The key idea is to add the SSD-CPU communication as an optimization dimension and thus carefully co-optimize computation and data swapping from a systematic approach to maximize GPU utilization.
The experimental results show that 1) Fuyou is able to fine-tune 175B GPT-3 on a consumer GPU RTX 4090 with high GPU utilization, while ZeRO-Infinity fails to fine-tune; and 2) when training a small GPT-3 13B model, Fuyou achieves 156 TFLOPS on an RTX 4090 GPU while ZeRO-Infinity only achieves 45 TFLOPS.
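
The abstract's claim that an 80GB GPU is far from sufficient can be checked with back-of-the-envelope arithmetic. The sketch below is not taken from the paper: it assumes the common mixed-precision Adam accounting of 16 bytes per parameter (fp16 weights and gradients, plus fp32 master weights, momentum, and variance), ignores activations, and uses the hypothetical helper names `training_state_gb` and `gpus_needed`.

```python
import math

# Assumed per-parameter byte counts for mixed-precision Adam training.
# These figures are an assumption of this sketch, not stated in the abstract.
BYTES_PER_PARAM = {
    "fp16_weights": 2,
    "fp16_gradients": 2,
    "fp32_master_weights": 4,
    "fp32_adam_momentum": 4,
    "fp32_adam_variance": 4,
}


def training_state_gb(num_params: int) -> float:
    """Model-state memory in GB (10**9 bytes), excluding activations."""
    return num_params * sum(BYTES_PER_PARAM.values()) / 1e9


def gpus_needed(num_params: int, gpu_memory_gb: float = 80.0) -> int:
    """Number of GPUs whose combined memory would hold the model states."""
    return math.ceil(training_state_gb(num_params) / gpu_memory_gb)


if __name__ == "__main__":
    for name, n in [("GPT-3 13B", 13_000_000_000), ("GPT-3 175B", 175_000_000_000)]:
        print(name, training_state_gb(n), "GB,", gpus_needed(n), "x 80GB GPUs")
```

Under these assumptions, the model states alone exceed a single GPU's memory by well over an order of magnitude for the 175B model, which is why offloading to CPU memory and SSDs becomes relevant.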

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2402.04416 [pdf, other]

Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap

Authors: Christopher Liao, Christian So, Theodoros Tsiligkaridis, Brian Kulis

Abstract: Domain generalization (DG) is an important problem that learns a model which generalizes to unseen test domains leveraging one or more source domains, under the assumption of shared label spaces. However, most DG methods assume access to abundant source data in the target label space, a requirement that proves overly stringent for numerous real-world applications, where acquiring the same label space as the target task is prohibitively expensive. For this setting, we tackle the multimodal version of the unsupervised domain generalization (MUDG) problem, which uses a large task-agnostic unlabeled source dataset during finetuning. Our framework does not explicitly assume any relationship between the source dataset and target task. Instead, it relies only on the premise that the source dataset can be accurately and efficiently searched in a joint vision-language space. We make three contributions in the MUDG setting. Firstly, we show theoretically that cross-modal approximate nearest neighbor search suffers from low recall due to the large distance between text queries and the image centroids used for coarse quantization. Accordingly, we propose paired k-means, a simple clustering algorithm that improves nearest neighbor recall by storing centroids in query space instead of image space. Secondly, we propose an adaptive text augmentation scheme for target labels designed to improve zero-shot accuracy and diversify retrieved image data. Lastly, we present two simple but effective components to further improve downstream target accuracy.
We compare against state-of-the-art name-only transfer, source-free DG and zero-shot (ZS) methods on their respective benchmarks and show consistent improvement in accuracy on 20 diverse datasets. Code is available: https://github.com/Chris210634/mudg

Submitted 29 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.02662 [pdf, other]

Image-Caption Encoding for Improving Zero-Shot Generalization

Authors: Eric Yang Yu, Christopher Liao, Sathvik Ravi, Theodoros Tsiligkaridis, Brian Kulis

Abstract: Recent advances in vision-language models have combined contrastive approaches with generative methods to achieve state-of-the-art (SOTA) on downstream inference tasks like zero-shot image classification. However, a persistent issue of these models for image classification is their out-of-distribution (OOD) generalization capabilities. We first show that when an OOD data point is misclassified, the correct class can typically be found in the Top-K predicted classes. In order to steer the model prediction toward the correct class within the top predicted classes, we propose the Image-Caption Encoding (ICE) method, a straightforward approach that directly enforces consistency between the image-conditioned and caption-conditioned predictions at evaluation time only. Intuitively, we take advantage of unique properties of the generated captions to guide our local search for the correct class label within the Top-K predicted classes. We show that our method can be easily combined with other SOTA methods to enhance Top-1 OOD accuracies by 0.5% on average and up to 3% on challenging datasets. Our code: https://github.com/Chris210634/ice
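
The idea of re-ranking within the Top-K classes using a caption-conditioned prediction can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rerank_top_k`, the fixed `caption_weight`, and the simple additive combination are all assumptions made for illustration, and the inputs are assumed to be per-class probabilities already computed from the image and from a generated caption.

```python
from typing import Sequence


def rerank_top_k(
    image_probs: Sequence[float],
    caption_probs: Sequence[float],
    k: int = 5,
    caption_weight: float = 0.1,
) -> int:
    """Pick a class from the image's Top-K using caption scores as a tie-breaker.

    Hypothetical sketch: classes outside the image-conditioned Top-K are never
    considered, so the caption can only reorder the top candidates.
    """
    if len(image_probs) != len(caption_probs):
        raise ValueError("probability vectors must have the same length")

    # Indices of the k classes ranked highest by the image-conditioned prediction.
    top_k = sorted(
        range(len(image_probs)), key=lambda c: image_probs[c], reverse=True
    )[:k]

    # Combine both predictions, restricted to the Top-K candidates.
    combined = {c: image_probs[c] + caption_weight * caption_probs[c] for c in top_k}
    return max(combined, key=combined.get)


if __name__ == "__main__":
    image = [0.40, 0.35, 0.15, 0.10]
    caption = [0.10, 0.80, 0.05, 0.05]
    print(rerank_top_k(image, caption, k=2, caption_weight=0.5))
```

Restricting the search to the Top-K set keeps the caption from promoting a class the image model considered unlikely, which matches the "local search" intuition in the abstract.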

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.01439 [pdf, other]

From Words to Molecules: A Survey of Large Language Models in Chemistry

Authors: Chang Liao, Yemin Yu, Yu Mei, Ying Wei

Abstract: In recent years, Large Language Models (LLMs) have achieved significant success in natural language processing (NLP) and various interdisciplinary areas. However, applying LLMs to chemistry is a complex task that requires specialized domain knowledge. This paper provides a thorough exploration of the nuanced methodologies employed in integrating LLMs into the field of chemistry, delving into the complexities and innovations at this interdisciplinary juncture. Specifically, our analysis begins with examining how molecular information is fed into LLMs through various representation and tokenization methods. We then categorize chemical LLMs into three distinct groups based on the domain and modality of their input data, and discuss approaches for integrating these inputs for LLMs. Furthermore, this paper delves into the pretraining objectives with adaptations to chemical LLMs. After that, we explore the diverse applications of LLMs in chemistry, including novel paradigms for their application in chemistry tasks. Finally, we identify promising research directions, including further integration with chemical knowledge, advancements in continual learning, and improvements in model interpretability, paving the way for groundbreaking developments in the field.

Submitted 2 February, 2024; originally announced February 2024.

Comments: Submitted to IJCAI 2024 survey track

arXiv:2401.02143 [pdf, other]

Graph Neural Networks for Tabular Data Learning: A Survey with Taxonomy and Directions

Authors: Cheng-Te Li, Yu-Che Tsai, Chih-Yao Chen, Jay Chiehen Liao

Abstract: In this survey, we dive into Tabular Data Learning (TDL) using Graph Neural Networks (GNNs), a domain where deep learning-based approaches have increasingly shown superior performance in both classification and regression tasks compared to traditional methods. The survey highlights a critical gap in deep neural TDL methods: the underrepresentation of latent correlations among data instances and feature values. GNNs, with their innate capability to model intricate relationships and interactions between diverse elements of tabular data, have garnered significant interest and application across various TDL domains. Our survey provides a systematic review of the methods involved in designing and implementing GNNs for TDL (GNN4TDL). It encompasses a detailed investigation into the foundational aspects and an overview of GNN-based TDL methods, offering insights into their evolving landscape. We present a comprehensive taxonomy focused on constructing graph structures and representation learning within GNN-based TDL methods. In addition, the survey examines various training plans, emphasizing the integration of auxiliary tasks to enhance the effectiveness of instance representations. A critical part of our discussion is dedicated to the practical application of GNNs across a spectrum of GNN4TDL scenarios, demonstrating their versatility and impact. Lastly, we discuss the limitations and propose future research directions, aiming to spur advancements in GNN4TDL.
This survey serves as a resource for researchers and practitioners, offering a thorough understanding of GNNs' role in revolutionizing TDL and pointing towards future innovations in this promising area.

Submitted 4 January, 2024; originally announced January 2024.

Comments: Under review, ongoing work, Github page: https://github.com/Roytsai27/awesome-GNN4TDL

arXiv:2311.15480 [pdf]

Automatic Time Signature Determination for New Scores Using Lyrics for Latent Rhythmic Structure

Authors: Callie C. Liao, Duoduo Liao, Jesse Guessford

Abstract: There has recently been a sharp increase in interest in Artificial Intelligence-Generated Content (AIGC). Despite this, musical components such as time signatures have not been studied sufficiently to form an algorithmic determination approach for new compositions, especially lyrical songs. This is likely because of the neglect of musical details, which are critical for constructing a robust framework. Specifically, time signatures establish the fundamental rhythmic structure for almost all aspects of a song, including the phrases and notes. In this paper, we propose a novel approach that only uses lyrics as input to automatically generate a fitting time signature for lyrical songs and uncover the latent rhythmic structure utilizing explainable machine learning models. In particular, we devise multiple methods that are associated with discovering lyrical patterns and creating new features that simultaneously contain lyrical, rhythmic, and statistical information. In this approach, the best of our experimental results reveal a 97.6% F1 score and a 0.996 Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) score. In conclusion, our research directly generates time signatures from lyrics automatically for new scores utilizing machine learning, which is an innovative idea that approaches an understudied component of musicology and therefore contributes significantly to the future of Artificial Intelligence (AI) music generation.

Submitted 28 January, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: Accepted by 2023 IEEE International Conference on Big Data (IEEE BigData 2023)

arXiv:2311.13612 [pdf, other]

Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning

Authors: Christopher Liao, Theodoros Tsiligkaridis, Brian Kulis

Abstract: Over the past year, a large body of multimodal research has emerged around zero-shot evaluation using GPT descriptors. These studies boost the zero-shot accuracy of pretrained VL models with an ensemble of label-specific text generated by GPT. A recent study, WaffleCLIP, demonstrated that similar zero-shot accuracy can be achieved with an ensemble of random descriptors. However, both zero-shot methods are un-trainable and consequently sub-optimal when some few-shot out-of-distribution (OOD) training data is available. Inspired by these prior works, we present two more flexible methods called descriptor and word soups, which do not require an LLM at test time and can leverage training data to increase OOD target accuracy. Descriptor soup greedily selects a small set of textual descriptors using generic few-shot training data, then calculates robust class embeddings using the selected descriptors. Word soup greedily assembles a chain of words in a similar manner. Compared to existing few-shot soft prompt tuning methods, word soup requires fewer parameters by construction and less GPU memory, since it does not require backpropagation. Both soups outperform current published few-shot methods, even when combined with SoTA zero-shot methods, on cross-dataset and domain generalization benchmarks. Compared with SoTA prompt and descriptor ensembling methods, such as ProDA and WaffleCLIP, word soup achieves higher OOD accuracy with fewer ensemble members. Please check out our code: github.com/Chris210634/word_soups

Submitted 29 March, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.12833 [pdf, other]

doi 10.1145/3624062.3624172

HPC-GPT: Integrating Large Language Model for High-Performance Computing

Authors: Xianzhong Ding, Le Chen, Murali Emani, Chunhua Liao, Pei-Hung Lin, Tristan Vanderbruggen, Zhen Xie, Alberto E. Cerpa, Wan Du

Abstract: Large Language Models (LLMs), including the LLaMA model, have exhibited their efficacy across various general-domain natural language processing (NLP) tasks. However, their performance in high-performance computing (HPC) domain tasks has been less than optimal due to the specialized expertise required to interpret the model responses. In response to this challenge, we propose HPC-GPT, a novel LLaMA-based model that has undergone supervised fine-tuning using generated QA (Question-Answer) instances for the HPC domain. To evaluate its effectiveness, we concentrate on two HPC tasks: managing AI models and datasets for HPC, and data race detection. By employing HPC-GPT, we demonstrate comparable performance with existing methods on both tasks, exemplifying its excellence in HPC-related scenarios. Our experiments on open-source benchmarks yield extensive results, underscoring HPC-GPT's potential to bridge the performance gap between LLMs and HPC-specific tasks. With HPC-GPT, we aim to pave the way for LLMs to excel in HPC domains, simplifying the utilization of language models in complex computing applications.

Submitted 2 October, 2023; originally announced November 2023.

Comments: 9 pages

arXiv:2311.07989 [pdf, other]

Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code

Authors: Ziyin Zhang, Chaoyu Chen, Bingchang Liu, Cong Liao, Zi Gong, Hang Yu, Jianguo Li, Rui Wang

Abstract: In this work we systematically review the recent advancements in software engineering with language models, covering 70+ models, 40+ evaluation tasks, 180+ datasets, and 900 related works. Unlike previous works, we integrate software engineering (SE) with natural language processing (NLP) by discussing the perspectives of both sides: SE applies language models for development automation, while NLP adopts SE tasks for language model evaluation. We break down code processing models into general language models represented by the GPT family and specialized models that are specifically pretrained on code, often with tailored objectives. We discuss the relations and differences between these models, and highlight the historical transition of code modeling from statistical models and RNNs to pretrained Transformers and LLMs, which is exactly the same course that had been taken by NLP. We also go beyond programming and review LLMs' application in other software engineering activities including requirement engineering, testing, deployment, and operations in an endeavor to provide a global view of NLP in SE, and identify key challenges and potential future directions in this domain. We keep the survey open and updated on GitHub at https://github.com/codefuse-ai/Awesome-Code-LLM.

Submitted 26 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

Comments: Repo: https://github.com/codefuse-ai/Awesome-Code-LLM. 9 figures, 18 tables, and 902 references. Under review

arXiv:2311.02303 [pdf, other]

MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning

Authors: Bingchang Liu, Chaoyu Chen, Cong Liao, Zi Gong, Huan Wang, Zhichao Lei, Ming Liang, Dajun Chen, Min Shen, Hailian Zhou, Hang Yu, Jianguo Li

Abstract: Code LLMs have emerged as a specialized research field, with remarkable studies dedicated to enhancing models' coding capabilities through fine-tuning on pre-trained models. Previous fine-tuning approaches were typically tailored to specific downstream tasks or scenarios, which meant separate fine-tuning for each task, requiring extensive training resources and posing challenges in terms of deployment and maintenance. Furthermore, these approaches failed to leverage the inherent interconnectedness among different code-related tasks. To overcome these limitations, we present a multi-task fine-tuning framework, MFTCoder, that enables simultaneous and parallel fine-tuning on multiple tasks. By incorporating various loss functions, we effectively address common challenges in multi-task learning, such as data imbalance, varying difficulty levels, and inconsistent convergence speeds. Extensive experiments have conclusively demonstrated that our multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks. Moreover, MFTCoder offers efficient training capabilities, including efficient data tokenization modes and PEFT fine-tuning, resulting in significantly improved speed compared to traditional fine-tuning methods. MFTCoder seamlessly integrates with several mainstream open-source LLMs, such as CodeLLama and Qwen.
Leveraging the CodeLLama foundation, our MFTCoder fine-tuned model, CodeFuse-CodeLLama-34B, achieves an impressive pass@1 score of 74.4% on the HumanEval benchmark, surpassing GPT-4 performance (67%, zero-shot). MFTCoder is open-sourced at https://github.com/codefuse-ai/MFTCOder

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2310.11478 [pdf, other]

ASP: Automatic Selection of Proxy dataset for efficient AutoML

Authors: Peng Yao, Chao Liao, Jiyuan Jia, Jianchao Tan, Bin Chen, Chengru Song, Di Zhang

Abstract: Deep neural networks have gained great success due to the increasing amounts of data, and diverse effective neural network designs. However, it also brings a heavy computing burden as the amount of training data is proportional to the training time. In addition, a well-behaved model requires repeated trials of different structure designs and hyper-parameters, which may take a large amount of time even with state-of-the-art (SOTA) hyper-parameter optimization (HPO) algorithms and neural architecture search (NAS) algorithms. In this paper, we propose an Automatic Selection of Proxy dataset framework (ASP) aimed to dynamically find the informative proxy subsets of training data at each epoch, reducing the training data size as well as saving the AutoML processing time. We verify the effectiveness and generalization of ASP on CIFAR10, CIFAR100, ImageNet16-120, and ImageNet-1k, across various public model benchmarks. The experiment results show that ASP can obtain better results than other data selection methods at all selection ratios. ASP can also enable much more efficient AutoML processing with a speedup of 2x-20x while obtaining better architectures and better hyper-parameters compared to utilizing the entire dataset.

Submitted 17 October, 2023; originally announced October 2023.

Comments: This paper was actually finished in 2021

arXiv:2310.11117 [pdf, other]

USDC: Unified Static and Dynamic Compression for Visual Transformer

Authors: Huan Yuan, Chao Liao, Jianchao Tan, Peng Yao, Jiyuan Jia, Bin Chen, Chengru Song, Di Zhang

Abstract: Visual Transformers have achieved great success in almost all vision tasks, such as classification, detection, and so on. However, the model complexity and the inference speed of the visual transformers hinder their deployments in industrial products. Various model compression techniques focus on directly compressing the visual transformers into a smaller one while maintaining the model performance, however, the performance drops dramatically when the compression ratio is large. Furthermore, several dynamic network techniques have also been applied to dynamically compress the visual transformers to obtain input-adaptive efficient sub-structures during the inference stage, which can achieve a better trade-off between the compression ratio and the model performance. The upper bound of memory of dynamic models is not reduced in the practical deployment since the whole original visual transformer model and the additional control gating modules should be loaded onto devices together for inference. To alleviate two disadvantages of two categories of methods, we propose to unify the static compression and dynamic compression techniques jointly to obtain an input-adaptive compressed model, which can further better balance the total compression ratios and the model performances. Moreover, in practical deployment, the batch sizes of the training and inference stage are usually different, which will cause the model inference performance to be worse than the model training performance, which is not touched by all previous dynamic network papers. We propose a sub-group gates augmentation technique to solve this performance drop problem. Extensive experiments demonstrate the superiority of our method on various baseline visual transformers such as DeiT, T2T-ViT, and so on.

Submitted 17 October, 2023; originally announced October 2023.

Comments: This paper was actually finished in 2021

arXiv:2310.07488 [pdf, other]

KwaiYiiMath: Technical Report

Authors: Jiayi Fu, Lei Lin, Xiaoyang Gao, Pengli Liu, Zhengzong Chen, Zhirui Yang, Shengnan Zhang, Xue Zheng, Yan Li, Yuliang Liu, Xucheng Ye, Yiqiao Liao, Chao Liao, Bin Chen, Chengru Song, Junchen Wan, Zijia Lin, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai

Abstract: Recent advancements in large language models (LLMs) have demonstrated remarkable abilities in handling a variety of natural language processing (NLP) downstream tasks, even on mathematical tasks requiring multi-step reasoning. In this report, we introduce the KwaiYiiMath which enhances the mathematical reasoning abilities of KwaiYiiBase1, by applying Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), including on both English and Chinese mathematical tasks. Meanwhile, we also constructed a small-scale Chinese primary school mathematics test set (named KMath), consisting of 188 examples to evaluate the correctness of the problem-solving process generated by the models. Empirical studies demonstrate that KwaiYiiMath can achieve state-of-the-art (SOTA) performance on GSM8k, CMath, and KMath compared with the similar size models, respectively.

Submitted 19 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: technical report. arXiv admin note: text overlap with arXiv:2306.16636 by other authors

arXiv:2310.06266 [pdf, other]

doi 10.1145/3639477.3639719

CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model

Authors: Peng Di, Jianguo Li, Hang Yu, Wei Jiang, Wenting Cai, Yang Cao, Chaoyu Chen, Dajun Chen, Hongwei Chen, Liang Chen, Gang Fan, Jie Gong, Zi Gong, Wen Hu, Tingting Guo, Zhichao Lei, Ting Li, Zheng Li, Ming Liang, Cong Liao, Bingchang Liu, Jiachen Liu, Zhiwei Liu, Shaojun Lu, Min Shen, et al. (13 additional authors not shown)

Abstract: Code Large Language Models (Code LLMs) have gained significant attention in the industry due to their wide applications in the full lifecycle of software engineering. However, the effectiveness of existing models in understanding non-English inputs for multi-lingual code-related tasks is still far from well studied. This paper introduces CodeFuse-13B, an open-sourced pre-trained code LLM. It is specifically designed for code-related tasks with both English and Chinese prompts and supports over 40 programming languages. CodeFuse achieves its effectiveness by utilizing a high quality pre-training dataset that is carefully filtered by program analyzers and optimized during the training process. Extensive experiments are conducted using real-world usage scenarios, the industry-standard benchmark HumanEval-x, and the specially designed CodeFuseEval for Chinese prompts. To assess the effectiveness of CodeFuse, we actively collected valuable human feedback from the AntGroup's software development process where CodeFuse has been successfully deployed. The results demonstrate that CodeFuse-13B achieves a HumanEval pass@1 score of 37.10%, positioning it as one of the top multi-lingual code LLMs with similar parameter sizes. In practical scenarios, such as code generation, code translation, code comments, and testcase generation, CodeFuse performs better than other models when confronted with Chinese prompts.

Submitted 10 January, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

Comments: Accepted by ICSE-SEIP 2024

arXiv:2310.05193 [pdf, other]

Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models

Authors: Chenzhuang Du, Yue Zhao, Chonghua Liao, Jiacheng You, Jie Fu, Hang Zhao

Abstract: This paper investigates how to better leverage large-scale pre-trained uni-modal models to further enhance discriminative multi-modal learning. Even when fine-tuned with only uni-modal data, these models can outperform previous multi-modal models in certain tasks. It's clear that their incorporation into multi-modal learning would significantly improve performance. However, multi-modal learning with these models still suffers from insufficient learning of uni-modal features, which weakens the resulting multi-modal model's generalization ability. While fine-tuning uni-modal models separately and then aggregating their predictions is straightforward, it doesn't allow for adequate adaptation between modalities, also leading to sub-optimal results. To this end, we introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA). By freezing the weights of uni-modal fine-tuned models, adding extra trainable rank decomposition matrices to them, and subsequently performing multi-modal joint training, our method enhances adaptation between modalities and boosts overall performance. We demonstrate the effectiveness of MMLoRA on three dataset categories: audio-visual (e.g., AVE, Kinetics-Sound, CREMA-D), vision-language (e.g., MM-IMDB, UPMC Food101), and RGB-Optical Flow (UCF101).

Submitted 8 October, 2023; originally announced October 2023.

arXiv:2309.04669 [pdf, other]

Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization

Authors: Yang **, Kun Xu, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Quzhe Huang, Bin Chen, Chenyi Lei, An Liu, Chengru Song, Xiaoqiang Lei, Di Zhang, Wenwu Ou, Kun Gai, Yadong Mu

Abstract: Recently, the remarkable advance of the Large Language Model (LLM) has inspired researchers to transfer its extraordinary reasoning capability to both vision and language data. However, the prevailing approaches primarily regard the visual input as a prompt and focus exclusively on optimizing the text generation process conditioned upon vision content by a frozen LLM. Such an inequitable treatment of vision and language heavily constrains the model's potential. In this paper, we break through this limitation by representing both vision and language in a unified form. Specifically, we introduce a well-designed visual tokenizer to translate the non-linguistic image into a sequence of discrete tokens like a foreign language that LLM can read. The resulting visual tokens encompass high-level semantics worthy of a word and also support dynamic sequence length varying from the image. Coupled with this tokenizer, the presented foundation model called LaVIT can handle both image and text indiscriminately under the same generative learning paradigm. This unification empowers LaVIT to serve as an impressive generalist interface to understand and generate multi-modal content simultaneously. Extensive experiments further showcase that it outperforms the existing models by a large margin on massive vision-language tasks. Our code and models are available at https://github.com/jy0205/LaVIT.

Submitted 22 March, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

Comments: ICLR 2024

arXiv:2308.08649 [pdf, other]

Towards Zero Memory Footprint Spiking Neural Network Training

Authors: Bin Lei, Sheng Lin, Pei-Hung Lin, Chunhua Liao, Caiwen Ding

Abstract: Biologically-inspired Spiking Neural Networks (SNNs), processing information using discrete-time events known as spikes rather than continuous values, have garnered significant attention due to their hardware-friendly and energy-efficient characteristics. However, the training of SNNs necessitates a considerably large memory footprint, given the additional storage requirements for spikes or events, leading to a complex structure and dynamic setup. In this paper, to address memory constraint in SNN training, we introduce an innovative framework, characterized by a remarkably low memory footprint. We \textbf{(i)} design a reversible SNN node that retains a high level of accuracy. Our design is able to achieve a $\mathbf{58.65\times}$ reduction in memory usage compared to the current SNN node. We \textbf{(ii)} propose a unique algorithm to streamline the backpropagation process of our reversible SNN node. This significantly trims the backward Floating Point Operations (FLOPs), thereby accelerating the training process in comparison to current reversible layer backpropagation method. By using our algorithm, the training time is able to be curtailed by $\mathbf{23.8\%}$ relative to existing reversible layer architectures.

Submitted 16 August, 2023; originally announced August 2023.

arXiv:2308.08614 [pdf, other]

Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought

Authors: Bin Lei, Pei-Hung Lin, Chunhua Liao, Caiwen Ding

Abstract: Recent advancements in large-scale models, such as GPT-4, have showcased remarkable capabilities in addressing standard queries. However, when facing complex problems that require multi-step logical reasoning, their accuracy dramatically decreases. Current research has explored the realm of \textit{prompting engineering} to bolster the inferential capacities of these models. Our paper unveils a pioneering prompting technique, dubbed \textit{Graph of Thoughts (GoT)}. Through testing on a trio of escalating challenges: the 24-point game, resolution of high-degree polynomial equations, and derivation of formulas for recursive sequences, our method outperformed GPT-4, achieving accuracy improvements of $89.7\%$, $86\%$, and $56\%$ for each respective task. Moreover, when juxtaposed with the state-of-the-art (SOTA) prompting method, \textit{Tree of Thought (ToT)}, our approach registered an average accuracy boost of $23\%$, $24\%$, and $15\%$.

Submitted 16 August, 2023; originally announced August 2023.

arXiv:2308.08473 [pdf, other]

DataRaceBench V1.4.1 and DataRaceBench-ML V0.1: Benchmark Suites for Data Race Detection

Authors: Le Chen, Wenhao Wu, Stephen F. Siegel, Pei-Hung Lin, Chunhua Liao

Abstract: Data races pose a significant threat in multi-threaded parallel applications due to their negative impact on program correctness. DataRaceBench, an open-source benchmark suite, is specifically crafted to assess these data race detection tools in a systematic and measurable manner. Machine learning techniques have recently demonstrated considerable potential in high-performance computing (HPC) program analysis and optimization. However, these techniques require specialized data formats for training and refinement. This paper presents the latest update to DataRaceBench, incorporating new data race contributions from Wu et al. \cite{wu2023model}, and introduces a derived dataset named DataRaceBench-ML (DRB-ML) \cite{drbml}. DRB-ML aligns with the emerging trend of machine learning and large language models. Originating from DataRaceBench, this dataset includes detailed labels that denote the presence of a data race and provides comprehensive details of associated variables, such as variable names, line numbers, and the operation (read/write). Unique to DRB-ML, we have also integrated a series of tailored prompt-response pairs specifically designed for LLM fine-tuning.

Submitted 16 August, 2023; originally announced August 2023.

arXiv:2308.07505 [pdf, other]

doi 10.1145/3624062.3624088

Data Race Detection Using Large Language Models

Authors: Le Chen, Xianzhong Ding, Murali Emani, Tristan Vanderbruggen, Pei-hung Lin, Chuanhua Liao

Abstract: Large language models (LLMs) are demonstrating significant promise as an alternate strategy to facilitate analyses and optimizations of high-performance computing programs, circumventing the need for resource-intensive manual tool creation. In this paper, we explore a novel LLM-based data race detection approach combining prompting engineering and fine-tuning techniques. We create a dedicated dataset named DRB-ML, which is derived from DataRaceBench, with fine-grain labels showing the presence of data race pairs and their associated variables, line numbers, and read/write information. DRB-ML is then used to evaluate representative LLMs and fine-tune open-source ones. Our experiment shows that LLMs can be a viable approach to data race detection. However, they still cannot compete with traditional data race detection tools when we need detailed information about variable pairs causing data races.

Submitted 3 October, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

arXiv:2307.07686 [pdf, other]

Creating a Dataset for High-Performance Computing Code Translation using LLMs: A Bridge Between OpenMP Fortran and C++

Authors: Bin Lei, Caiwen Ding, Le Chen, Pei-Hung Lin, Chunhua Liao

Abstract: In this study, we present a novel dataset for training machine learning models translating between OpenMP Fortran and C++ code. To ensure reliability and applicability, the dataset is created from a range of representative open-source OpenMP benchmarks. It is also refined using a meticulous code similarity test. The effectiveness of our dataset is assessed using both quantitative (CodeBLEU) and qualitative (human evaluation) methods. We showcase how this dataset significantly elevates the translation competencies of large language models (LLMs). Specifically, models without prior coding knowledge experienced a boost of $\mathbf{\times~5.1}$ in their CodeBLEU scores, while models with some coding familiarity saw an impressive $\mathbf{\times~9.9}$-fold increase. The best fine-tuned model using our dataset outperforms GPT-4. It is also reaching human-level accuracy. This work underscores the immense potential of our dataset in propelling advancements in the domain of code translation for high-performance computing. The dataset is accessible at \href{https://github.com/bin123apple/Fortran-CPP-HPC-code-translation-dataset}{OpenMP-Fortran-CPP-Translation}.

Submitted 18 September, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

Comments: This paper was accepted by the HPEC 2023 conference and received the Outstanding Student Paper Award

arXiv:2306.16036 [pdf, other]

A Cascaded Approach for ultraly High Performance Lesion Detection and False Positive Removal in Liver CT Scans

Authors: Fakai Wang, Chi-Tung Cheng, Chien-Wei Peng, Ke Yan, Min Wu, Le Lu, Chien-Hung Liao, Ling Zhang

Abstract: Liver cancer has high morbidity and mortality rates in the world. Multi-phase CT is a main medical imaging modality for detecting/identifying and diagnosing liver tumors. Automatically detecting and classifying liver lesions in CT images have the potential to improve the clinical workflow. This task remains challenging due to liver lesions' large variations in size, appearance, image contrast, and the complexities of tumor types or subtypes. In this work, we customize a multi-object labeling tool for multi-phase CT images, which is used to curate a large-scale dataset containing 1,631 patients with four-phase CT images, multi-organ masks, and multi-lesion (six major types of liver lesions confirmed by pathology) masks. We develop a two-stage liver lesion detection pipeline, where the high-sensitivity detecting algorithms in the first stage discover as many lesion proposals as possible, and the lesion-reclassification algorithms in the second stage remove as many false alarms as possible. The multi-sensitivity lesion detection algorithm maximizes the information utilization of the individual probability maps of segmentation, and the lesion-shuffle augmentation effectively explores the texture contrast between lesions and the liver. Independently tested on 331 patient cases, the proposed model achieves high sensitivity and specificity for malignancy classification in the multi-phase contrast-enhanced CT (99.2%, 97.1%, diagnosis setting) and in the noncontrast CT (97.3%, 95.7%, screening setting).

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.14979 [pdf, other]

doi 10.1007/978-3-031-40744-4_2

LM4HPC: Towards Effective Language Model Application in High-Performance Computing

Authors: Le Chen, Pei-Hung Lin, Tristan Vanderbruggen, Chunhua Liao, Murali Emani, Bronis de Supinski

Abstract: In recent years, language models (LMs), such as GPT-4, have been widely used in multiple domains, including natural language processing, visualization, and so on. However, applying them for analyzing and optimizing high-performance computing (HPC) software is still challenging due to the lack of HPC-specific support. In this paper, we design the LM4HPC framework to facilitate the research and development of HPC software analyses and optimizations using LMs. Tailored for supporting HPC datasets, AI models, and pipelines, our framework is built on top of a range of components from different levels of the machine learning software stack, with Hugging Face-compatible APIs. Using three representative tasks, we evaluated the prototype of our framework. The results show that LM4HPC can help users quickly evaluate a set of state-of-the-art models and generate insightful leaderboards.

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.10196 [pdf, other]

Structured Thoughts Automaton: First Formalized Execution Model for Auto-Regressive Language Models

Authors: Tristan Vanderbruggen, Chunhua Liao, Peter Pirkelbauer, Pei-Hung Lin

Abstract: In recent months, Language Models (LMs) have become a part of daily discourse, with focus on OpenAI and the potential of Artificial General Intelligence (AGI). Furthermore, the leaking of LLama's weights to the public has led to an influx of innovations demonstrating the impressive capabilities of generative LMs. While we believe that AGI is still a distant goal, we recognize the potential of LMs in solving tasks such as searching complex documents, compiling reports with basic analysis, and providing assistance in problem-solving. In this paper, we propose formalizing the execution model of language models. We investigate current execution models, to find that this formalism has received little attention, and present our contribution: the first formalized execution model for LMs. We introduce a new algorithm for sampling the predictions of LMs, which we use to build a reliable and inspectable execution model. We introduce a low-level language to write "cognitive programs" for this execution model. We hope to shed light on the need for execution models for LMs and encourage further research in this area.

Submitted 16 June, 2023; originally announced June 2023.

Comments: Submitted to CGO-24

arXiv:2306.08861 [pdf, other]

Motion Capture Dataset for Practical Use of AI-based Motion Editing and Stylization

Authors: Makito Kobayashi, Chen-Chieh Liao, Keito Inoue, Sentaro Yojima, Masafumi Takahashi

Abstract: In this work, we proposed a new style-diverse dataset for the domain of motion style transfer. The motion dataset uses an industrial-standard human bone structure and thus is industry-ready to be plugged into 3D characters for many projects. We claim the challenges in motion style transfer and encourage future work in this domain by releasing the proposed motion dataset both to the public and the market. We conduct a comprehensive study on motion style transfer in the experiment using the state-of-the-art method, and the results show the proposed dataset's validity for the motion style transfer task.

Submitted 9 July, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

arXiv:2305.15843 [pdf, other]

TabGSL: Graph Structure Learning for Tabular Data Prediction

Authors: Jay Chiehen Liao, Cheng-Te Li

Abstract: This work presents a novel approach to tabular data prediction leveraging graph structure learning and graph neural networks. Despite the prevalence of tabular data in real-world applications, traditional deep learning methods often overlook the potentially valuable associations between data instances. Such associations can offer beneficial insights for classification tasks, as instances may exhibit similar patterns of correlations among features and target labels. This information can be exploited by graph neural networks, necessitating robust graph structures. However, existing studies primarily focus on improving graph structure from noisy data, largely neglecting the possibility of deriving graph structures from tabular data. We present a novel solution, Tabular Graph Structure Learning (TabGSL), to enhance tabular data prediction by simultaneously learning instance correlation and feature interaction within a unified framework. This is achieved through a proposed graph contrastive learning module, along with transformer-based feature extractor and graph neural network. Comprehensive experiments conducted on 30 benchmark tabular datasets demonstrate that TabGSL markedly outperforms both tree-based models and recent deep learning-based tabular models. Visualizations of the learned instance embeddings further substantiate the effectiveness of TabGSL.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.07186 [pdf, other]

Learning to Code on Graphs for Topological Interference Management

Authors: Zhiwei Shan, ** Yi, Han Yu, Chung-Shou Liao, Shi **

Abstract: The state-of-the-art coding schemes for topological interference management (TIM) problems are usually handcrafted for specific families of network topologies, relying critically on experts' domain knowledge. This inevitably restricts the potential wider applications to wireless communication systems, due to the limited generalizability. This work makes the first attempt to advocate a novel intelligent coding approach to mimic topological interference alignment (IA) via local graph coloring algorithms, leveraging the new advances of graph neural networks (GNNs) and reinforcement learning (RL). The proposed LCG framework is then generalized to discover new IA coding schemes, including one-to-one vector IA and subspace IA. The extensive experiments demonstrate the excellent generalizability and transferability of the proposed approach, where the parameterized GNNs trained by small size TIM instances are able to work well on new unseen network topologies with larger size.

Submitted 11 May, 2023; originally announced May 2023.

Comments: An extended version of a paper accepted by International Symposium on Information Theory (ISIT) 2023

arXiv:2304.07076 [pdf, other]

BCE-Net: Reliable Building Footprints Change Extraction based on Historical Map and Up-to-Date Images using Contrastive Learning

Authors: Cheng Liao, Han Hu, Xuekun Yuan, Haifeng Li, Chao Liu, Chunyang Liu, Gui Fu, Yulin Ding, Qing Zhu

Abstract: Automatic and periodic recompiling of building databases with up-to-date high-resolution images has become a critical requirement for rapidly developing urban environments. However, the architecture of most existing approaches for change extraction attempts to learn features related to changes but ignores objectives related to buildings. This inevitably leads to the generation of significant pseudo-changes, due to factors such as seasonal changes in images and the inclination of building façades. To alleviate the above-mentioned problems, we developed a contrastive learning approach by validating historical building footprints against single up-to-date remotely sensed images. This contrastive learning strategy allowed us to inject the semantics of buildings into a pipeline for the detection of changes, which is achieved by increasing the distinguishability of features of buildings from those of non-buildings. In addition, to reduce the effects of inconsistencies between historical building polygons and buildings in up-to-date images, we employed a deformable convolutional neural network to learn offsets intuitively. In summary, we formulated a multi-branch building extraction method that identifies newly constructed and removed buildings, respectively. To validate our method, we conducted comparative experiments using the public Wuhan University building change detection dataset and a more practical dataset named SI-BU that we established. Our method achieved F1 scores of 93.99% and 70.74% on the above datasets, respectively. Moreover, when the data of the public dataset were divided in the same manner as in previous related studies, our method achieved an F1 score of 94.63%, which surpasses that of the state-of-the-art method.

Submitted 14 April, 2023; originally announced April 2023.

arXiv:2304.00474 [pdf, ps, other]

On the Optimal Recovery of Graph Signals

Authors: Simon Foucart, Chunyang Liao, Nate Veldt

Abstract: Learning a smooth graph signal from partially observed data is a well-studied task in graph-based machine learning. We consider this task from the perspective of optimal recovery, a mathematical framework for learning a function from observational data that adopts a worst-case perspective tied to model assumptions on the function to be learned. Earlier work in the optimal recovery literature has shown that minimizing a regularized objective produces optimal solutions for a general class of problems, but did not fully identify the regularization parameter. Our main contribution provides a way to compute regularization parameters that are optimal or near-optimal (depending on the setting), specifically for graph signal processing problems. Our results offer a new interpretation for classical optimization techniques in graph-based learning and also come with new insights for hyperparameter selection. We illustrate the potential of our methods in numerical experiments on several semi-synthetic graph signal processing datasets.

Submitted 29 May, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

Comments: This paper has been accepted by 14th International conference on Sampling Theory and Applications (SampTA 2023)

arXiv:2303.10674 [pdf]

URM4DMU: an user represention model for darknet markets users

Authors: Hongmeng Liu, Jiapeng Zhao, Yixuan Huo, Yuyan Wang, Chun Liao, Liyan Shen, Shiyao Cui, **qiao Shi

Abstract: Darknet markets provide a large platform for trading illicit goods and services due to their anonymity. Learning an invariant representation of each user based on their posts on different markets makes it easy to aggregate user information across different platforms, which helps identify anonymous users. Traditional user representation methods mainly rely on modeling the text information of posts and cannot capture the temporal content and the forum interaction of posts. Recent works mainly use CNNs to model the text information of posts, but fail to effectively model posts whose length changes frequently in an episode. To address the above problems, we propose a model named URM4DMU (User Representation Model for Darknet Markets Users), which mainly improves the post representation by augmenting convolutional operators and self-attention with an adaptive gate mechanism. It performs much better when combined with the temporal content and the forum interaction of posts. We demonstrate the effectiveness of URM4DMU on four darknet markets. The average improvements in MRR and Recall@10 over the state-of-the-art method are 22.5% and 25.5%, respectively.

Submitted 19 March, 2023; originally announced March 2023.

Comments: 9 pages

MSC Class: 62 (Primary); 54 (Secondary) ACM Class: I.2.7

arXiv:2303.08873 [pdf, other]

Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems

Authors: Giorgis Georgakoudis, Konstantinos Parasyris, Chunhua Liao, David Beckingsale, Todd Gamblin, Bronis de Supinski

Abstract: Heterogeneity has become a mainstream architecture design choice for building High Performance Computing systems. However, heterogeneity poses significant challenges for achieving performance portability of execution. Adapting a program to a new heterogeneous platform is laborious and requires developers to manually explore a vast space of execution parameters. To address those challenges, this paper proposes new extensions to OpenMP for autonomous, machine learning-driven adaptation. Our solution includes a set of novel language constructs, compiler transformations, and runtime support. We propose a producer-consumer pattern to flexibly define multiple, different variants of OpenMP code regions to enable adaptation. Those regions are transparently profiled at runtime to autonomously learn optimizing machine learning models that dynamically select the fastest variant. Our approach significantly reduces users' efforts of programming adaptive applications on heterogeneous architectures by leveraging machine learning techniques and code generation capabilities of OpenMP compilation. Using a complete reference implementation in Clang/LLVM we evaluate three use-cases of adaptive CPU-GPU execution. Experiments with HPC proxy applications and benchmarks demonstrate that the proposed adaptive OpenMP extensions automatically choose the best performing code variants for various adaptation possibilities, in several different heterogeneous platforms of CPUs and GPUs.

Submitted 15 March, 2023; originally announced March 2023.

Report number: LLNL-CONF-833682

arXiv:2303.06292 [pdf, other]

Multi-view shaker detection: Insights from a noise-immune influence analysis Perspective

Authors: Chang Liao

Abstract: Entities whose changes will significantly affect others in a networked system are called shakers. In recent years, some models have been proposed to detect such shakers from evolving entities. However, limited work has focused on shaker detection in the very short term, which has many real-world applications. For example, in financial markets, it can enable both investors and governors to quickly respond to rapid changes. Under the short-term setting, conventional methods may suffer from limited data sample problems and are sensitive to cynical manipulations, leading to unreliable results. Fortunately, there are multi-attribute evolution records available, which can provide compatible and complementary information. In this paper, we investigate how to learn reliable influence results from the short-term multi-attribute evolution records. We call entities with consistent influence among different views in the short term multi-view shakers and study the new problem of multi-view shaker detection. We identify the challenges as follows: (1) how to jointly detect short-term shakers and model conflicting influence results among different views? (2) how to filter spurious influence relations in each individual view for robust influence inference? In response, a novel solution, called Robust Influence Network, is proposed from a noise-immune influence analysis perspective, where the possible outliers are modelled jointly with the multi-view shaker detection task. More specifically, we learn the influence relation from each view and transform influence relations from different views into an intermediate representation. In the meantime, we uncover both the inconsistent and spurious outliers.

Submitted 10 March, 2023; originally announced March 2023.

Comments: 14 pages, 4 figures

ACM Class: J.4

arXiv:2303.06284 [pdf, other]

Prospecting Community Development Strength based on Economic Graph: From Categorization to Scoring

Authors: Chang Liao

Abstract: Recent years have witnessed a growing body of research on community characterization. In contrast to the large body of research on categorical measures (rise or decline) for evaluating community development, we propose to estimate the community development strength (the degree of the rise or decline). More specifically, given already known categorical information of community development, we attempt to quantify the community development strength, which is of great interest. Motivated by the increasing availability of large-scale data on the network between entities among communities, we investigate how to score the community's development strength. We formally define our task as prospecting community development strength from categorization based on multi-relational network information and identify two challenges as follows: (1) limited guidance for integrating the entity multi-relational network in quantifying the community development strength; (2) the existence of a selection effect that the community development strength has on network formation. Aiming at these challenges, we start with a hybrid of discriminative and generative approaches to multi-relational network-based community development strength quantification. Then a network generation process is exploited to debias the selection process. In the end, we empirically evaluate the proposed model by applying it to quantify enterprise business development strength. Experimental results demonstrate the effectiveness of the proposed method.

Submitted 10 March, 2023; originally announced March 2023.

Comments: 12 pages, 3 figures

ACM Class: J.4

arXiv:2302.06857 [pdf, other]

doi 10.1145/3577190.3614106

Make Your Brief Stroke Real and Stereoscopic: 3D-Aware Simplified Sketch to Portrait Generation

Authors: Yasheng Sun, Qianyi Wu, Hang Zhou, Kaisiyuan Wang, Tianshu Hu, Chen-Chieh Liao, Shio Miyafuji, Ziwei Liu, Hideki Koike

Abstract: Creating photo-realistic versions of people's sketched portraits is useful for various entertainment purposes. Existing studies only generate portraits in the 2D plane with fixed views, making the results less vivid. In this paper, we present Stereoscopic Simplified Sketch-to-Portrait (SSSP), which explores the possibility of creating Stereoscopic 3D-aware portraits from simple contour sketches by involving 3D generative models. Our key insight is to design sketch-aware constraints that can fully exploit the prior knowledge of a tri-plane-based 3D-aware generative model. Specifically, our designed region-aware volume rendering strategy and global consistency constraint further enhance detail correspondences during sketch encoding. Moreover, in order to facilitate use by lay users, we propose a Contour-to-Sketch module with vector quantized representations, so that easily drawn contours can directly guide the generation of 3D portraits. Extensive comparisons show that our method generates high-quality results that match the sketch. Our usability study verifies that our system is greatly preferred by users.

Submitted 1 October, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: Project Page on https://hangz-nju-cuhk.github.io/projects/SSSP, Video Url: https://youtu.be/GiOKbvr2U_E

arXiv:2301.05991 [pdf]

Conceptual Framework and Documentation Standards of Cystoscopic Media Content for Artificial Intelligence

Authors: Okyaz Eminaga, Timothy Jiyong Lee, Jessie Ge, Eugene Shkolyar, Mark Laurie, ** Long, Lukas Graham Hockman, Joseph C. Liao

Abstract: Background: The clinical documentation of cystoscopy includes visual and textual materials. However, the secondary use of visual cystoscopic data for educational and research purposes remains limited due to inefficient data management in routine clinical practice. Methods: A conceptual framework was designed to document cystoscopy in a standardized manner with three major sections: data management, annotation management, and utilization management. A Swiss-cheese model was proposed for quality control and root cause analyses. We defined the infrastructure required to implement the framework with respect to FAIR (findable, accessible, interoperable, re-usable) principles. We applied two scenarios exemplifying data sharing for research and educational projects to ensure the compliance with FAIR principles. Results: The framework was successfully implemented while following FAIR principles. The cystoscopy atlas produced from the framework could be presented in an educational web portal; a total of 68 full-length qualitative videos and corresponding annotation data were sharable for artificial intelligence projects covering frame classification and segmentation problems at case, lesion and frame levels. Conclusion: Our study shows that the proposed framework facilitates the storage of the visual documentation in a standardized manner and enables FAIR data for education and artificial intelligence research.

Submitted 18 January, 2023; v1 submitted 14 January, 2023; originally announced January 2023.

Comments: Under Review

arXiv:2301.05897 [pdf]

doi 10.1117/12.2644087

Model-based Transfer Learning for Automatic Optical Inspection based on domain discrepancy

Authors: Erik Isai Valle Salgado, Haoxin Yan, Yue Hong, Peiyuan Zhu, Shidong Zhu, Chengwei Liao, Yanxiang Wen, Xiu Li, Xiang Qian, Xiaohao Wang, Xinghui Li

Abstract: Transfer learning is a promising method for AOI applications since it can significantly shorten sample collection time and improve efficiency in today's smart manufacturing. However, related research enhanced the network models by applying TL without considering the domain similarity among datasets, the data long-tailedness of a source dataset, and mainly used linear transformations to mitigate the lack of samples. This research applies model-based TL via domain similarity to improve the overall performance and data augmentation in both target and source domains to enrich the data quality and reduce the imbalance. Given a group of source datasets from similar industrial processes, we define which group is the most related to the target through the domain discrepancy score and the number of samples each has. Then, we transfer the chosen pre-trained backbone weights to train and fine-tune the target network. Our research suggests increases in the F1 score and the PR curve up to 20% compared with TL using benchmark datasets.

Submitted 14 January, 2023; originally announced January 2023.

Comments: This is a fix of the published paper "Relational-based transfer learning for automatic optical inspection based on domain discrepancy"

Journal ref: Proc. SPIE 12317, Optoelectronic Imaging and Multimedia Technology IX, 2023

arXiv:2301.02732 [pdf]

doi 10.1109/BigData55660.2022.10021009

Multimodal Lyrics-Rhythm Matching

Authors: Callie C. Liao, Duoduo Liao, Jesse Guessford

Abstract: Despite the recent increase in research on artificial intelligence for music, prominent correlations between key components of lyrics and rhythm such as keywords, stressed syllables, and strong beats are not frequently studied. This is likely due to challenges such as audio misalignment, inaccuracies in syllabic identification, and most importantly, the need for cross-disciplinary knowledge. To address this lack of research, we propose a novel multimodal lyrics-rhythm matching approach in this paper that specifically matches key components of lyrics and music with each other without any language limitations. We use audio instead of sheet music with readily available metadata, which creates more challenges yet increases the application flexibility of our method. Furthermore, our approach creatively generates several patterns involving various multimodalities, including music strong beats, lyrical syllables, auditory changes in a singer's pronunciation, and especially lyrical keywords, which are utilized for matching key lyrical elements with key rhythmic elements. This advantageous approach not only provides a unique way to study auditory lyrics-rhythm correlations including efficient rhythm-based audio alignment algorithms, but also bridges computational linguistics with music as well as music cognition. Our experimental results reveal a 0.81 probability of matching on average, and around 30% of the songs have a probability of 0.9 or higher of keywords landing on strong beats, including 12% of the songs with a perfect landing. Similarity metrics are also used to evaluate the correlation between lyrics and rhythm; they show that nearly 50% of the songs have a similarity of 0.70 or higher. In conclusion, our approach contributes significantly to the lyrics-rhythm relationship by computationally unveiling insightful correlations.

Submitted 14 March, 2023; v1 submitted 6 January, 2023; originally announced January 2023.

Comments: Accepted by 2022 IEEE International Conference on Big Data (IEEE Big Data 2022)

arXiv:2212.06352 [pdf, other]

Towards Seamless Management of AI Models in High-Performance Computing

Authors: Sixing Yu, Murali Emani, Chunhua Liao, Pei-Hung Lin, Tristan Vanderbruggen, Xipeng Shen, Ali Jannesari

Abstract: With the increasing prevalence of artificial intelligence (AI) in diverse science/engineering communities, AI models emerge on an unprecedented scale among various domains. However, given the complexity and diversity of the software and hardware environments, reusing AI artifacts (models and datasets) is extremely challenging, especially with AI-driven science applications. Building an ecosystem to run and reuse AI applications/datasets at scale efficiently becomes increasingly essential for diverse science and engineering and high-performance computing (HPC) communities. In this paper, we innovate over an HPC-AI ecosystem -- HPCFair, which enables the Findable, Accessible, Interoperable, and Reproducible (FAIR) principles. HPCFair enables the collection of AI models/datasets allowing users to download/upload AI artifacts with authentications. Most importantly, our proposed framework provides user-friendly APIs for users to easily run inference jobs and customize AI artifacts to their tasks as needed. Our results show that, with HPCFair API, users irrespective of technical expertise in AI, can easily leverage AI artifacts to their tasks with minimal effort.

Submitted 12 December, 2022; originally announced December 2022.

Comments: Accepted at the 2nd Annual AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE)

arXiv:2211.04668 [pdf, other]

Zero-Label Prompt Selection

Authors: Chonghua Liao, Yanan Zheng, Zhilin Yang

Abstract: Natural language prompts have been shown to facilitate cross-task generalization for large language models. However, with no or limited labeled examples, the cross-task performance is highly sensitive to the choice of prompts, while selecting a high-performing prompt is challenging given the scarcity of labels. To address the issue, we propose a Zero-Label Prompt Selection (ZPS) method that selects prompts without any labeled data or gradient update. Specifically, given the candidate human-written prompts for a task, ZPS labels a set of unlabeled data with a prompt ensemble and uses the pseudo-labels for prompt selection. Experiments show that ZPS improves over prior methods by a sizeable margin in zero-label performance. We also extend ZPS to a few-shot setting and show its advantages over strong baselines such as prompt tuning and model tuning.

Submitted 8 November, 2022; originally announced November 2022.
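
Illustrative note: the selection idea summarized in the abstract above (pseudo-label unlabeled data with a prompt ensemble, then pick the prompt that best agrees with those pseudo-labels) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the ensemble combines predictions, so plain majority voting is assumed here, and the function name `select_prompt` and the input format are hypothetical.

```python
from collections import Counter


def select_prompt(predictions):
    """Pick a prompt using pseudo-labels only (no gold labels).

    predictions: dict mapping a prompt name to the list of labels that
    prompt produced on the same unlabeled examples (hypothetical format).
    Returns (best_prompt, agreement_scores, pseudo_labels).
    """
    names = list(predictions)
    n = len(predictions[names[0]])

    # Assumption: the ensemble is a simple majority vote across prompts.
    pseudo_labels = [
        Counter(predictions[p][i] for p in names).most_common(1)[0][0]
        for i in range(n)
    ]

    # Score each prompt by its agreement rate with the pseudo-labels.
    scores = {
        p: sum(a == b for a, b in zip(predictions[p], pseudo_labels)) / n
        for p in names
    }
    best = max(scores, key=scores.get)
    return best, scores, pseudo_labels


if __name__ == "__main__":
    preds = {
        "p1": ["a", "b", "a", "b"],
        "p2": ["a", "b", "b", "b"],
        "p3": ["b", "b", "a", "a"],
    }
    best, scores, pseudo = select_prompt(preds)
    print(best, scores)
```

In this toy input the three prompts never tie on an example, so the vote is unambiguous; a real setting would need an explicit tie-breaking rule.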

arXiv:2211.02092 [pdf, other]

Making Machine Learning Datasets and Models FAIR for HPC: A Methodology and Case Study

Authors: Pei-Hung Lin, Chunhua Liao, Winson Chen, Tristan Vanderbruggen, Murali Emani, Hailu Xu

Abstract: The FAIR Guiding Principles aim to improve the findability, accessibility, interoperability, and reusability of digital content by making them both human and machine actionable. However, these principles have not yet been broadly adopted in the domain of machine learning-based program analyses and optimizations for High-Performance Computing (HPC). In this paper, we design a methodology to make HPC datasets and machine learning models FAIR after investigating existing FAIRness assessment and improvement techniques. Our methodology includes a comprehensive, quantitative assessment for elected data, followed by concrete, actionable suggestions to improve FAIRness with respect to common issues related to persistent identifiers, rich metadata descriptions, license and provenance information. Moreover, we select a representative training dataset to evaluate our methodology. The experiment shows the methodology can effectively improve the dataset and model's FAIRness from an initial score of 19.1% to the final score of 83.0%.

Submitted 3 November, 2022; originally announced November 2022.

arXiv:2210.01908 [pdf, other]

Supervised Metric Learning to Rank for Retrieval via Contextual Similarity Optimization

Authors: Christopher Liao, Theodoros Tsiligkaridis, Brian Kulis

Abstract: There is extensive interest in metric learning methods for image retrieval. Many metric learning loss functions focus on learning a correct ranking of training samples, but strongly overfit semantically inconsistent labels and require a large amount of data. To address these shortcomings, we propose a new metric learning method, called contextual loss, which optimizes contextual similarity in addition to cosine similarity. Our contextual loss implicitly enforces semantic consistency among neighbors while converging to the correct ranking. We empirically show that the proposed loss is more robust to label noise, and is less prone to overfitting even when a large portion of train data is withheld. Extensive experiments demonstrate that our method achieves a new state-of-the-art across four image retrieval benchmarks and multiple different evaluation settings. Code is available at: https://github.com/Chris210634/metric-learning-using-contextual-similarity

Submitted 2 June, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

arXiv:2209.12021 [pdf, other]

Improving the Bounds of the Online Dynamic Power Management Problem

Authors: Ya-Chun Liang, Kazuo Iwama, Chung-Shou Liao

Abstract: We investigate the power-down mechanism which decides when a machine transitions between states such that the total energy consumption, characterized by execution cost, idle cost and switching cost, is minimized. In contrast to most of the previous studies on the offline model, we focus on the online model in which a sequence of jobs with their release time, execution time and deadline, arrive in an online fashion. More precisely, we exploit a different switching on and off strategy and present an upper bound of 3, and further show a lower bound of 2.1, in a dual-machine model, introduced by Chen et al. in 2014 [STACS 2014: 226-238], both of which beat the currently best result.

Submitted 24 September, 2022; originally announced September 2022.

arXiv:2208.05596 [pdf, other]

Finding Reusable Machine Learning Components to Build Programming Language Processing Pipelines

Authors: Patrick Flynn, Tristan Vanderbruggen, Chunhua Liao, Pei-Hung Lin, Murali Emani, Xipeng Shen

Abstract: Programming Language Processing (PLP) using machine learning has made vast improvements in the past few years. Increasingly more people are interested in exploring this promising field. However, it is challenging for new researchers and developers to find the right components to construct their own machine learning pipelines, given the diverse PLP tasks to be solved, the large number of datasets and models being released, and the set of complex compilers or tools involved. To improve the findability, accessibility, interoperability and reusability (FAIRness) of machine learning components, we collect and analyze a set of representative papers in the domain of machine learning-based PLP. We then identify and characterize key concepts including PLP tasks, model architectures and supportive tools. Finally, we show some example use cases of leveraging the reusable components to construct machine learning pipelines to solve a set of PLP tasks.

Submitted 15 June, 2023; v1 submitted 10 August, 2022; originally announced August 2022.

arXiv:2208.03970 [pdf, ps, other]

Optimized Design for IRS-Assisted Integrated Sensing and Communication Systems in Clutter Environments

Authors: Chikun Liao, Feng Wang, Vincent K. N. Lau

Abstract: In this paper, we investigate an intelligent reflecting surface (IRS)-assisted integrated sensing and communication (ISAC) system design in a clutter environment. Assisted by an IRS equipped with a uniform linear array (ULA), a multi-antenna base station (BS) is targeted for communicating with multiple communication users (CUs) and sensing multiple targets simultaneously. We consider the IRS-assisted ISAC design in the case with Type-I or Type-II CUs, where each Type-I and Type-II CU can and cannot cancel the interference from sensing signals, respectively. In particular, we aim to maximize the minimum sensing beampattern gain among multiple targets, by jointly optimizing the BS transmit beamforming vectors and the IRS phase shifting matrix, subject to the signal-to-interference-plus-noise ratio (SINR) constraint for each Type-I/Type-II CU, the interference power constraint per clutter, the transmission power constraint at the BS, and the cross-correlation pattern constraint. Due to the coupling of the BS's transmit design variables and the IRS's phase shifting matrix, the formulated max-min IRS-assisted ISAC design problem in the case with Type-I/Type-II CUs is highly non-convex. As such, we propose an efficient algorithm based on the alternating-optimization and semi-definite relaxation (SDR) techniques. In the case with Type-I CUs, we show that the dedicated sensing signal at the BS is always beneficial to improve the sensing performance. By contrast, the dedicated sensing signal at the BS is not required in the case with Type-II CUs. Numerical results are provided to show that the proposed IRS-assisted ISAC design schemes achieve a significant gain over the existing benchmark schemes.

Submitted 8 August, 2022; originally announced August 2022.

Comments: 28 pages, 9 figures, single-column full paper

arXiv:2206.15364 [pdf, other]

Online TSP with Predictions

Authors: Hsiao-Yu Hu, Hao-Ting Wei, Meng-Hsi Li, Kai-Min Chung, Chung-Shou Liao

Abstract: We initiate the study of online routing problems with predictions, inspired by recent exciting results in the area of learning-augmented algorithms. A popular framework for overcoming pessimistic worst-case competitive analysis is a learning-augmented online algorithm, which incorporates predictions in a black-box manner to outperform existing algorithms if the predictions are accurate while otherwise maintaining theoretical guarantees even when the predictions are extremely erroneous. In this study, we particularly begin investigating the classical online traveling salesman problem (OLTSP), where future requests are augmented with predictions. Unlike the prediction models in other previous studies, each actual request in the OLTSP, associated with its arrival time and position, may not coincide with the predicted ones, which, as imagined, leads to a troublesome situation. Our main result is to study different prediction models and design algorithms to improve the best-known results in the different settings. Moreover, we generalize the proposed results to the online dial-a-ride problem.

Submitted 30 June, 2022; originally announced June 2022.

Showing 1–50 of 114 results for author: Liao, C