Search | arXiv e-print repository

ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation

Authors: Zhiyu Mei, Wei Fu, Kaiwei Li, Guangju Wang, Huanchen Zhang, Yi Wu

Abstract: Reinforcement Learning from Human Feedback (RLHF) stands as a pivotal technique in empowering large language model (LLM) applications. Since RLHF involves diverse computational workloads and intricate dependencies among multiple LLMs, directly adopting parallelization techniques from supervised training can result in sub-optimal performance. To overcome this limitation, we propose a novel approach… ▽ More Reinforcement Learning from Human Feedback (RLHF) stands as a pivotal technique in empowering large language model (LLM) applications. Since RLHF involves diverse computational workloads and intricate dependencies among multiple LLMs, directly adopting parallelization techniques from supervised training can result in sub-optimal performance. To overcome this limitation, we propose a novel approach named parameter ReaLlocation, which dynamically redistributes LLM parameters in the cluster and adapts parallelization strategies during training. Building upon this idea, we introduce ReaLHF, a pioneering system capable of automatically discovering and running efficient execution plans for RLHF training given the desired algorithmic and hardware configurations. ReaLHF formulates the execution plan for RLHF as an augmented dataflow graph. Based on this formulation, ReaLHF employs a tailored search algorithm with a lightweight cost estimator to discover an efficient execution plan. Subsequently, the runtime engine deploys the selected plan by effectively parallelizing computations and redistributing parameters. We evaluate ReaLHF on the LLaMA-2 models with up to $4\times70$ billion parameters and 128 GPUs. The experiment results showcase ReaLHF's substantial speedups of $2.0-10.6\times$ compared to baselines. Furthermore, the execution plans generated by ReaLHF exhibit an average of $26\%$ performance improvement over heuristic approaches based on Megatron-LM. The source code of ReaLHF is publicly available at https://github.com/openpsi-project/ReaLHF . △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: 13 pages (15 pages with references), 13 figures

arXiv:2406.05707 [pdf, other]

QGEval: A Benchmark for Question Generation Evaluation

Authors: Wei** Fu, Bifan Wei, Jianxiang Hu, Zhongmin Cai, Jun Liu

Abstract: Automatically generated questions often suffer from problems such as unclear expression or factual inaccuracies, requiring a reliable and comprehensive evaluation of their quality. Human evaluation is frequently used in the field of question generation (QG) and is one of the most accurate evaluation methods. It also serves as the standard for automatic metrics. However, there is a lack of unified… ▽ More Automatically generated questions often suffer from problems such as unclear expression or factual inaccuracies, requiring a reliable and comprehensive evaluation of their quality. Human evaluation is frequently used in the field of question generation (QG) and is one of the most accurate evaluation methods. It also serves as the standard for automatic metrics. However, there is a lack of unified evaluation criteria, which hampers the development of both QG technologies and automatic evaluation methods. To address this, we propose QGEval, a multi-dimensional Evaluation benchmark for Question Generation, which evaluates both generated questions and existing automatic metrics across 7 dimensions: fluency, clarity, conciseness, relevance, consistency, answerability, and answer consistency. We demonstrate the appropriateness of these dimensions by examining their correlations and distinctions. Analysis with QGEval reveals that 1) most QG models perform unsatisfactorily in terms of answerability and answer consistency, and 2) existing metrics fail to align well with human assessments when evaluating generated questions across the 7 dimensions. We expect this work to foster the development of both QG technologies and automatic metrics for QG. △ Less

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.03065 [pdf, other]

Decision Boundary-aware Knowledge Consolidation Generates Better Instance-Incremental Learner

Authors: Qiang Nie, Weifu Fu, Yuhuan Lin, Jialin Li, Yifeng Zhou, Yong Liu, Lei Zhu, Chengjie Wang

Abstract: Instance-incremental learning (IIL) focuses on learning continually with data of the same classes. Compared to class-incremental learning (CIL), the IIL is seldom explored because IIL suffers less from catastrophic forgetting (CF). However, besides retaining knowledge, in real-world deployment scenarios where the class space is always predefined, continual and cost-effective model promotion with t… ▽ More Instance-incremental learning (IIL) focuses on learning continually with data of the same classes. Compared to class-incremental learning (CIL), the IIL is seldom explored because IIL suffers less from catastrophic forgetting (CF). However, besides retaining knowledge, in real-world deployment scenarios where the class space is always predefined, continual and cost-effective model promotion with the potential unavailability of previous data is a more essential demand. Therefore, we first define a new and more practical IIL setting as promoting the model's performance besides resisting CF with only new observations. Two issues have to be tackled in the new IIL setting: 1) the notorious catastrophic forgetting because of no access to old data, and 2) broadening the existing decision boundary to new observations because of concept drift. To tackle these problems, our key insight is to moderately broaden the decision boundary to fail cases while retain old boundary. Hence, we propose a novel decision boundary-aware distillation method with consolidating knowledge to teacher to ease the student learning new knowledge. We also establish the benchmarks on existing datasets Cifar-100 and ImageNet. Notably, extensive experiments demonstrate that the teacher model can be a better incremental learner than the student model, which overturns previous knowledge distillation-based methods treating student as the main role. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: 14 pages

arXiv:2404.10719 [pdf, other]

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Authors: Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu

Abstract: Reinforcement Learning from Human Feedback (RLHF) is currently the most widely used method to align large language models (LLMs) with human preferences. Existing RLHF methods can be roughly categorized as either reward-based or reward-free. Novel applications such as ChatGPT and Claude leverage reward-based methods that first learn a reward model and apply actor-critic algorithms, such as Proximal… ▽ More Reinforcement Learning from Human Feedback (RLHF) is currently the most widely used method to align large language models (LLMs) with human preferences. Existing RLHF methods can be roughly categorized as either reward-based or reward-free. Novel applications such as ChatGPT and Claude leverage reward-based methods that first learn a reward model and apply actor-critic algorithms, such as Proximal Policy Optimization (PPO). However, in academic benchmarks, state-of-the-art results are often achieved via reward-free methods, such as Direct Preference Optimization (DPO). Is DPO truly superior to PPO? Why does PPO perform poorly on these benchmarks? In this paper, we first conduct both theoretical and empirical studies on the algorithmic properties of DPO and show that DPO may have fundamental limitations. Moreover, we also comprehensively examine PPO and reveal the key factors for the best performances of PPO in fine-tuning LLMs. Finally, we benchmark DPO and PPO across a collection of RLHF testbeds, ranging from dialogue to code generation. Experiment results demonstrate that PPO is able to surpass other alignment methods in all cases and achieve state-of-the-art results in challenging code competitions. △ Less

Submitted 21 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: 16 pages, 2 figures, 14 tables

arXiv:2403.17980 [pdf, other]

EG-ConMix: An Intrusion Detection Method based on Graph Contrastive Learning

Authors: Li** Wu, Shanshan Lei, Feilong Liao, Yuanjun Zheng, Yuxin Liu, Wentao Fu, Hao Song, Jiajun Zhou

Abstract: As the number of IoT devices increases, security concerns become more prominent. The impact of threats can be minimized by deploying Network Intrusion Detection System (NIDS) by monitoring network traffic, detecting and discovering intrusions, and issuing security alerts promptly. Most intrusion detection research in recent years has been directed towards the pair of traffic itself without conside… ▽ More As the number of IoT devices increases, security concerns become more prominent. The impact of threats can be minimized by deploying Network Intrusion Detection System (NIDS) by monitoring network traffic, detecting and discovering intrusions, and issuing security alerts promptly. Most intrusion detection research in recent years has been directed towards the pair of traffic itself without considering the interrelationships among them, thus limiting the monitoring of complex IoT network attack events. Besides, anomalous traffic in real networks accounts for only a small fraction, which leads to a severe imbalance problem in the dataset that makes algorithmic learning and prediction extremely difficult. In this paper, we propose an EG-ConMix method based on E-GraphSAGE, incorporating a data augmentation module to fix the problem of data imbalance. In addition, we incorporate contrastive learning to discern the difference between normal and malicious traffic samples, facilitating the extraction of key features. Extensive experiments on two publicly available datasets demonstrate the superior intrusion detection performance of EG-ConMix compared to state-of-the-art methods. Remarkably, it exhibits significant advantages in terms of training speed and accuracy for large-scale graphs. △ Less

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.05500 [pdf, other]

Using Fiber Optic Bundles to Miniaturize Vision-Based Tactile Sensors

Authors: Julia Di, Zdravko Dugonjic, Will Fu, Tingfan Wu, Romeo Mercado, Kevin Sawyer, Victoria Rose Most, Gregg Kammerer, Stefanie Speidel, Richard E. Fan, Geoffrey Sonn, Mark R. Cutkosky, Mike Lambeta, Roberto Calandra

Abstract: Vision-based tactile sensors have recently become popular due to their combination of low cost, very high spatial resolution, and ease of integration using widely available miniature cameras. The associated field of view and focal length, however, are difficult to package in a human-sized finger. In this paper we employ optical fiber bundles to achieve a form factor that, at 15 mm diameter, is sma… ▽ More Vision-based tactile sensors have recently become popular due to their combination of low cost, very high spatial resolution, and ease of integration using widely available miniature cameras. The associated field of view and focal length, however, are difficult to package in a human-sized finger. In this paper we employ optical fiber bundles to achieve a form factor that, at 15 mm diameter, is smaller than an average human fingertip. The electronics and camera are also located remotely, further reducing package size. The sensor achieves a spatial resolution of 0.22 mm and a minimum force resolution 5 mN for normal and shear contact forces. With these attributes, the DIGIT Pinki sensor is suitable for applications such as robotic and teleoperated digital palpation. We demonstrate its utility for palpation of the prostate gland and show that it can achieve clinically relevant discrimination of prostate stiffness for phantom and ex vivo tissue. △ Less

Submitted 11 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: We open source the design of DIGIT Pinki at https://github.com/facebookresearch/digit-design

arXiv:2403.04303 [pdf, other]

LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking

Authors: Jialin Li, Qiang Nie, Weifu Fu, Yuhuan Lin, Guangpin Tao, Yong Liu, Chengjie Wang

Abstract: Deep learning models, particularly those based on transformers, often employ numerous stacked structures, which possess identical architectures and perform similar functions. While effective, this stacking paradigm leads to a substantial increase in the number of parameters, posing challenges for practical applications. In today's landscape of increasingly large models, stacking depth can even rea… ▽ More Deep learning models, particularly those based on transformers, often employ numerous stacked structures, which possess identical architectures and perform similar functions. While effective, this stacking paradigm leads to a substantial increase in the number of parameters, posing challenges for practical applications. In today's landscape of increasingly large models, stacking depth can even reach dozens, further exacerbating this issue. To mitigate this problem, we introduce LORS (LOw-rank Residual Structure). LORS allows stacked modules to share the majority of parameters, requiring a much smaller number of unique ones per module to match or even surpass the performance of using entirely distinct ones, thereby significantly reducing parameter usage. We validate our method by applying it to the stacked decoders of a query-based object detector, and conduct extensive experiments on the widely used MS COCO dataset. Experimental results demonstrate the effectiveness of our method, as even with a 70\% reduction in the parameters of the decoder, our method still enables the model to achieve comparable or △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: 9 pages, 5 figures, 11 tables, CVPR2024 accepted

arXiv:2403.01652 [pdf, other]

Towards Memory-Efficient Traffic Policing in Time-Sensitive Networking

Authors: Xuyan Jiang, Xiangrui Yang, Tongqing Zhou, Wenwen Fu, Wei Quan, Yihao Jiao, Yinhan Sun, Zhigang Sun

Abstract: Time-Sensitive Networking (TSN) is an emerging real-time Ethernet technology that provides deterministic communication for time-critical traffic. At its core, TSN relies on Time-Aware Shaper (TAS) for pre-allocating frames in specific time intervals and Per-Stream Filtering and Policing (PSFP) for mitigating the fatal disturbance of unavoidable frame drift. However, as first identified in this wor… ▽ More Time-Sensitive Networking (TSN) is an emerging real-time Ethernet technology that provides deterministic communication for time-critical traffic. At its core, TSN relies on Time-Aware Shaper (TAS) for pre-allocating frames in specific time intervals and Per-Stream Filtering and Policing (PSFP) for mitigating the fatal disturbance of unavoidable frame drift. However, as first identified in this work, PSFP incurs heavy memory consumption during policing, hindering normal switching functionalities. This work proposes a lightweight policing design called FooDog, which could facilitate sub-microsecond jitter with ultra-low memory consumption. FooDog employs a period-wise and stream-wise structure to realize the memory-efficient PSFP without loss of determinism. Results using commercial FPGAs in typical aerospace scenarios show that FooDog could keep end-to-end time-sensitive traffic jitter <150 nanoseconds in the presence of abnormal traffic, comparable to typical TSN performance without anomalies. Meanwhile, it consumes merely hundreds of kilobits of memory, reducing >90% of on-chip memory overheads than unoptimized PSFP design. △ Less

Submitted 3 March, 2024; originally announced March 2024.

arXiv:2402.11954 [pdf, other]

Multimodal Emotion Recognition from Raw Audio with Sinc-convolution

Authors: Xiaohui Zhang, Wenjie Fu, Mangui Liang

Abstract: Speech Emotion Recognition (SER) is still a complex task for computers with average recall rates usually about 70% on the most realistic datasets. Most SER systems use hand-crafted features extracted from audio signal such as energy, zero crossing rate, spectral information, prosodic, mel frequency cepstral coefficient (MFCC), and so on. More recently, using raw waveform for training neural networ… ▽ More Speech Emotion Recognition (SER) is still a complex task for computers with average recall rates usually about 70% on the most realistic datasets. Most SER systems use hand-crafted features extracted from audio signal such as energy, zero crossing rate, spectral information, prosodic, mel frequency cepstral coefficient (MFCC), and so on. More recently, using raw waveform for training neural network is becoming an emerging trend. This approach is advantageous as it eliminates the feature extraction pipeline. Learning from time-domain signal has shown good results for tasks such as speech recognition, speaker verification etc. In this paper, we utilize Sinc-convolution layer, which is an efficient architecture for preprocessing raw speech waveform for emotion recognition, to extract acoustic features from raw audio signals followed by a long short-term memory (LSTM). We also incorporate linguistic features and append a dialogical emotion decoding (DED) strategy. Our approach achieves a weighted accuracy of 85.1\% in four class emotion on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset. △ Less

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.11931 [pdf, other]

Soft-Weighted CrossEntropy Loss for Continous Alzheimer's Disease Detection

Authors: Xiaohui Zhang, Wenjie Fu, Mangui Liang

Abstract: Alzheimer's disease is a common cognitive disorder in the elderly. Early and accurate diagnosis of Alzheimer's disease (AD) has a major impact on the progress of research on dementia. At present, researchers have used machine learning methods to detect Alzheimer's disease from the speech of participants. However, the recognition accuracy of current methods is unsatisfactory, and most of them focus… ▽ More Alzheimer's disease is a common cognitive disorder in the elderly. Early and accurate diagnosis of Alzheimer's disease (AD) has a major impact on the progress of research on dementia. At present, researchers have used machine learning methods to detect Alzheimer's disease from the speech of participants. However, the recognition accuracy of current methods is unsatisfactory, and most of them focus on using low-dimensional handcrafted features to extract relevant information from audios. This paper proposes an Alzheimer's disease detection system based on the pre-trained framework Wav2vec 2.0 (Wav2vec2). In addition, by replacing the loss function with the Soft-Weighted CrossEntropy loss function, we achieved 85.45\% recognition accuracy on the same test dataset. △ Less

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.02146 [pdf, other]

Emergency Computing: An Adaptive Collaborative Inference Method Based on Hierarchical Reinforcement Learning

Authors: Weiqi Fu, Lianming Xu, Xin Wu, Li Wang, Aiguo Fei

Abstract: In achieving effective emergency response, the timely acquisition of environmental information, seamless command data transmission, and prompt decision-making are crucial. This necessitates the establishment of a resilient emergency communication dedicated network, capable of providing communication and sensing services even in the absence of basic infrastructure. In this paper, we propose an Emer… ▽ More In achieving effective emergency response, the timely acquisition of environmental information, seamless command data transmission, and prompt decision-making are crucial. This necessitates the establishment of a resilient emergency communication dedicated network, capable of providing communication and sensing services even in the absence of basic infrastructure. In this paper, we propose an Emergency Network with Sensing, Communication, Computation, Caching, and Intelligence (E-SC3I). The framework incorporates mechanisms for emergency computing, caching, integrated communication and sensing, and intelligence empowerment. E-SC3I ensures rapid access to a large user base, reliable data transmission over unstable links, and dynamic network deployment in a changing environment. However, these advantages come at the cost of significant computation overhead. Therefore, we specifically concentrate on emergency computing and propose an adaptive collaborative inference method (ACIM) based on hierarchical reinforcement learning. Experimental results demonstrate our method's ability to achieve rapid inference of AI models with constrained computational and communication resources. △ Less

Submitted 3 February, 2024; originally announced February 2024.

arXiv:2402.01728 [pdf, other]

Hardware Phi-1.5B: A Large Language Model Encodes Hardware Domain Specific Knowledge

Authors: Weimin Fu, Shijie Li, Yifang Zhao, Haocheng Ma, Raj Dutta, Xuan Zhang, Kaichen Yang, Yier **, Xiaolong Guo

Abstract: In the rapidly evolving semiconductor industry, where research, design, verification, and manufacturing are intricately linked, the potential of Large Language Models to revolutionize hardware design and security verification is immense. The primary challenge, however, lies in the complexity of hardware specific issues that are not adequately addressed by the natural language or software code know… ▽ More In the rapidly evolving semiconductor industry, where research, design, verification, and manufacturing are intricately linked, the potential of Large Language Models to revolutionize hardware design and security verification is immense. The primary challenge, however, lies in the complexity of hardware specific issues that are not adequately addressed by the natural language or software code knowledge typically acquired during the pretraining stage. Additionally, the scarcity of datasets specific to the hardware domain poses a significant hurdle in develo** a foundational model. Addressing these challenges, this paper introduces Hardware Phi 1.5B, an innovative large language model specifically tailored for the hardware domain of the semiconductor industry. We have developed a specialized, tiered dataset comprising small, medium, and large subsets and focused our efforts on pretraining using the medium dataset. This approach harnesses the compact yet efficient architecture of the Phi 1.5B model. The creation of this first pretrained, hardware domain specific large language model marks a significant advancement, offering improved performance in hardware design and verification tasks and illustrating a promising path forward for AI applications in the semiconductor sector. △ Less

Submitted 27 January, 2024; originally announced February 2024.

Comments: 6 pages, 6 figures

Journal ref: 29th IEEE/ACM Asia and South Pacific Design Automation Conference (ASP-DAC); 2024 January; Incheon Songdo Convensia, South Korea

arXiv:2401.18019 [pdf, other]

Joining Entities Across Relation and Graph with a Unified Model

Authors: Wenzhi Fu

Abstract: This paper introduces RG (Relational Genetic) model, a revised relational model to represent graph-structured data in RDBMS while preserving its topology, for efficiently and effectively extracting data in different formats from disparate sources. Along with: (a) SQL$_δ$, an SQL dialect augmented with graph pattern queries and tuple-vertex joins, such that one can extract graph properties via grap… ▽ More This paper introduces RG (Relational Genetic) model, a revised relational model to represent graph-structured data in RDBMS while preserving its topology, for efficiently and effectively extracting data in different formats from disparate sources. Along with: (a) SQL$_δ$, an SQL dialect augmented with graph pattern queries and tuple-vertex joins, such that one can extract graph properties via graph pattern matching, and "semantically" match entities across relations and graphs; (b) a logical representation of graphs in RDBMS, which introduces an exploration operator for efficient pattern querying, supports also browsing and updating graph-structured data; and (c) a strategy to uniformly evaluate SQL, pattern and hybrid queries that join tuples and vertices, all inside an RDBMS by leveraging its optimizer without performance degradation on switching different execution engines. A lightweight system, WhiteDB, is developed as an implementation to evaluate the benefits it can actually bring on real-life data. We empirically verified that the RG model enables the graph pattern queries to be answered as efficiently as in native graph engines; can consider the access on graph and relation in any order for optimal plan; and supports effective data enrichment. △ Less

Submitted 31 January, 2024; originally announced January 2024.

Comments: 24 pages, 16 figures, 5 tables

ACM Class: H.2

arXiv:2401.16448 [pdf, other]

doi 10.1109/AsianHOST59942.2023.10409307

LLM4SecHW: Leveraging Domain Specific Large Language Model for Hardware Debugging

Authors: Weimin Fu, Kaichen Yang, Raj Gautam Dutta, Xiaolong Guo, Gang Qu

Abstract: This paper presents LLM4SecHW, a novel framework for hardware debugging that leverages domain specific Large Language Model (LLM). Despite the success of LLMs in automating various software development tasks, their application in the hardware security domain has been limited due to the constraints of commercial LLMs and the scarcity of domain specific data. To address these challenges, we propose… ▽ More This paper presents LLM4SecHW, a novel framework for hardware debugging that leverages domain specific Large Language Model (LLM). Despite the success of LLMs in automating various software development tasks, their application in the hardware security domain has been limited due to the constraints of commercial LLMs and the scarcity of domain specific data. To address these challenges, we propose a unique approach to compile a dataset of open source hardware design defects and their remediation steps, utilizing version control data. This dataset provides a substantial foundation for training machine learning models for hardware. LLM4SecHW employs fine tuning of medium sized LLMs based on this dataset, enabling the identification and rectification of bugs in hardware designs. This pioneering approach offers a reference workflow for the application of fine tuning domain specific LLMs in other research areas. We evaluate the performance of our proposed system on various open source hardware designs, demonstrating its efficacy in accurately identifying and correcting defects. Our work brings a new perspective on automating the quality control process in hardware design. △ Less

Submitted 28 January, 2024; originally announced January 2024.

Comments: 6 pages. 1 figure

Journal ref: 2023 Asian Hardware Oriented Security and Trust Symposium (AsianHOST), Tian**, China, 2023, pp. 1-6

arXiv:2401.03804 [pdf, other]

TeleChat Technical Report

Authors: Zhongjiang He, Zihan Wang, Xinzhang Liu, Shixuan Liu, Yitong Yao, Yuyao Huang, Xuelong Li, Yongxiang Li, Zhonghao Che, Zhaoxi Zhang, Yan Wang, Xin Wang, Luwen Pu, Huinan Xu, Ruiyu Fang, Yu Zhao, Jie Zhang, Xiaomeng Huang, Zhilong Lu, Jiaxin Peng, Wenjun Zheng, Shiquan Wang, Bingkai Yang, Xuewei he, Zhuoru Jiang , et al. (11 additional authors not shown)

Abstract: In this technical report, we present TeleChat, a collection of large language models (LLMs) with parameters of 3 billion, 7 billion and 12 billion. It includes pretrained language models as well as fine-tuned chat models that is aligned with human preferences. TeleChat is initially pretrained on an extensive corpus containing a diverse collection of texts from both English and Chinese languages, i… ▽ More In this technical report, we present TeleChat, a collection of large language models (LLMs) with parameters of 3 billion, 7 billion and 12 billion. It includes pretrained language models as well as fine-tuned chat models that is aligned with human preferences. TeleChat is initially pretrained on an extensive corpus containing a diverse collection of texts from both English and Chinese languages, including trillions of tokens. Subsequently, the model undergoes fine-tuning to align with human preferences, following a detailed methodology that we describe. We evaluate the performance of TeleChat on various tasks, including language understanding, mathematics, reasoning, code generation, and knowledge-based question answering. Our findings indicate that TeleChat achieves comparable performance to other open-source models of similar size across a wide range of public benchmarks. To support future research and applications utilizing LLMs, we release the fine-tuned model checkpoints of TeleChat's 7B and 12B variant, along with code and a portion of our pretraining data, to the public community. △ Less

Submitted 1 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: 28 pages, 2 figures

ACM Class: I.2.7

arXiv:2312.06580 [pdf, ps, other]

VGF: Value-Guided Fuzzing -- Fuzzing Hardware as Hardware

Authors: Ruochen Dai, Michael Lee, Patrick Hoey, Weimin Fu, Tuba Yavuz, Xiaolong Guo, Shuo Wang, Dean Sullivan, Orlando Arias

Abstract: As the complexity of logic designs increase, new avenues for testing digital hardware becomes necessary. Fuzz Testing (fuzzing) has recently received attention as a potential candidate for input vector generation on hardware designs. Using this technique, a fuzzer is used to generate an input to a logic design. Using a simulation engine, the logic design is given the generated stimulus and some me… ▽ More As the complexity of logic designs increase, new avenues for testing digital hardware becomes necessary. Fuzz Testing (fuzzing) has recently received attention as a potential candidate for input vector generation on hardware designs. Using this technique, a fuzzer is used to generate an input to a logic design. Using a simulation engine, the logic design is given the generated stimulus and some metric of feedback is given to the fuzzer to aid in the input mutation. However, much like software fuzzing, hardware fuzzing uses code coverage as a metric to find new possible fuzzing paths. Unfortunately, as we show in this work, this coverage metric falls short of generic on some hardware designs where designers have taken a more direct approach at expressing a particular microarchitecture, or implementation, of the desired hardware. With this work, we introduce a new coverage metric which employs not code coverage, but state coverage internal to a design. By observing changes in signals within the logic circuit under testing, we are able to explore the state space of the design and provide feedback to a fuzzer engine for input generation. Our approach, Value-Guided Fuzzing (VGF), provides a generic metric of coverage which can be applied to any design regardless of its implementation. In this paper, we introduce our state-based VGF metric as well as a sample implementation which can be used with any VPI, DPI, VHPI, or FLI compliant simulator, making it completely HDL agnostic. We demonstrate the generality of VGF and show how our sample implementation is capable of finding bugs considerably faster than previous approaches. △ Less

Submitted 11 December, 2023; originally announced December 2023.

Comments: 20 pages, 7 figures, 7 tables

arXiv:2311.11608 [pdf]

doi 10.1093/jamia/ocae037

Taiyi: A Bilingual Fine-Tuned Large Language Model for Diverse Biomedical Tasks

Authors: Ling Luo, **zhong Ning, Yingwen Zhao, Zhijun Wang, Zeyuan Ding, Peng Chen, Weiru Fu, Qinyu Han, Guangtao Xu, Yunzhi Qiu, Dinghao Pan, Jiru Li, Hao Li, Wenduo Feng, Senbo Tu, Yuqi Liu, Zhihao Yang, Jian Wang, Yuanyuan Sun, Hongfei Lin

Abstract: Objective: Most existing fine-tuned biomedical large language models (LLMs) focus on enhancing performance in monolingual biomedical question answering and conversation tasks. To investigate the effectiveness of the fine-tuned LLMs on diverse biomedical NLP tasks in different languages, We present Taiyi, a bilingual fine-tuned LLM for diverse biomedical tasks. Materials and Methods: We first curat… ▽ More Objective: Most existing fine-tuned biomedical large language models (LLMs) focus on enhancing performance in monolingual biomedical question answering and conversation tasks. To investigate the effectiveness of the fine-tuned LLMs on diverse biomedical NLP tasks in different languages, We present Taiyi, a bilingual fine-tuned LLM for diverse biomedical tasks. Materials and Methods: We first curated a comprehensive collection of 140 existing biomedical text mining datasets (102 English and 38 Chinese datasets) across over 10 task types. Subsequently, a two-stage strategy is proposed for supervised fine-tuning to optimize the model performance across varied tasks. Results: Experimental results on 13 test sets covering named entity recognition, relation extraction, text classification, question answering tasks demonstrate that Taiyi achieves superior performance compared to general LLMs. The case study involving additional biomedical NLP tasks further shows Taiyi's considerable potential for bilingual biomedical multi-tasking. Conclusion: Leveraging rich high-quality biomedical corpora and develo** effective fine-tuning strategies can significantly improve the performance of LLMs within the biomedical domain. Taiyi shows the bilingual multi-tasking capability through supervised fine-tuning. However, those tasks such as information extraction that are not generation tasks in nature remain challenging for LLM-based generative approaches, and they still underperform the conventional discriminative approaches of smaller language models. △ Less

Submitted 19 December, 2023; v1 submitted 20 November, 2023; originally announced November 2023.

Journal ref: Journal of the American Medical Informatics Association, 2024, ocae037

arXiv:2311.06062 [pdf, other]

Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration

Authors: Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang

Abstract: Membership Inference Attacks (MIA) aim to infer whether a target data record has been utilized for model training or not. Prior attempts have quantified the privacy risks of language models (LMs) via MIAs, but there is still no consensus on whether existing MIA algorithms can cause remarkable privacy leakage on practical Large Language Models (LLMs). Existing MIAs designed for LMs can be classifie… ▽ More Membership Inference Attacks (MIA) aim to infer whether a target data record has been utilized for model training or not. Prior attempts have quantified the privacy risks of language models (LMs) via MIAs, but there is still no consensus on whether existing MIA algorithms can cause remarkable privacy leakage on practical Large Language Models (LLMs). Existing MIAs designed for LMs can be classified into two categories: reference-free and reference-based attacks. They are both based on the hypothesis that training records consistently strike a higher probability of being sampled. Nevertheless, this hypothesis heavily relies on the overfitting of target models, which will be mitigated by multiple regularization methods and the generalization of LLMs. The reference-based attack seems to achieve promising effectiveness in LLMs, which measures a more reliable membership signal by comparing the probability discrepancy between the target model and the reference model. However, the performance of reference-based attack is highly dependent on a reference dataset that closely resembles the training dataset, which is usually inaccessible in the practical scenario. Overall, existing MIAs are unable to effectively unveil privacy leakage over practical fine-tuned LLMs that are overfitting-free and private. We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA). Specifically, since memorization in LLMs is inevitable during the training process and occurs before overfitting, we introduce a more reliable membership signal, probabilistic variation, which is based on memorization rather than overfitting. Furthermore, we introduce a self-prompt approach, which constructs the dataset to fine-tune the reference model by prompting the target LLM itself. In this manner, the adversary can collect a dataset with a similar distribution from public APIs. △ Less

Submitted 25 June, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

Comments: Repo: https://github.com/wjfu99/MIA-LLMs

arXiv:2311.06049 [pdf, other]

Privacy-Preserving Individual-Level COVID-19 Infection Prediction via Federated Graph Learning

Authors: Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang

Abstract: Accurately predicting individual-level infection state is of great value since its essential role in reducing the damage of the epidemic. However, there exists an inescapable risk of privacy leakage in the fine-grained user mobility trajectories required by individual-level infection prediction. In this paper, we focus on develo** a framework of privacy-preserving individual-level infection pred… ▽ More Accurately predicting individual-level infection state is of great value since its essential role in reducing the damage of the epidemic. However, there exists an inescapable risk of privacy leakage in the fine-grained user mobility trajectories required by individual-level infection prediction. In this paper, we focus on develo** a framework of privacy-preserving individual-level infection prediction based on federated learning (FL) and graph neural networks (GNN). We propose Falcon, a Federated grAph Learning method for privacy-preserving individual-level infeCtion predictiON. It utilizes a novel hypergraph structure with spatio-temporal hyperedges to describe the complex interactions between individuals and locations in the contagion process. By organically combining the FL framework with hypergraph neural networks, the information propagation process of the graph machine learning is able to be divided into two stages distributed on the server and the clients, respectively, so as to effectively protect user privacy while transmitting high-level information. Furthermore, it elaborately designs a differential privacy perturbation mechanism as well as a plausible pseudo location generation approach to preserve user privacy in the graph structure. Besides, it introduces a cooperative coupling mechanism between the individual-level prediction model and an additional region-level model to mitigate the detrimental impacts caused by the injected obfuscation mechanisms. Extensive experimental results show that our methodology outperforms state-of-the-art algorithms and is able to protect user privacy against actual privacy attacks. Our code and datasets are available at the link: https://github.com/wjfu99/FL-epidemic. △ Less

Submitted 10 November, 2023; originally announced November 2023.

Comments: accepted by TOIS

arXiv:2311.05818 [pdf, other]

Learning Agile Bipedal Motions on a Quadrupedal Robot

Authors: Yunfei Li, **han Li, Wei Fu, Yi Wu

Abstract: Can a quadrupedal robot perform bipedal motions like humans? Although develo** human-like behaviors is more often studied on costly bipedal robot platforms, we present a solution over a lightweight quadrupedal robot that unlocks the agility of the quadruped in an upright standing pose and is capable of a variety of human-like motions. Our framework is with a hierarchical structure. At the low le… ▽ More Can a quadrupedal robot perform bipedal motions like humans? Although develo** human-like behaviors is more often studied on costly bipedal robot platforms, we present a solution over a lightweight quadrupedal robot that unlocks the agility of the quadruped in an upright standing pose and is capable of a variety of human-like motions. Our framework is with a hierarchical structure. At the low level is a motion-conditioned control policy that allows the quadrupedal robot to track desired base and front limb movements while balancing on two hind feet. The policy is commanded by a high-level motion generator that gives trajectories of parameterized human-like motions to the robot from multiple modalities of human input. We for the first time demonstrate various bipedal motions on a quadrupedal robot, and showcase interesting human-robot interaction modes including mimicking human videos, following natural language instructions, and physical interaction. The video is available at https://sites.google.com/view/bipedal-motions-quadruped. △ Less

Submitted 3 March, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

Comments: Camera ready for ICRA 2024

arXiv:2310.14509 [pdf, other]

Iteratively Learn Diverse Strategies with State Distance Information

Authors: Wei Fu, Weihua Du, **gwei Li, Sunli Chen, **gzhao Zhang, Yi Wu

Abstract: In complex reinforcement learning (RL) problems, policies with similar rewards may have substantially different behaviors. It remains a fundamental challenge to optimize rewards while also discovering as many diverse strategies as possible, which can be crucial in many practical applications. Our study examines two design choices for tackling this challenge, i.e., diversity measure and computation… ▽ More In complex reinforcement learning (RL) problems, policies with similar rewards may have substantially different behaviors. It remains a fundamental challenge to optimize rewards while also discovering as many diverse strategies as possible, which can be crucial in many practical applications. Our study examines two design choices for tackling this challenge, i.e., diversity measure and computation framework. First, we find that with existing diversity measures, visually indistinguishable policies can still yield high diversity scores. To accurately capture the behavioral difference, we propose to incorporate the state-space distance information into the diversity measure. In addition, we examine two common computation frameworks for this problem, i.e., population-based training (PBT) and iterative learning (ITR). We show that although PBT is the precise problem formulation, ITR can achieve comparable diversity scores with higher computation efficiency, leading to improved solution quality in practice. Based on our analysis, we further combine ITR with two tractable realizations of the state-distance-based diversity measures and develop a novel diversity-driven RL algorithm, State-based Intrinsic-reward Policy Optimization (SIPO), with provable convergence properties. We empirically examine SIPO across three domains from robot locomotion to multi-agent games. In all of our testing environments, SIPO consistently produces strategically diverse and human-interpretable policies that cannot be discovered by existing baselines. △ Less

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2309.16306 [pdf, other]

Can the Query-based Object Detector Be Designed with Fewer Stages?

Authors: Jialin Li, Weifu Fu, Yuhuan Lin, Qiang Nie, Yong Liu

Abstract: Query-based object detectors have made significant advancements since the publication of DETR. However, most existing methods still rely on multi-stage encoders and decoders, or a combination of both. Despite achieving high accuracy, the multi-stage paradigm (typically consisting of 6 stages) suffers from issues such as heavy computational burden, prompting us to reconsider its necessity. In this… ▽ More Query-based object detectors have made significant advancements since the publication of DETR. However, most existing methods still rely on multi-stage encoders and decoders, or a combination of both. Despite achieving high accuracy, the multi-stage paradigm (typically consisting of 6 stages) suffers from issues such as heavy computational burden, prompting us to reconsider its necessity. In this paper, we explore multiple techniques to enhance query-based detectors and, based on these findings, propose a novel model called GOLO (Global Once and Local Once), which follows a two-stage decoding paradigm. Compared to other mainstream query-based models with multi-stage decoders, our model employs fewer decoder stages while still achieving considerable performance. Experimental results on the COCO dataset demonstrate the effectiveness of our approach. △ Less

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2308.15855 [pdf, other]

IIDM: Inter and Intra-domain Mixing for Semi-supervised Domain Adaptation in Semantic Segmentation

Authors: Weifu Fu, Qiang Nie, Jialin Li, Yuhuan Lin, Kai Wu, Jian Li, Yabiao Wang, Yong Liu, Chengjie Wang

Abstract: Despite recent advances in semantic segmentation, an inevitable challenge is the performance degradation caused by the domain shift in real applications. Current dominant approach to solve this problem is unsupervised domain adaptation (UDA). However, the absence of labeled target data in UDA is overly restrictive and limits performance. To overcome this limitation, a more practical scenario calle… ▽ More Despite recent advances in semantic segmentation, an inevitable challenge is the performance degradation caused by the domain shift in real applications. Current dominant approach to solve this problem is unsupervised domain adaptation (UDA). However, the absence of labeled target data in UDA is overly restrictive and limits performance. To overcome this limitation, a more practical scenario called semi-supervised domain adaptation (SSDA) has been proposed. Existing SSDA methods are derived from the UDA paradigm and primarily focus on leveraging the unlabeled target data and source data. In this paper, we highlight the significance of exploiting the intra-domain information between the labeled target data and unlabeled target data. Instead of solely using the scarce labeled target data for supervision, we propose a novel SSDA framework that incorporates both Inter and Intra Domain Mixing (IIDM), where inter-domain mixing mitigates the source-target domain gap and intra-domain mixing enriches the available target domain information, and the network can capture more domain-invariant features. We also explore different domain mixing strategies to better exploit the target domain information. Comprehensive experiments conducted on the GTA5 to Cityscapes and SYNTHIA to Cityscapes benchmarks demonstrate the effectiveness of IIDM, surpassing previous methods by a large margin. △ Less

Submitted 11 April, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: 7 pages, 4 figures

arXiv:2308.12143 [pdf, other]

A Probabilistic Fluctuation based Membership Inference Attack for Diffusion Models

Authors: Wenjie Fu, Huandong Wang, Chen Gao, Guanghua Liu, Yong Li, Tao Jiang

Abstract: Membership Inference Attack (MIA) identifies whether a record exists in a machine learning model's training set by querying the model. MIAs on the classic classification models have been well-studied, and recent works have started to explore how to transplant MIA onto generative models. Our investigation indicates that existing MIAs designed for generative models mainly depend on the overfitting i… ▽ More Membership Inference Attack (MIA) identifies whether a record exists in a machine learning model's training set by querying the model. MIAs on the classic classification models have been well-studied, and recent works have started to explore how to transplant MIA onto generative models. Our investigation indicates that existing MIAs designed for generative models mainly depend on the overfitting in target models. However, overfitting can be avoided by employing various regularization techniques, whereas existing MIAs demonstrate poor performance in practice. Unlike overfitting, memorization is essential for deep learning models to attain optimal performance, making it a more prevalent phenomenon. Memorization in generative models leads to an increasing trend in the probability distribution of generating records around the member record. Therefore, we propose a Probabilistic Fluctuation Assessing Membership Inference Attack (PFAMI), a black-box MIA that infers memberships by detecting these trends via analyzing the overall probabilistic fluctuations around given records. We conduct extensive experiments across multiple generative models and datasets, which demonstrate PFAMI can improve the attack success rate (ASR) by about 27.9% when compared with the best baseline. △ Less

Submitted 25 June, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: Repo: https://github.com/wjfu99/MIA-Gen

arXiv:2307.09288 [pdf, other]

Llama 2: Open Foundation and Fine-Tuned Chat Models

Authors: Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini , et al. (43 additional authors not shown)

Abstract: In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be… ▽ More In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs. △ Less

Submitted 19 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

arXiv:2306.16688 [pdf, other]

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

Authors: Zhiyu Mei, Wei Fu, Jiaxuan Gao, Guangju Wang, Huanchen Zhang, Yi Wu

Abstract: The ever-growing complexity of reinforcement learning (RL) tasks demands a distributed system to efficiently generate and process a massive amount of data. However, existing open-source libraries suffer from various limitations, which impede their practical use in challenging scenarios where large-scale training is necessary. In this paper, we present a novel abstraction on the dataflows of RL tra… ▽ More The ever-growing complexity of reinforcement learning (RL) tasks demands a distributed system to efficiently generate and process a massive amount of data. However, existing open-source libraries suffer from various limitations, which impede their practical use in challenging scenarios where large-scale training is necessary. In this paper, we present a novel abstraction on the dataflows of RL training, which unifies diverse RL training applications into a general framework. Following this abstraction, we develop a scalable, efficient, and extensible distributed RL system called ReaLlyScalableRL, which allows efficient and massively parallelized training and easy development of customized algorithms. Our evaluation shows that SRL outperforms existing academic libraries, reaching at most 21x higher training throughput in a distributed setting. On learning performance, beyond performing and scaling well on common RL benchmarks with different RL algorithms, SRL can reproduce the same solution in the challenging hide-and-seek environment as reported by OpenAI with up to 5x speedup in wall-clock time. Notably, SRL is the first in the academic community to perform RL experiments at a large scale with over 15k CPU cores. SRL source code is available at: https://github.com/openpsi-project/srl . △ Less

Submitted 21 June, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: Published at ICLR 2024. 10 pages (24 pages with references and appendix), 7 figures

arXiv:2305.14516 [pdf, other]

Chakra: Advancing Performance Benchmarking and Co-design using Standardized Execution Traces

Authors: Srinivas Sridharan, Taekyung Heo, Louis Feng, Zhaodong Wang, Matt Bergeron, Wenyin Fu, Shengbao Zheng, Brian Coutinho, Saeed Rashidi, Changhai Man, Tushar Krishna

Abstract: Benchmarking and co-design are essential for driving optimizations and innovation around ML models, ML software, and next-generation hardware. Full workload benchmarks, e.g. MLPerf, play an essential role in enabling fair comparison across different software and hardware stacks especially once systems are fully designed and deployed. However, the pace of AI innovation demands a more agile methodol… ▽ More Benchmarking and co-design are essential for driving optimizations and innovation around ML models, ML software, and next-generation hardware. Full workload benchmarks, e.g. MLPerf, play an essential role in enabling fair comparison across different software and hardware stacks especially once systems are fully designed and deployed. However, the pace of AI innovation demands a more agile methodology to benchmark creation and usage by simulators and emulators for future system co-design. We propose Chakra, an open graph schema for standardizing workload specification capturing key operations and dependencies, also known as Execution Trace (ET). In addition, we propose a complementary set of tools/capabilities to enable collection, generation, and adoption of Chakra ETs by a wide range of simulators, emulators, and benchmarks. For instance, we use generative AI models to learn latent statistical properties across thousands of Chakra ETs and use these models to synthesize Chakra ETs. These synthetic ETs can obfuscate key proprietary information and also target future what-if scenarios. As an example, we demonstrate an end-to-end proof-of-concept that converts PyTorch ETs to Chakra ETs and uses this to drive an open-source training system simulator (ASTRA-sim). Our end-goal is to build a vibrant industry-wide ecosystem of agile benchmarks and tools to drive future AI system co-design. △ Less

Submitted 26 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2303.09035 [pdf, other]

Extracting the Brain-like Representation by an Improved Self-Organizing Map for Image Classification

Authors: Jiahong Zhang, Lihong Cao, Moning Zhang, Wenlong Fu

Abstract: Backpropagation-based supervised learning has achieved great success in computer vision tasks. However, its biological plausibility is always controversial. Recently, the bio-inspired Hebbian learning rule (HLR) has received extensive attention. Self-Organizing Map (SOM) uses the competitive HLR to establish connections between neurons, obtaining visual features in an unsupervised way. Although th… ▽ More Backpropagation-based supervised learning has achieved great success in computer vision tasks. However, its biological plausibility is always controversial. Recently, the bio-inspired Hebbian learning rule (HLR) has received extensive attention. Self-Organizing Map (SOM) uses the competitive HLR to establish connections between neurons, obtaining visual features in an unsupervised way. Although the representation of SOM neurons shows some brain-like characteristics, it is still quite different from the neuron representation in the human visual cortex. This paper proposes an improved SOM with multi-winner, multi-code, and local receptive field, named mlSOM. We observe that the neuron representation of mlSOM is similar to the human visual cortex. Furthermore, mlSOM shows a sparse distributed representation of objects, which has also been found in the human inferior temporal area. In addition, experiments show that mlSOM achieves better classification accuracy than the original SOM and other state-of-the-art HLR-based methods. The code is accessible at https://github.com/JiaHongZ/mlSOM. △ Less

Submitted 15 March, 2023; originally announced March 2023.

Comments: This paper has been accepted by ICASSP-2023

arXiv:2303.09005 [pdf, other]

Conditional Synthetic Food Image Generation

Authors: Wen** Fu, Yue Han, Jiangpeng He, Sriram Baireddy, Mridul Gupta, Fengqing Zhu

Abstract: Generative Adversarial Networks (GAN) have been widely investigated for image synthesis based on their powerful representation learning ability. In this work, we explore the StyleGAN and its application of synthetic food image generation. Despite the impressive performance of GAN for natural image generation, food images suffer from high intra-class diversity and inter-class similarity, resulting… ▽ More Generative Adversarial Networks (GAN) have been widely investigated for image synthesis based on their powerful representation learning ability. In this work, we explore the StyleGAN and its application of synthetic food image generation. Despite the impressive performance of GAN for natural image generation, food images suffer from high intra-class diversity and inter-class similarity, resulting in overfitting and visual artifacts for synthetic images. Therefore, we aim to explore the capability and improve the performance of GAN methods for food image generation. Specifically, we first choose StyleGAN3 as the baseline method to generate synthetic food images and analyze the performance. Then, we identify two issues that can cause performance degradation on food images during the training phase: (1) inter-class feature entanglement during multi-food classes training and (2) loss of high-resolution detail during image downsampling. To address both issues, we propose to train one food category at a time to avoid feature entanglement and leverage image patches cropped from high-resolution datasets to retain fine details. We evaluate our method on the Food-101 dataset and show improved quality of generated synthetic food images compared with the baseline. Finally, we demonstrate the great potential of improving the performance of downstream tasks, such as food image classification by including high-quality synthetic training samples in the data augmentation. △ Less

Submitted 15 March, 2023; originally announced March 2023.

arXiv:2301.04122 [pdf, other]

Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks

Authors: Mingyu Liang, Wenyin Fu, Louis Feng, Zhongyi Lin, Pavani Panakanti, Shengbao Zheng, Srinivas Sridharan, Christina Delimitrou

Abstract: Building large AI fleets to support the rapidly growing DL workloads is an active research topic for modern cloud providers. Generating accurate benchmarks plays an essential role in designing the fast-paced software and hardware solutions in this space. Two fundamental challenges to make this scalable are (i) workload representativeness and (ii) the ability to quickly incorporate changes to the f… ▽ More Building large AI fleets to support the rapidly growing DL workloads is an active research topic for modern cloud providers. Generating accurate benchmarks plays an essential role in designing the fast-paced software and hardware solutions in this space. Two fundamental challenges to make this scalable are (i) workload representativeness and (ii) the ability to quickly incorporate changes to the fleet into the benchmarks. To overcome these issues, we propose Mystique, an accurate and scalable framework for production AI benchmark generation. It leverages the PyTorch execution trace (ET), a new feature that captures the runtime information of AI models at the granularity of operators, in a graph format, together with their metadata. By sourcing fleet ETs, we can build AI benchmarks that are portable and representative. Mystique is scalable, due to its lightweight data collection, in terms of runtime overhead and instrumentation effort. It is also adaptive because ET composability allows flexible control on benchmark creation. We evaluate our methodology on several production AI models, and show that benchmarks generated with Mystique closely resemble original AI models, both in execution time and system-level metrics. We also showcase the portability of the generated benchmarks across platforms, and demonstrate several use cases enabled by the fine-grained composability of the execution trace. △ Less

Submitted 11 April, 2023; v1 submitted 16 December, 2022; originally announced January 2023.

Comments: Accepted to ISCA 2023

arXiv:2207.00493 [pdf, other]

Simulating financial time series using attention

Authors: Weilong Fu, Ali Hirsa, Jörg Osterrieder

Abstract: Financial time series simulation is a central topic since it extends the limited real data for training and evaluation of trading strategies. It is also challenging because of the complex statistical properties of the real financial data. We introduce two generative adversarial networks (GANs), which utilize the convolutional networks with attention and the transformers, for financial time series… ▽ More Financial time series simulation is a central topic since it extends the limited real data for training and evaluation of trading strategies. It is also challenging because of the complex statistical properties of the real financial data. We introduce two generative adversarial networks (GANs), which utilize the convolutional networks with attention and the transformers, for financial time series simulation. The GANs learn the statistical properties in a data-driven manner and the attention mechanism helps to replicate the long-range dependencies. The proposed GANs are tested on the S&P 500 index and option data, examined by scores based on the stylized facts and are compared with the pure convolutional GAN, i.e. QuantGAN. The attention-based GANs not only reproduce the stylized facts, but also smooth the autocorrelation of returns. △ Less

Submitted 1 July, 2022; originally announced July 2022.

arXiv:2206.07505 [pdf, other]

Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

Authors: Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, Yi Wu

Abstract: Many advances in cooperative multi-agent reinforcement learning (MARL) are based on two common design principles: value decomposition and parameter sharing. A typical MARL algorithm of this fashion decomposes a centralized Q-function into local Q-networks with parameters shared across agents. Such an algorithmic paradigm enables centralized training and decentralized execution (CTDE) and leads to… ▽ More Many advances in cooperative multi-agent reinforcement learning (MARL) are based on two common design principles: value decomposition and parameter sharing. A typical MARL algorithm of this fashion decomposes a centralized Q-function into local Q-networks with parameters shared across agents. Such an algorithmic paradigm enables centralized training and decentralized execution (CTDE) and leads to efficient learning in practice. Despite all the advantages, we revisit these two principles and show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, value decomposition, and parameter sharing can be problematic and lead to undesired outcomes. In contrast, policy gradient (PG) methods with individual policies provably converge to an optimal solution in these cases, which partially supports some recent empirical observations that PG can be effective in many MARL testbeds. Inspired by our theoretical analysis, we present practical suggestions on implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors and empirically validate our findings on a variety of domains, ranging from the simplified matrix and grid-world games to complex benchmarks such as StarCraft Multi-Agent Challenge and Google Research Football. We hope our insights could benefit the community towards develo** more general and more powerful MARL algorithms. Check our project website at https://sites.google.com/view/revisiting-marl. △ Less

Submitted 7 August, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: 16 pages, published as a conference paper in ICML 2022

arXiv:2206.06936 [pdf, ps, other]

Worst-case Design for RIS-aided Over-the-air Computation with Imperfect CSI

Authors: Wenhui Zhang, **dan Xu, Wei Xu, Xiaohu You, Weijie Fu

Abstract: Over-the-air computation (AirComp) enables fast wireless data aggregation at the receiver through concurrent transmission by sensors in the application of Internet-of-Things (IoT). To further improve the performance of AirComp under unfavorable propagation channel conditions, we consider the problem of computation distortion minimization in a reconfigurable intelligent surface (RIS)-aided AirComp… ▽ More Over-the-air computation (AirComp) enables fast wireless data aggregation at the receiver through concurrent transmission by sensors in the application of Internet-of-Things (IoT). To further improve the performance of AirComp under unfavorable propagation channel conditions, we consider the problem of computation distortion minimization in a reconfigurable intelligent surface (RIS)-aided AirComp system. In particular, we take into account an additive bounded uncertainty of the channel state information (CSI) and the total power constraint, and jointly optimize the transceiver (Tx-Rx) and the RIS phase design from the perspective of worst-case robustness by minimizing the mean squared error (MSE) of the computation. To solve this intractable nonconvex problem, we develop an efficient alternating algorithm where both solutions to the robust sub-problem and to the joint design of Tx-Rx and RIS are obtained in closed forms. Simulation results demonstrate the effectiveness of the proposed method. △ Less

Submitted 14 June, 2022; originally announced June 2022.

arXiv:2205.11407 [pdf]

Deep-learning-based prediction of nanoparticle phase transitions during in situ transmission electron microscopy

Authors: Wenkai Fu, Steven R. Spurgeon, Chongmin Wang, Yuyan Shao, Wei Wang, Amra Peles

Abstract: We develop the machine learning capability to predict a time sequence of in-situ transmission electron microscopy (TEM) video frames based on the combined long-short-term-memory (LSTM) algorithm and the features de-entanglement method. We train deep learning models to predict a sequence of future video frames based on the input of a sequence of previous frames. This unique capability provides insi… ▽ More We develop the machine learning capability to predict a time sequence of in-situ transmission electron microscopy (TEM) video frames based on the combined long-short-term-memory (LSTM) algorithm and the features de-entanglement method. We train deep learning models to predict a sequence of future video frames based on the input of a sequence of previous frames. This unique capability provides insight into size dependent structural changes in Au nanoparticles under dynamic reaction condition using in-situ environmental TEM data, informing models of morphological evolution and catalytic properties. The model performance and achieved accuracy of predictions are desirable based on, for scientific data characteristic, based on limited size of training data sets. The model convergence and values for the loss function mean square error show dependence on the training strategy, and structural similarity measure between predicted structure images and ground truth reaches the value of about 0.7. This computed structural similarity is smaller than values obtained when the deep learning architecture is trained using much larger benchmark data sets, it is sufficient to show the structural transition of Au nanoparticles. While performance parameters of our model applied to scientific data fall short of those achieved for the non-scientific big data sets, we demonstrate model ability to predict the evolution, even including the particle structural phase transformation, of Au nano particles as catalyst for CO oxidation under the chemical reaction conditions. Using this approach, it may be possible to anticipate the next steps of a chemical reaction for emerging automated experimentation platforms. △ Less

Submitted 23 May, 2022; originally announced May 2022.

Comments: 16 pages, 13 figures

arXiv:2205.03850 [pdf, other]

SeqNet: An Efficient Neural Network for Automatic Malware Detection

Authors: Jiawei Xu, Wenxuan Fu, Haoyu Bu, Zhi Wang, Lingyun Ying

Abstract: Malware continues to evolve rapidly, and more than 450,000 new samples are captured every day, which makes manual malware analysis impractical. However, existing deep learning detection models need manual feature engineering or require high computational overhead for long training processes, which might be laborious to select feature space and difficult to retrain for mitigating model aging. There… ▽ More Malware continues to evolve rapidly, and more than 450,000 new samples are captured every day, which makes manual malware analysis impractical. However, existing deep learning detection models need manual feature engineering or require high computational overhead for long training processes, which might be laborious to select feature space and difficult to retrain for mitigating model aging. Therefore, a crucial requirement for a detector is to realize automatic and efficient detection. In this paper, we propose a lightweight malware detection model called SeqNet which could be trained at high speed with low memory required on the raw binaries. By avoiding contextual confusion and reducing semantic loss, SeqNet maintains the detection accuracy when reducing the number of parameters to only 136K. We demonstrate the effectiveness of our methods and the low training cost requirement of SeqNet in our experiments. Besides, we make our datasets and codes public to stimulate further academic research. △ Less

Submitted 8 May, 2022; originally announced May 2022.

arXiv:2204.02246 [pdf, other]

Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization

Authors: Zihan Zhou, Wei Fu, Bingliang Zhang, Yi Wu

Abstract: We present Reward-Switching Policy Optimization (RSPO), a paradigm to discover diverse strategies in complex RL environments by iteratively finding novel policies that are both locally optimal and sufficiently different from existing ones. To encourage the learning policy to consistently converge towards a previously undiscovered local optimum, RSPO switches between extrinsic and intrinsic rewards… ▽ More We present Reward-Switching Policy Optimization (RSPO), a paradigm to discover diverse strategies in complex RL environments by iteratively finding novel policies that are both locally optimal and sufficiently different from existing ones. To encourage the learning policy to consistently converge towards a previously undiscovered local optimum, RSPO switches between extrinsic and intrinsic rewards via a trajectory-based novelty measurement during the optimization process. When a sampled trajectory is sufficiently distinct, RSPO performs standard policy optimization with extrinsic rewards. For trajectories with high likelihood under existing policies, RSPO utilizes an intrinsic diversity reward to promote exploration. Experiments show that RSPO is able to discover a wide spectrum of strategies in a variety of domains, ranging from single-agent particle-world tasks and MuJoCo continuous control to multi-agent stag-hunt games and StarCraftII challenges. △ Less

Submitted 3 May, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

Comments: 30 pages, 15 figures, published as a conference paper at ICLR 2022

arXiv:2203.07975 [pdf, other]

doi 10.1088/2632-2153/acb488

Categorical Representation Learning and RG flow operators for algorithmic classifiers

Authors: Artan Sheshmani, Yizhuang You, Wenbo Fu, Ahmadreza Azizi

Abstract: Following the earlier formalism of the categorical representation learning (arXiv:2103.14770) by the first two authors, we discuss the construction of the "RG-flow based categorifier". Borrowing ideas from theory of renormalization group flows (RG) in quantum field theory, holographic duality, and hyperbolic geometry, and mixing them with neural ODE's, we construct a new algorithmic natural langua… ▽ More Following the earlier formalism of the categorical representation learning (ar**. In particular we apply the RG categorifier to particular genomic sequences of flu viruses and show how our technology is capable of extracting the information from given genomic sequences, find their hidden symmetries and dominant features, classify them and use the trained data to make stochastic prediction of new plausible generated sequences associated with new set of viruses which could avoid the human immune system. The content of the current article is part of the recent US patent application submitted by first two authors (U.S. Patent Application No.: 63/313.504). △ Less

Submitted 15 March, 2022; originally announced March 2022.

Comments: 31 pages, comments are very welcome

MSC Class: 03B70; 03-04; 03D10; 11Y16

Journal ref: Machine Learning: Science and Technology, 2023

arXiv:2203.01934 [pdf]

Quality or Quantity: Toward a Unified Approach for Multi-organ Segmentation in Body CT

Authors: Fakrul Islam Tushar, Husam Nujaim, Wanyi Fu, Ehsan Abadi, Maciej A. Mazurowski, Ehsan Samei, William P. Segars, Joseph Y. Lo

Abstract: Organ segmentation of medical images is a key step in virtual imaging trials. However, organ segmentation datasets are limited in terms of quality (because labels cover only a few organs) and quantity (since case numbers are limited). In this study, we explored the tradeoffs between quality and quantity. Our goal is to create a unified approach for multi-organ segmentation of body CT, which will f… ▽ More Organ segmentation of medical images is a key step in virtual imaging trials. However, organ segmentation datasets are limited in terms of quality (because labels cover only a few organs) and quantity (since case numbers are limited). In this study, we explored the tradeoffs between quality and quantity. Our goal is to create a unified approach for multi-organ segmentation of body CT, which will facilitate the creation of large numbers of accurate virtual phantoms. Initially, we compared two segmentation architectures, 3D-Unet and DenseVNet, which were trained using XCAT data that is fully labeled with 22 organs, and chose the 3D-Unet as the better performing model. We used the XCAT-trained model to generate pseudo-labels for the CT-ORG dataset that has only 7 organs segmented. We performed two experiments: First, we trained 3D-UNet model on the XCAT dataset, representing quality data, and tested it on both XCAT and CT-ORG datasets. Second, we trained 3D-UNet after including the CT-ORG dataset into the training set to have more quantity. Performance improved for segmentation in the organs where we have true labels in both datasets and degraded when relying on pseudo-labels. When organs were labeled in both datasets, Exp-2 improved Average DSC in XCAT and CT-ORG by 1. This demonstrates that quality data is the key to improving the model's performance. △ Less

Submitted 2 March, 2022; originally announced March 2022.

Comments: 6 pages, 3 figures, 2 tables, Accepted and Presented at SPIE Medical Imaging 2022

arXiv:2202.11124 [pdf, other]

Learning with Free Object Segments for Long-Tailed Instance Segmentation

Authors: Cheng Zhang, Tai-Yu Pan, Tianle Chen, Jike Zhong, Wen** Fu, Wei-Lun Chao

Abstract: One fundamental challenge in building an instance segmentation model for a large number of classes in complex scenes is the lack of training examples, especially for rare objects. In this paper, we explore the possibility to increase the training examples without laborious data collection and annotation. We find that an abundance of instance segments can potentially be obtained freely from object-… ▽ More One fundamental challenge in building an instance segmentation model for a large number of classes in complex scenes is the lack of training examples, especially for rare objects. In this paper, we explore the possibility to increase the training examples without laborious data collection and annotation. We find that an abundance of instance segments can potentially be obtained freely from object-centric images, according to two insights: (i) an object-centric image usually contains one salient object in a simple background; (ii) objects from the same class often share similar appearances or similar contrasts to the background. Motivated by these insights, we propose a simple and scalable framework FreeSeg for extracting and leveraging these "free" object foreground segments to facilitate model training in long-tailed instance segmentation. Concretely, we investigate the similarity among object-centric images of the same class to propose candidate segments of foreground instances, followed by a novel ranking of segment quality. The resulting high-quality object segments can then be used to augment the existing long-tailed datasets, e.g., by copying and pasting the segments onto the original training images. Extensive experiments show that FreeSeg yields substantial improvements on top of strong baselines and achieves state-of-the-art accuracy for segmenting rare object categories. △ Less

Submitted 4 October, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

Comments: Accepted to ECCV 2022

arXiv:2111.02668 [pdf, other]

LVIS Challenge Track Technical Report 1st Place Solution: Distribution Balanced and Boundary Refinement for Large Vocabulary Instance Segmentation

Authors: WeiFu Fu, CongChong Nie, Ting Sun, Jun Liu, TianLiang Zhang, Yong Liu

Abstract: This report introduces the technical details of the team FuXi-Fresher for LVIS Challenge 2021. Our method focuses on the problem in following two aspects: the long-tail distribution and the segmentation quality of mask and boundary. Based on the advanced HTC instance segmentation algorithm, we connect transformer backbone(Swin-L) through composite connections inspired by CBNetv2 to enhance the bas… ▽ More This report introduces the technical details of the team Fu** combined with EMA method can achieve a great improvement. Finally, by using multi-scale testing and increasing the upper limit of the number of objects detected per image, we achieved more than 45.4% boundary AP on the val set of LVIS Challenge 2021. On the test data of LVIS Challenge 2021, we rank 1st and achieve 48.1% AP. Notably, our APr 47.5% is very closed to the APf 48.0%. △ Less

Submitted 4 November, 2021; v1 submitted 4 November, 2021; originally announced November 2021.

arXiv:2110.14635 [pdf]

doi 10.1007/978-981-16-5207-3_56

An Improved Positioning Accuracy Method of a Robot Based on Particle Filter

Authors: Rashid Ali, Dil Nawaz Hakro, Yong** He, Wenpeng Fu, Zhiqiang Cao

Abstract: This paper aims to improve the performance and positioning accuracy of a robot by using the particle filter method. The laser range information is a wireless navigation system mainly used to measure, position, and control autonomous robots. Its localization is more flexible to control than wired guidance systems. However, the navigation through the laser range finder occurs with a large positionin… ▽ More This paper aims to improve the performance and positioning accuracy of a robot by using the particle filter method. The laser range information is a wireless navigation system mainly used to measure, position, and control autonomous robots. Its localization is more flexible to control than wired guidance systems. However, the navigation through the laser range finder occurs with a large positioning error while it moves or turns fast. For solving this problem, the paper proposes a method to improve the positioning accuracy of a robot in an indoor environment by using a particle filter with robust characteristics in a nonlinear or non-Gaussian system. In this experiment, a robot is equipped with a laser range finder, two encoders, and a gyro for navigation to verify the positioning accuracy and performance. The positioning accuracy and performance could improve by approximately 85.5% in this proposed method. △ Less

Submitted 27 October, 2021; originally announced October 2021.

Comments: 12 pages, 6 figures, conference

arXiv:2107.11099 [pdf, other]

doi 10.1007/s11128-022-03442-8

RGB Image Classification with Quantum Convolutional Ansaetze

Authors: Yu **g, Xiaogang Li, Yang Yang, Chonghang Wu, Wenbing Fu, Wei Hu, Yuanyuan Li, Hua Xu

Abstract: With the rapid growth of qubit numbers and coherence times in quantum hardware technology, implementing shallow neural networks on the so-called Noisy Intermediate-Scale Quantum (NISQ) devices has attracted a lot of interest. Many quantum (convolutional) circuit ansaetze are proposed for grayscale images classification tasks with promising empirical results. However, when applying these ansaetze o… ▽ More With the rapid growth of qubit numbers and coherence times in quantum hardware technology, implementing shallow neural networks on the so-called Noisy Intermediate-Scale Quantum (NISQ) devices has attracted a lot of interest. Many quantum (convolutional) circuit ansaetze are proposed for grayscale images classification tasks with promising empirical results. However, when applying these ansaetze on RGB images, the intra-channel information that is useful for vision tasks is not extracted effectively. In this paper, we propose two types of quantum circuit ansaetze to simulate convolution operations on RGB images, which differ in the way how inter-channel and intra-channel information are extracted. To the best of our knowledge, this is the first work of a quantum convolutional circuit to deal with RGB images effectively, with a higher test accuracy compared to the purely classical CNNs. We also investigate the relationship between the size of quantum circuit ansatz and the learnability of the hybrid quantum-classical convolutional neural network. Through experiments based on CIFAR-10 and MNIST datasets, we demonstrate that a larger size of the quantum circuit ansatz improves predictive performance in multiclass classification tasks, providing useful insights for near term quantum algorithm developments. △ Less

Submitted 22 February, 2022; v1 submitted 23 July, 2021; originally announced July 2021.

Comments: https://link.springer.com/article/10.1007/s11128-022-03442-8

Journal ref: Quantum Inf Process 21, 101 (2022)

arXiv:2107.07113 [pdf, other]

Robust Learning for Text Classification with Multi-source Noise Simulation and Hard Example Mining

Authors: Guowei Xu, Wenbiao Ding, Wei** Fu, Zhongqin Wu, Zitao Liu

Abstract: Many real-world applications involve the use of Optical Character Recognition (OCR) engines to transform handwritten images into transcripts on which downstream Natural Language Processing (NLP) models are applied. In this process, OCR engines may introduce errors and inputs to downstream NLP models become noisy. Despite that pre-trained models achieve state-of-the-art performance in many NLP benc… ▽ More Many real-world applications involve the use of Optical Character Recognition (OCR) engines to transform handwritten images into transcripts on which downstream Natural Language Processing (NLP) models are applied. In this process, OCR engines may introduce errors and inputs to downstream NLP models become noisy. Despite that pre-trained models achieve state-of-the-art performance in many NLP benchmarks, we prove that they are not robust to noisy texts generated by real OCR engines. This greatly limits the application of NLP models in real-world scenarios. In order to improve model performance on noisy OCR transcripts, it is natural to train the NLP model on labelled noisy texts. However, in most cases there are only labelled clean texts. Since there is no handwritten pictures corresponding to the text, it is impossible to directly use the recognition model to obtain noisy labelled data. Human resources can be employed to copy texts and take pictures, but it is extremely expensive considering the size of data for model training. Consequently, we are interested in making NLP models intrinsically robust to OCR errors in a low resource manner. We propose a novel robust training framework which 1) employs simple but effective methods to directly simulate natural OCR noises from clean texts and 2) iteratively mines the hard examples from a large number of simulated samples for optimal performance. 3) To make our model learn noise-invariant representations, a stability loss is employed. Experiments on three real-world datasets show that the proposed framework boosts the robustness of pre-trained models by a large margin. We believe that this work can greatly promote the application of NLP models in actual scenarios, although the algorithm we use is simple and straightforward. We make our codes and three datasets publicly available\footnote{https://github.com/tal-ai/Robust-learning-MSSHEM}. △ Less

Submitted 15 July, 2021; originally announced July 2021.

Comments: ECML-PKDD'21: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021

arXiv:2107.04140 [pdf, other]

First-Generation Inference Accelerator Deployment at Facebook

Authors: Michael Anderson, Benny Chen, Stephen Chen, Summer Deng, Jordan Fix, Michael Gschwind, Aravind Kalaiah, Changkyu Kim, Jaewon Lee, Jason Liang, Haixin Liu, Yinghai Lu, Jack Montgomery, Arun Moorthy, Satish Nadathur, Sam Naghshineh, Avinash Nayak, Jongsoo Park, Chris Petersen, Martin Schatz, Narayanan Sundaram, Bangsheng Tang, Peter Tang, Amy Yang, Jiecao Yu , et al. (90 additional authors not shown)

Abstract: In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses, large model sizes, as well as high compute, memory and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements. We describe the in… ▽ More In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses, large model sizes, as well as high compute, memory and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements. We describe the inference accelerator platform ecosystem we developed and deployed at Facebook: both hardware, through Open Compute Platform (OCP), and software framework and tooling, through Pytorch/Caffe2/Glow. A characteristic of this ecosystem from the start is its openness to enable a variety of AI accelerators from different vendors. This platform, with six low-power accelerator cards alongside a single-socket host CPU, allows us to serve models of high complexity that cannot be easily or efficiently run on CPUs. We describe various performance optimizations, at both platform and accelerator level, which enables this platform to serve production traffic at Facebook. We also share deployment challenges, lessons learned during performance optimization, as well as provide guidance for future inference hardware co-design. △ Less

Submitted 4 August, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

arXiv:2106.07894 [pdf, other]

doi 10.1109/TC.2021.3087946

S2Engine: A Novel Systolic Architecture for Sparse Convolutional Neural Networks

Authors: Jianlei Yang, Wenzhi Fu, Xingzhou Cheng, Xucheng Ye, Pengcheng Dai, Weisheng Zhao

Abstract: Convolutional neural networks (CNNs) have achieved great success in performing cognitive tasks. However, execution of CNNs requires a large amount of computing resources and generates heavy memory traffic, which imposes a severe challenge on computing system design. Through optimizing parallel executions and data reuse in convolution, systolic architecture demonstrates great advantages in accelera… ▽ More Convolutional neural networks (CNNs) have achieved great success in performing cognitive tasks. However, execution of CNNs requires a large amount of computing resources and generates heavy memory traffic, which imposes a severe challenge on computing system design. Through optimizing parallel executions and data reuse in convolution, systolic architecture demonstrates great advantages in accelerating CNN computations. However, regular internal data transmission path in traditional systolic architecture prevents the systolic architecture from completely leveraging the benefits introduced by neural network sparsity. Deployment of fine-grained sparsity on the existing systolic architectures is greatly hindered by the incurred computational overheads. In this work, we propose S2Engine $-$ a novel systolic architecture that can fully exploit the sparsity in CNNs with maximized data reuse. S2Engine transmits compressed data internally and allows each processing element to dynamically select an aligned data from the compressed dataflow in convolution. Compared to the naive systolic array, S2Engine achieves about $3.2\times$ and about $3.0\times$ improvements on speed and energy efficiency, respectively. △ Less

Submitted 15 June, 2021; originally announced June 2021.

Comments: 13 pages, 17 figures

Journal ref: IEEE Transactions on Computers, 2021

arXiv:2106.03648 [pdf, other]

Cost-effective Map** of Mobile Robot Based on the Fusion of UWB and Short-range 2D LiDAR

Authors: Ran Liu, Yong** He, Chau Yuen, Billy Pik Lik Lau, Rashid Ali, Wenpeng Fu, Zhiqiang Cao

Abstract: Environment map** is an essential prerequisite for mobile robots to perform different tasks such as navigation and mission planning. With the availability of low-cost 2D LiDARs, there are increasing applications of such 2D LiDARs in industrial environments. However, environment map** in an unknown and feature-less environment with such low-cost 2D LiDARs remains a challenge. The challenge main… ▽ More Environment map** is an essential prerequisite for mobile robots to perform different tasks such as navigation and mission planning. With the availability of low-cost 2D LiDARs, there are increasing applications of such 2D LiDARs in industrial environments. However, environment map** in an unknown and feature-less environment with such low-cost 2D LiDARs remains a challenge. The challenge mainly originates from the short-range of LiDARs and complexities in performing scan matching in these environments. In order to resolve these shortcomings, we propose to fuse the ultra-wideband (UWB) with 2D LiDARs to improve the map** quality of a mobile robot. The optimization-based approach is utilized for the fusion of UWB ranging information and odometry to first optimize the trajectory. Then the LiDAR-based loop closures are incorporated to improve the accuracy of the trajectory estimation. Finally, the optimized trajectory is combined with the LiDAR scans to produce the occupancy map of the environment. The performance of the proposed approach is evaluated in an indoor feature-less environment with a size of 20m*20m. Obtained results show that the map** error of the proposed scheme is 85.5% less than that of the conventional GMap** algorithm with short-range LiDAR (for example Hokuyo URG-04LX in our experiment with a maximum range of 5.6m). △ Less

Submitted 7 June, 2021; originally announced June 2021.

Comments: Accepted by IEEE/ASME TRANSACTIONS ON MECHATRONICS

arXiv:2103.11526 [pdf, other]

ExAD: An Ensemble Approach for Explanation-based Adversarial Detection

Authors: Raj Vardhan, Ninghao Liu, Phakpoom Chinprutthiwong, Weijie Fu, Zhenyu Hu, Xia Ben Hu, Guofei Gu

Abstract: Recent research has shown Deep Neural Networks (DNNs) to be vulnerable to adversarial examples that induce desired misclassifications in the models. Such risks impede the application of machine learning in security-sensitive domains. Several defense methods have been proposed against adversarial attacks to detect adversarial examples at test time or to make machine learning models more robust. How… ▽ More Recent research has shown Deep Neural Networks (DNNs) to be vulnerable to adversarial examples that induce desired misclassifications in the models. Such risks impede the application of machine learning in security-sensitive domains. Several defense methods have been proposed against adversarial attacks to detect adversarial examples at test time or to make machine learning models more robust. However, while existing methods are quite effective under blackbox threat model, where the attacker is not aware of the defense, they are relatively ineffective under whitebox threat model, where the attacker has full knowledge of the defense. In this paper, we propose ExAD, a framework to detect adversarial examples using an ensemble of explanation techniques. Each explanation technique in ExAD produces an explanation map identifying the relevance of input variables for the model's classification. For every class in a dataset, the system includes a detector network, corresponding to each explanation technique, which is trained to distinguish between normal and abnormal explanation maps. At test time, if the explanation map of an input is detected as abnormal by any detector model of the classified class, then we consider the input to be an adversarial example. We evaluate our approach using six state-of-the-art adversarial attacks on three image datasets. Our extensive evaluation shows that our mechanism can effectively detect these attacks under blackbox threat model with limited false-positives. Furthermore, we find that our approach achieves promising results in limiting the success rate of whitebox attacks. △ Less

Submitted 21 March, 2021; originally announced March 2021.

Comments: 15 pages, 10 figures

arXiv:2012.00058 [pdf]

PMLB v1.0: An open source dataset collection for benchmarking machine learning methods

Authors: Joseph D. Romano, Trang T. Le, William La Cava, John T. Gregg, Daniel J. Goldberg, Natasha L. Ray, Praneel Chakraborty, Daniel Himmelstein, Weixuan Fu, Jason H. Moore

Abstract: Motivation: Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows. Results: This release of PMLB provides the largest collection of… ▽ More Motivation: Novel machine learning and statistical modeling studies rely on standardized comparisons to existing methods using well-studied benchmark datasets. Few tools exist that provide rapid access to many of these datasets through a standardized, user-friendly interface that integrates well with popular data science workflows. Results: This release of PMLB provides the largest collection of diverse, public benchmark datasets for evaluating new machine learning and data science methods aggregated in one location. v1.0 introduces a number of critical improvements developed following discussions with the open-source community. Availability: PMLB is available at https://github.com/EpistasisLab/pmlb. Python and R interfaces for PMLB can be installed through the Python Package Index and Comprehensive R Archive Network, respectively. △ Less

Submitted 6 April, 2021; v1 submitted 30 November, 2020; originally announced December 2020.

Comments: 4 pages, 1 figure. *: These authors contributed equally

ACM Class: H.2.8

arXiv:2008.08730 [pdf]

iPhantom: a framework for automated creation of individualized computational phantoms and its application to CT organ dosimetry

Authors: Wanyi Fu, Shobhit Sharma, Ehsan Abadi, Alexandros-Stavros Iliopoulos, Qi Wang, Joseph Y. Lo, Xiaobai Sun, William P. Segars, Ehsan Samei

Abstract: Objective: This study aims to develop and validate a novel framework, iPhantom, for automated creation of patient-specific phantoms or digital-twins (DT) using patient medical images. The framework is applied to assess radiation dose to radiosensitive organs in CT imaging of individual patients. Method: From patient CT images, iPhantom segments selected anchor organs (e.g. liver, bones, pancreas)… ▽ More Objective: This study aims to develop and validate a novel framework, iPhantom, for automated creation of patient-specific phantoms or digital-twins (DT) using patient medical images. The framework is applied to assess radiation dose to radiosensitive organs in CT imaging of individual patients. Method: From patient CT images, iPhantom segments selected anchor organs (e.g. liver, bones, pancreas) using a learning-based model developed for multi-organ CT segmentation. Organs challenging to segment (e.g. intestines) are incorporated from a matched phantom template, using a diffeomorphic registration model developed for multi-organ phantom-voxels. The resulting full-patient phantoms are used to assess organ doses during routine CT exams. Result: iPhantom was validated on both the XCAT (n=50) and an independent clinical (n=10) dataset with similar accuracy. iPhantom precisely predicted all organ locations with good accuracy of Dice Similarity Coefficients (DSC) >0.6 for anchor organs and DSC of 0.3-0.9 for all other organs. iPhantom showed less than 10% dose errors for the majority of organs, which was notably superior to the state-of-the-art baseline method (20-35% dose errors). Conclusion: iPhantom enables automated and accurate creation of patient-specific phantoms and, for the first time, provides sufficient and automated patient-specific dose estimates for CT dosimetry. Significance: The new framework brings the creation and application of CHPs to the level of individual CHPs through automation, achieving a wider and precise organ localization, paving the way for clinical monitoring, and personalized optimization, and large-scale research. △ Less

Submitted 19 August, 2020; originally announced August 2020.

Comments: Main text: 11 pages, 8 figures; Supplement material: 7 pages, 5 figures, 7 tables

arXiv:2008.03911 [pdf, other]

A Survey on Large-scale Machine Learning

Authors: Meng Wang, Weijie Fu, Xiangnan He, Shijie Hao, Xindong Wu

Abstract: Machine learning can provide deep insights into data, allowing machines to make high-quality predictions and having been widely used in real-world applications, such as text mining, visual classification, and recommender systems. However, most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data. This issue calls for the need of {Large-scale Mach… ▽ More Machine learning can provide deep insights into data, allowing machines to make high-quality predictions and having been widely used in real-world applications, such as text mining, visual classification, and recommender systems. However, most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data. This issue calls for the need of {Large-scale Machine Learning} (LML), which aims to learn patterns from big data with comparable performance efficiently. In this paper, we offer a systematic survey on existing LML methods to provide a blueprint for the future developments of this area. We first divide these LML methods according to the ways of improving the scalability: 1) model simplification on computational complexities, 2) optimization approximation on computational efficiency, and 3) computation parallelism on computational capabilities. Then we categorize the methods in each perspective according to their targeted scenarios and introduce representative methods in line with intrinsic strategies. Lastly, we analyze their limitations and discuss potential directions as well as open issues that are promising to address in the future. △ Less

Submitted 10 August, 2020; originally announced August 2020.

Showing 1–50 of 100 results for author: Fu, W