
arXiv:2406.19614 [pdf, other]

A Survey on Data Quality Dimensions and Tools for Machine Learning

Authors: Yuhan Zhou, Fengjiao Tu, Kewei Sha, Junhua Ding, Haihua Chen

Abstract: Machine learning (ML) technologies have become substantial in practically all aspects of our society, and data quality (DQ) is critical for the performance, fairness, robustness, safety, and scalability of ML models. With the large and complex data in data-centric AI, traditional methods like exploratory data analysis (EDA) and cross-validation (CV) face challenges, highlighting the importance of mastering DQ tools. In this survey, we review 17 DQ evaluation and improvement tools in the last 5 years. By introducing the DQ dimensions, metrics, and main functions embedded in these tools, we compare their strengths and limitations and propose a roadmap for developing open-source DQ tools for ML. Based on the discussions on the challenges and emerging trends, we further highlight the potential applications of large language models (LLMs) and generative AI in DQ evaluation and improvement for ML. We believe this comprehensive survey can enhance understanding of DQ in ML and could drive progress in data-centric AI. A complete list of the literature investigated in this survey is available on GitHub at: https://github.com/haihua0913/awesome-dq4ml.

Submitted 27 June, 2024; originally announced June 2024.

Comments: This paper has been accepted by The 6th IEEE International Conference on Artificial Intelligence Testing (IEEE AITest 2024) as an invited paper

arXiv:2307.05944 [pdf]

A 137.5 TOPS/W SRAM Compute-in-Memory Macro with 9-b Memory Cell-Embedded ADCs and Signal Margin Enhancement Techniques for AI Edge Applications

Authors: Xiaomeng Wang, Fengshi Tian, Xizi Chen, Jiakun Zheng, Xuejiao Liu, Fengbin Tu, Jie Yang, Mohamad Sawan, Kwang-Ting Cheng, Chi-Ying Tsui

Abstract: In this paper, we propose a high-precision SRAM-based CIM macro that can perform 4x4-bit MAC operations and yield 9-bit signed output. The inherent discharge branches of SRAM cells are utilized to apply time-modulated MAC and 9-bit ADC readout operations on two bit-line capacitors. The same principle is used for both MAC and A-to-D conversion ensuring high linearity and thus supporting large number of analog MAC accumulations. The memory cell-embedded ADC eliminates the use of separate ADCs and enhances energy and area efficiency. Additionally, two signal margin enhancement techniques, namely the MAC-folding and boosted-clipping schemes, are proposed to further improve the CIM computation accuracy.

Submitted 19 July, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

Comments: Submitted to IEEE ASSCC 2023

arXiv:2307.02847 [pdf, other]

doi 10.1145/3613424.3614246

Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane

Authors: **yi Deng, Xinru Tang, Jiahao Zhang, Yuxuan Li, Linyun Zhang, Boxiao Han, Hongjun He, Fengbin Tu, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin

Abstract: Spatial architecture is a high-performance architecture that uses control flow graphs and data flow graphs as the computational model and producer/consumer models as the execution models. However, existing spatial architectures suffer from control flow handling challenges. Upon categorizing their PE execution models, we find that they lack autonomous, peer-to-peer, and temporally loosely-coupled control flow handling capability. This leads to limited performance in intensive control programs. A spatial architecture, Marionette, is proposed, with an explicit-designed control flow plane. The Control Flow Plane enables autonomous, peer-to-peer and temporally loosely-coupled control flow handling. The Proactive PE Configuration ensures timely and computation-overlapped configuration to improve handling Branch Divergence. The Agile PE Assignment enhance the pipeline performance of Imperfect Loops. We develop full stack of Marionette (ISA, compiler, simulator, RTL) and demonstrate that in a variety of challenging intensive control programs, compared to state-of-the-art spatial architectures, Marionette outperforms Softbrain, TIA, REVEL, and RipTide by geomean 2.88x, 3.38x, 1.55x, and 2.66x.

Submitted 19 September, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

ACM Class: C.1.3; F.1.2

arXiv:2302.12510 [pdf, other]

doi 10.1109/TCAD.2023.3342730

DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference

Authors: Jiajun Zhou, Jiajun Wu, Yizhao Gao, Yuhao Ding, Chaofan Tao, Boyu Li, Fengbin Tu, Kwang-Ting Cheng, Hayden Kwok-Hay So, Ngai Wong

Abstract: To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched. A prominent challenge is to quantize the DNN models into low-bitwidth numbers without significant accuracy degradation, especially at very low bitwidths (< 8 bits). This work targets an adaptive data representation with variable-length encoding called DyBit. DyBit can dynamically adjust the precision and range of separate bit-field to be adapted to the DNN weights/activations distribution. We also propose a hardware-aware quantization framework with a mixed-precision accelerator to trade-off the inference accuracy and speedup. Experimental results demonstrate that the inference accuracy via DyBit is 1.997% higher than the state-of-the-art at 4-bit quantization, and the proposed framework can achieve up to 8.1x speedup compared with the original model.

Submitted 24 February, 2023; originally announced February 2023.

arXiv:2202.11343 [pdf, other]

Alleviating Datapath Conflicts and Design Centralization in Graph Analytics Acceleration

Authors: Haiyang Lin, Mingyu Yan, Duo Wang, Mo Zou, Fengbin Tu, Xiaochun Ye, Dongrui Fan, Yuan Xie

Abstract: Previous graph analytics accelerators have achieved great improvement on throughput by alleviating irregular off-chip memory accesses. However, on-chip side datapath conflicts and design centralization have become the critical issues hindering further throughput improvement. In this paper, a general solution, Multiple-stage Decentralized Propagation network (MDP-network), is proposed to address these issues, inspired by the key idea of trading latency for throughput. Besides, a novel High throughput Graph analytics accelerator, HiGraph, is proposed by deploying MDP-network to address each issue in practice. The experiment shows that compared with state-of-the-art accelerator, HiGraph achieves up to 2.2x speedup (1.5x on average) as well as better scalability.

Submitted 23 February, 2022; originally announced February 2022.

Comments: To Appear in 59th Design Automation Conference (DAC 2022)

arXiv:2107.11746 [pdf, other]

H2Learn: High-Efficiency Learning Accelerator for High-Accuracy Spiking Neural Networks

Authors: Ling Liang, Zheng Qu, Zhaodong Chen, Fengbin Tu, Yujie Wu, Lei Deng, Guoqi Li, Peng Li, Yuan Xie

Abstract: Although spiking neural networks (SNNs) take benefits from the bio-plausible neural modeling, the low accuracy under the common local synaptic plasticity learning rules limits their application in many practical tasks. Recently, an emerging SNN supervised learning algorithm inspired by backpropagation through time (BPTT) from the domain of artificial neural networks (ANNs) has successfully boosted the accuracy of SNNs and helped improve the practicability of SNNs. However, current general-purpose processors suffer from low efficiency when performing BPTT for SNNs due to the ANN-tailored optimization. On the other hand, current neuromorphic chips cannot support BPTT because they mainly adopt local synaptic plasticity rules for simplified implementation. In this work, we propose H2Learn, a novel architecture that can achieve high efficiency for BPTT-based SNN learning which ensures high accuracy of SNNs. At the beginning, we characterized the behaviors of BPTT-based SNN learning. Benefited from the binary spike-based computation in the forward pass and the weight update, we first design lookup table (LUT) based processing elements in Forward Engine and Weight Update Engine to make accumulations implicit and to fuse the computations of multiple input points. Second, benefited from the rich sparsity in the backward pass, we design a dual-sparsity-aware Backward Engine which exploits both input and output sparsity. Finally, we apply a pipeline optimization between different engines to build an end-to-end solution for the BPTT-based SNN learning. Compared with the modern NVIDIA V100 GPU, H2Learn achieves 7.38x area saving, 5.74-10.20x speedup, and 5.25-7.12x energy saving on several benchmark datasets.

Submitted 25 July, 2021; originally announced July 2021.

arXiv:2105.08630 [pdf, other]

Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report

Authors: Andrey Ignatov, Grigory Malivenko, David Plowman, Samarth Shukla, Radu Timofte, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, Gang Yu, Bin Fu, Yiran Wang, Xingyi Li, Min Shi, Ke Xian, Zhiguo Cao, **-Hua Du, Pei-Lin Wu, Chao Ge, Jiaoyang Yao, Fangwen Tu, Bo Li, Jung Eun Yoo, Kwanggyoon Seo, Jialei Xu, et al. (13 additional authors not shown)

Abstract: Depth estimation is an important computer vision problem with many practical applications to mobile devices. While many solutions have been proposed for this task, they are usually very computationally expensive and thus are not applicable for on-device inference. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based depth estimation solutions that can demonstrate a nearly real-time performance on smartphones and IoT platforms. For this, the participants were provided with a new large-scale dataset containing RGB-depth image pairs obtained with a dedicated stereo ZED camera producing high-resolution depth maps for objects located at up to 50 meters. The runtime of all models was evaluated on the popular Raspberry Pi 4 platform with a mobile ARM-based Broadcom chipset. The proposed solutions can generate VGA resolution depth maps at up to 10 FPS on the Raspberry Pi 4 while achieving high fidelity results, and are compatible with any Android or Linux-based mobile devices. A detailed description of all models developed in the challenge is provided in this paper.

Submitted 17 May, 2021; originally announced May 2021.

Comments: Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/. arXiv admin note: text overlap with arXiv:2105.07809

arXiv:1909.13265 [pdf, ps, other]

Adaptive Control for Marine Vessels Against Harsh Environmental Variation

Authors: Fangwen Tu, Shuzhi Sam Ge, Yoo Sang Choo, Chang Chieh Hang

Abstract: In this paper, robust control with sea state observer and dynamic thrust allocation is proposed for the Dynamic Positioning (DP) of an accommodation vessel in the presence of unknown hydrodynamic force variation and the input time delay. In order to overcome the huge force variation due to the adjoining Floating Production Storage and Offloading (FPSO) and accommodation vessel, a novel sea state observer is designed. The sea observer can effectively monitor the variation of the drift wave-induced force on the vessel and activate Neural Network (NN) compensator in the controller when large wave force is identified. Moreover, the wind drag coefficients can be adaptively approximated in the sea observer so that a feedforward control can be achieved. Based on this, a robust constrained control is developed to guarantee a safe operation. The time delay inside the control input is also considered. Dynamic thrust allocation module is presented to distribute the generalized control input among azimuth thrusters. Under the proposed sea observer and control, the boundedness of all the closed-loop signals are demonstrated via rigorous Lyapunov analysis. A set of simulation studies are conducted to verify the effectiveness of the proposed control scheme.

Submitted 29 September, 2019; originally announced September 2019.

arXiv:1812.08973 [pdf, ps, other]

Saliency Guided Hierarchical Robust Visual Tracking

Authors: Fangwen Tu, Shuzhi Sam Ge, Yazhe Tang, Chang Chieh Hang

Abstract: A saliency guided hierarchical visual tracking (SHT) algorithm containing global and local search phases is proposed in this paper. In global search, a top-down saliency model is novelly developed to handle abrupt motion and appearance variation problems. Nineteen feature maps are extracted first and combined with online learnt weights to produce the final saliency map and estimated target locations. After the evaluation of integration mechanism, the optimum candidate patch is passed to the local search. In local search, a superpixel based HSV histogram matching is performed jointly with an L2-RLS tracker to take both color distribution and holistic appearance feature of the object into consideration. Furthermore, a linear refinement search process with fast iterative solver is implemented to attenuate the possible negative influence of dominant particles. Both qualitative and quantitative experiments are conducted on a series of challenging image sequences. The superior performance of the proposed method over other state-of-the-art algorithms is demonstrated by comparative study.

Submitted 21 December, 2018; originally announced December 2018.

arXiv:1812.08094 [pdf, ps, other]

Shallow Cue Guided Deep Visual Tracking via Mixed Models

Authors: Fangwen Tu, Shuzhi Sam Ge, Chang Chieh Hang

Abstract: In this paper, a robust visual tracking approach via mixed model based convolutional neural networks (SDT) is developed. In order to handle abrupt or fast motion, a prior map is generated to facilitate the localization of region of interest (ROI) before the deep tracker is performed. A top-down saliency model with nineteen shallow cues are employed to construct the prior map with online learnt combination weights. Moreover, apart from a holistic deep learner, four local networks are also trained to learn different components of the target. The generated four local heat maps will facilitate to rectify the holistic map by eliminating the distracters to avoid drifting. Furthermore, to guarantee the instance for online update of high quality, a prioritised update strategy is implemented by casting the problem into a label noise problem. The selection probability is designed by considering both confidence values and bio-inspired memory for temporal information integration. Experiments are conducted qualitatively and quantitatively on a set of challenging image sequences. Comparative study demonstrates that the proposed algorithm outperforms other state-of-the-art methods.

Submitted 19 December, 2018; originally announced December 2018.

Showing 1–10 of 10 results for author: Tu, F