ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation

Authors: Zhiyu Mei, Wei Fu, Kaiwei Li, Guangju Wang, Huanchen Zhang, Yi Wu

Abstract: Reinforcement Learning from Human Feedback (RLHF) stands as a pivotal technique in empowering large language model (LLM) applications. Since RLHF involves diverse computational workloads and intricate dependencies among multiple LLMs, directly adopting parallelization techniques from supervised training can result in sub-optimal performance. To overcome this limitation, we propose a novel approach named parameter ReaLlocation, which dynamically redistributes LLM parameters in the cluster and adapts parallelization strategies during training. Building upon this idea, we introduce ReaLHF, a pioneering system capable of automatically discovering and running efficient execution plans for RLHF training given the desired algorithmic and hardware configurations. ReaLHF formulates the execution plan for RLHF as an augmented dataflow graph. Based on this formulation, ReaLHF employs a tailored search algorithm with a lightweight cost estimator to discover an efficient execution plan. Subsequently, the runtime engine deploys the selected plan by effectively parallelizing computations and redistributing parameters. We evaluate ReaLHF on the LLaMA-2 models with up to $4\times70$ billion parameters and 128 GPUs. The experiment results showcase ReaLHF's substantial speedups of $2.0-10.6\times$ compared to baselines. Furthermore, the execution plans generated by ReaLHF exhibit an average of $26\%$ performance improvement over heuristic approaches based on Megatron-LM. The source code of ReaLHF is publicly available at https://github.com/openpsi-project/ReaLHF.

Submitted 20 June, 2024; originally announced June 2024.

Comments: 13 pages (15 pages with references), 13 figures

arXiv:2404.10719 [pdf, other]

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Authors: Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu

Abstract: Reinforcement Learning from Human Feedback (RLHF) is currently the most widely used method to align large language models (LLMs) with human preferences. Existing RLHF methods can be roughly categorized as either reward-based or reward-free. Novel applications such as ChatGPT and Claude leverage reward-based methods that first learn a reward model and apply actor-critic algorithms, such as Proximal Policy Optimization (PPO). However, in academic benchmarks, state-of-the-art results are often achieved via reward-free methods, such as Direct Preference Optimization (DPO). Is DPO truly superior to PPO? Why does PPO perform poorly on these benchmarks? In this paper, we first conduct both theoretical and empirical studies on the algorithmic properties of DPO and show that DPO may have fundamental limitations. Moreover, we also comprehensively examine PPO and reveal the key factors for the best performances of PPO in fine-tuning LLMs. Finally, we benchmark DPO and PPO across a collection of RLHF testbeds, ranging from dialogue to code generation. Experiment results demonstrate that PPO is able to surpass other alignment methods in all cases and achieve state-of-the-art results in challenging code competitions.

Submitted 21 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: 16 pages, 2 figures, 14 tables

arXiv:2403.08185 [pdf, other]

Perceive With Confidence: Statistical Safety Assurances for Navigation with Learning-Based Perception

Authors: Anushri Dixit, Zhiting Mei, Meghan Booker, Mariko Storey-Matsutani, Allen Z. Ren, Anirudha Majumdar

Abstract: Rapid advances in perception have enabled large pre-trained models to be used out of the box for processing high-dimensional, noisy, and partial observations of the world into rich geometric representations (e.g., occupancy predictions). However, safe integration of these models onto robots remains challenging due to a lack of reliable performance in unfamiliar environments. In this work, we present a framework for rigorously quantifying the uncertainty of pre-trained perception models for occupancy prediction in order to provide end-to-end statistical safety assurances for navigation. We build on techniques from conformal prediction for producing a calibrated perception system that lightly processes the outputs of a pre-trained model while ensuring generalization to novel environments and robustness to distribution shifts in states when perceptual outputs are used in conjunction with a planner. The calibrated system can be used in combination with any safe planner to provide an end-to-end statistical assurance on safety in a new environment with a user-specified threshold $1-ε$. We evaluate the resulting approach - which we refer to as Perceive with Confidence (PwC) - with experiments in simulation and on hardware where a quadruped robot navigates through indoor environments containing objects unseen during training or calibration. These experiments validate the safety assurances provided by PwC and demonstrate significant improvements in empirical safety rates compared to baselines.

Submitted 12 March, 2024; originally announced March 2024.

Comments: Videos and code can be found at https://perceive-with-confidence.github.io

arXiv:2402.12957 [pdf, other]

Energy-Efficient Wireless Federated Learning via Doubly Adaptive Quantization

Authors: Xuefeng Han, Wen Chen, Jun Li, Ming Ding, Qingqing Wu, Kang Wei, Xiumei Deng, Zhen Mei

Abstract: Federated learning (FL) has been recognized as a viable distributed learning paradigm for training a machine learning model across distributed clients without uploading raw data. However, FL in wireless networks still faces two major challenges, i.e., large communication overhead and high energy consumption, which are exacerbated by client heterogeneity in dataset sizes and wireless channels. While model quantization is effective for energy reduction, existing works ignore adapting quantization to heterogeneous clients and FL convergence. To address these challenges, this paper develops an energy optimization problem of jointly designing quantization levels, scheduling clients, allocating channels, and controlling computation frequencies (QCCF) in wireless FL. Specifically, we derive an upper bound identifying the influence of client scheduling and quantization errors on FL convergence. Under the long-term convergence constraints and wireless constraints, the problem is established and transformed into an instantaneous problem with Lyapunov optimization. Solving Karush-Kuhn-Tucker conditions, our closed-form solution indicates that the doubly adaptive quantization level rises with the training process and correlates negatively with dataset sizes. Experiment results validate our theoretical results, showing that QCCF consumes less energy with faster convergence compared with state-of-the-art baselines.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2312.00774 [pdf, other]

Context Retrieval via Normalized Contextual Latent Interaction for Conversational Agent

Authors: Junfeng Liu, Zhuocheng Mei, Kewen Peng, Ranga Raju Vatsavai

Abstract: Conversational agents leveraging AI, particularly deep learning, are emerging in both academic research and real-world applications. However, these applications still face challenges, including disrespecting knowledge and facts, not personalizing to user preferences, and enormous demand for computational resources during training and inference. Recent research efforts have been focused on addressing these challenges from various aspects, including supplementing various types of auxiliary information to the conversational agents. However, existing methods are still not able to effectively and efficiently exploit relevant information from these auxiliary supplements to further unleash the power of the conversational agents and the language models they use. In this paper, we present a novel method, PK-NCLI, that is able to accurately and efficiently identify relevant auxiliary information to improve the quality of conversational responses by learning the relevance among persona, chat history, and knowledge background through low-level normalized contextual latent interaction. Our experimental results indicate that PK-NCLI outperforms the state-of-the-art method, PK-FoCus, by 47.80%/30.61%/24.14% in terms of perplexity, knowledge grounding, and training efficiency, respectively, while maintaining the same level of persona grounding performance. We also provide a detailed analysis of how different factors, including language model choices and trade-offs on training weights, would affect the performance of PK-NCLI.

Submitted 1 December, 2023; originally announced December 2023.

Comments: 2023 IEEE International Conference on Data Mining Workshops (ICDMW)

arXiv:2310.09002 [pdf, other]

Federated Meta-Learning for Few-Shot Fault Diagnosis with Representation Encoding

Authors: Jixuan Cui, Jun Li, Zhen Mei, Kang Wei, Sha Wei, Ming Ding, Wen Chen, Song Guo

Abstract: Deep learning-based fault diagnosis (FD) approaches require a large amount of training data, which are difficult to obtain since they are located across different entities. Federated learning (FL) enables multiple clients to collaboratively train a shared model with data privacy guaranteed. However, the domain discrepancy and data scarcity problems among clients deteriorate the performance of the global FL model. To tackle these issues, we propose a novel framework called representation encoding-based federated meta-learning (REFML) for few-shot FD. First, a novel training strategy based on representation encoding and meta-learning is developed. It harnesses the inherent heterogeneity among training clients, effectively transforming it into an advantage for out-of-distribution generalization on unseen working conditions or equipment types. Additionally, an adaptive interpolation method that calculates the optimal combination of local and global models as the initialization of local training is proposed. This helps to further utilize local information to mitigate the negative effects of domain discrepancy. As a result, high diagnostic accuracy can be achieved on unseen working conditions or equipment types with limited training data. Compared with the state-of-the-art methods, such as FedProx, the proposed REFML framework achieves an increase in accuracy by 2.17%-6.50% when tested on unseen working conditions of the same equipment type and 13.44%-18.33% when tested on totally unseen equipment types, respectively.

Submitted 13 October, 2023; originally announced October 2023.

arXiv:2310.01966 [pdf]

Throughput Maximization for Instantly Decodable Network Coded NOMA in Broadcast Communication Systems

Authors: Zhonghui Mei

Abstract: Non-orthogonal multiple access (NOMA) is a promising transmission scheme employed at the physical layer to improve the spectral efficiency. In this paper, we develop a novel cross-layer approach by employing NOMA at the physical layer and instantly decodable network coding (IDNC) at the network layer in downlink cellular networks. Following this approach, two IDNC packets are selected for each transmission, with one designed for all receivers and the other designed only for the strong receivers which can employ successive interference cancellation (SIC). The IDNC packet selection, transmission rate adaptation for the two IDNC packets, and NOMA power allocation are jointly considered to improve the throughput of the network. Given the intractability of the problem, we decouple it into two separate subproblems: the IDNC scheduling, which jointly selects the IDNC packets and the transmission rates with the given NOMA power allocation, and the NOMA power allocation with the given IDNC scheduling. The IDNC scheduling can be reduced to a maximum weight clique problem, and two heuristic algorithms, named maximum weight vertex (MWV) search and maximum weight path based maximum weight vertex (MWP-MWV) search, are developed to solve the first subproblem. An iterative function evaluation (IFE) approach is proposed to solve the second subproblem. Simulation results are presented to demonstrate the throughput gain of the proposed approach over the existing solutions.

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2308.03521 [pdf, other]

Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity

Authors: Xuefeng Han, Jun Li, Wen Chen, Zhen Mei, Kang Wei, Ming Ding, H. Vincent Poor

Abstract: With the rapid proliferation of smart mobile devices, federated learning (FL) has been widely considered for application in wireless networks for distributed model training. However, data heterogeneity, e.g., non-independent and identically distributed data and different sizes of training data among clients, poses major challenges to wireless FL. Limited communication resources complicate the implementation of fair scheduling, which is required for training on heterogeneous data, and further deteriorate the overall performance. To address this issue, this paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation. Specifically, we first develop a closed-form expression for an upper bound on the FL loss function, with a particular emphasis on data heterogeneity described by a dataset size vector and a data divergence vector. Then we formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE). Next, via the Lyapunov drift technique, we transform the CRE optimization problem into a series of tractable problems. Extensive experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of the learning accuracy and energy consumption.

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2306.16688 [pdf, other]

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

Authors: Zhiyu Mei, Wei Fu, Jiaxuan Gao, Guangju Wang, Huanchen Zhang, Yi Wu

Abstract: The ever-growing complexity of reinforcement learning (RL) tasks demands a distributed system to efficiently generate and process a massive amount of data. However, existing open-source libraries suffer from various limitations, which impede their practical use in challenging scenarios where large-scale training is necessary. In this paper, we present a novel abstraction on the dataflows of RL training, which unifies diverse RL training applications into a general framework. Following this abstraction, we develop a scalable, efficient, and extensible distributed RL system called ReaLlyScalableRL, which allows efficient and massively parallelized training and easy development of customized algorithms. Our evaluation shows that SRL outperforms existing academic libraries, reaching at most 21x higher training throughput in a distributed setting. On learning performance, beyond performing and scaling well on common RL benchmarks with different RL algorithms, SRL can reproduce the same solution in the challenging hide-and-seek environment as reported by OpenAI with up to 5x speedup in wall-clock time. Notably, SRL is the first in the academic community to perform RL experiments at a large scale with over 15k CPU cores. SRL source code is available at: https://github.com/openpsi-project/srl.

Submitted 21 June, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: Published at ICLR 2024. 10 pages (24 pages with references and appendix), 7 figures

arXiv:2211.03093 [pdf, other]

SRIBO: An Efficient and Resilient Single-Range and Inertia Based Odometry for Flying Robots

Authors: Wei Dong, Zheyuan Mei, Yuanjiong Ying, Sijia Chen, Yichen Xie, Xiangyang Zhu

Abstract: Positioning with one inertial measurement unit and one ranging sensor is commonly thought to be feasible only when trajectories are in certain patterns ensuring observability. For this reason, to pursue observable patterns, it is necessary either to excite the trajectory or to search for key nodes in a long interval, which is commonly highly nonlinear and may also lack resilience. Therefore, such a positioning approach is still not widely accepted in real-world applications. To address this issue, this work first investigates the dissipative nature of flying robots considering aerial drag effects and re-formulates the corresponding positioning problem, which guarantees observability almost surely. On this basis, a dimension-reduced wriggling estimator is proposed accordingly. This estimator slides the estimation horizon in a stepping manner, and output matrices can be approximately evaluated based on the historical estimation sequence. The computational complexity is then further reduced via a dimension-reduction approach using polynomial fittings. In this way, the states of robots can be estimated via linear programming in a sufficiently long interval, and the degree of observability is thereby further enhanced because an adequate redundancy of measurements is available for each estimation. Subsequently, the estimator's convergence and numerical stability are proven theoretically. Finally, both indoor and outdoor experiments verify that the proposed estimator can achieve decimeter-level precision at hundreds of hertz, and it is resilient to sensors' failures. Hopefully, this study can provide a new practical approach for self-localization as well as relative positioning of cooperative agents with low-cost and lightweight sensors.

Submitted 6 November, 2022; originally announced November 2022.

arXiv:2209.12139 [pdf, other]

Lightweight Image Codec via Multi-Grid Multi-Block-Size Vector Quantization (MGBVQ)

Authors: Yifan Wang, Zhanxuan Mei, Ioannis Katsavounidis, C. -C. Jay Kuo

Abstract: A multi-grid multi-block-size vector quantization (MGBVQ) method is proposed for image coding in this work. The fundamental idea of image coding is to remove correlations among pixels before quantization and entropy coding, e.g., the discrete cosine transform (DCT) and intra predictions, adopted by modern image coding standards. We present a new method to remove pixel correlations. First, by decomposing correlations into long- and short-range correlations, we represent long-range correlations in coarser grids due to their smoothness, thus leading to a multi-grid (MG) coding architecture. Second, we show that short-range correlations can be effectively coded by a suite of vector quantizers (VQs). Along this line, we argue the effectiveness of VQs of very large block sizes and present a convenient way to implement them. It is shown by experimental results that MGBVQ offers excellent rate-distortion (RD) performance, which is comparable with existing image coders, at much lower complexity. Besides, it provides a progressive coded bitstream.

Submitted 25 September, 2022; originally announced September 2022.

Comments: GIC-python-v2

arXiv:2208.05271 [pdf, other]

Efficient Joint-Dimensional Search with Solution Space Regularization for Real-Time Semantic Segmentation

Authors: Peng Ye, Baopu Li, Tao Chen, Jiayuan Fan, Zhen Mei, Chen Lin, Chongyan Zuo, Qinghua Chi, Wanli Ouyang

Abstract: Semantic segmentation is a popular research topic in computer vision, and many efforts have been made on it with impressive results. In this paper, we intend to search for an optimal network structure that can run in real-time for this problem. Towards this goal, we jointly search the depth, channel, dilation rate and feature spatial resolution, which results in a search space consisting of about 2.78*10^324 possible choices. To handle such a large search space, we leverage differential architecture search methods. However, the architecture parameters searched using existing differential methods need to be discretized, which causes a discretization gap between the architecture parameters found by the differential methods and their discretized version as the final solution for the architecture search. Hence, we relieve the problem of the discretization gap from the innovative perspective of solution space regularization. Specifically, a novel Solution Space Regularization (SSR) loss is first proposed to effectively encourage the supernet to converge to its discrete one. Then, a new Hierarchical and Progressive Solution Space Shrinking method is presented to further achieve high efficiency of searching. In addition, we theoretically show that the optimization of SSR loss is equivalent to the L_0-norm regularization, which accounts for the improved search-evaluation gap. Comprehensive experiments show that the proposed search scheme can efficiently find an optimal network structure that yields an extremely fast speed (175 FPS) of segmentation with a small model size (1 M) while maintaining comparable accuracy.

Submitted 10 August, 2022; originally announced August 2022.

arXiv:2202.00129 [pdf, other]

Fundamental Limits for Sensor-Based Robot Control

Authors: Anirudha Majumdar, Zhiting Mei, Vincent Pacelli

Abstract: Our goal is to develop theory and algorithms for establishing fundamental limits on performance imposed by a robot's sensors for a given task. In order to achieve this, we define a quantity that captures the amount of task-relevant information provided by a sensor. Using a novel version of the generalized Fano inequality from information theory, we demonstrate that this quantity provides an upper bound on the highest achievable expected reward for one-step decision making tasks. We then extend this bound to multi-step problems via a dynamic programming approach. We present algorithms for numerically computing the resulting bounds, and demonstrate our approach on three examples: (i) the lava problem from the literature on partially observable Markov decision processes, (ii) an example with continuous state and observation spaces corresponding to a robot catching a freely-falling object, and (iii) obstacle avoidance using a depth sensor with non-Gaussian noise. We demonstrate the ability of our approach to establish strong limits on achievable performance for these problems by comparing our upper bounds with achievable lower bounds (computed by synthesizing or learning concrete control policies).

Submitted 11 July, 2023; v1 submitted 31 January, 2022; originally announced February 2022.

Comments: Extended version of paper presented at the 2022 Robotics: Science and Systems (RSS) conference

arXiv:2109.08927 [pdf, other]

Weakly Supervised Explainable Phrasal Reasoning with Neural Fuzzy Logic

Authors: Zijun Wu, Zi Xuan Zhang, Atharva Naik, Zhijian Mei, Mauajama Firdaus, Lili Mou

Abstract: Natural language inference (NLI) aims to determine the logical relationship between two sentences, such as Entailment, Contradiction, and Neutral. In recent years, deep learning models have become a prevailing approach to NLI, but they lack interpretability and explainability. In this work, we address the explainability of NLI by weakly supervised logical reasoning, and propose an Explainable Phrasal Reasoning (EPR) approach. Our model first detects phrases as the semantic unit and aligns corresponding phrases in the two sentences. Then, the model predicts the NLI label for the aligned phrases, and induces the sentence label by fuzzy logic formulas. Our EPR is almost everywhere differentiable and thus the system can be trained end to end. In this way, we are able to provide explicit explanations of phrasal logical relationships in a weakly supervised manner. We further show that such reasoning results help textual explanation generation.

Submitted 22 February, 2023; v1 submitted 18 September, 2021; originally announced September 2021.

Comments: Accepted by ICLR 2023

arXiv:2105.03649 [pdf, other]

In-Hardware Learning of Multilayer Spiking Neural Networks on a Neuromorphic Processor

Authors: Amar Shrestha, Haowen Fang, Daniel Patrick Rider, Zaidao Mei, Qinru Qiu

Abstract: Although widely used in machine learning, backpropagation cannot directly be applied to SNN training and is not feasible on a neuromorphic processor that emulates biological neurons and synapses. This work presents a spike-based backpropagation algorithm with biologically plausible local update rules and adapts it to fit the constraints of neuromorphic hardware. The algorithm is implemented on the Intel Loihi chip, enabling low-power in-hardware supervised online learning of multilayered SNNs for mobile applications. We test this implementation on the MNIST, Fashion-MNIST, CIFAR-10 and MSTAR datasets with promising performance and energy efficiency, and demonstrate the possibility of incremental online learning with the implementation.

Submitted 8 May, 2021; originally announced May 2021.

Comments: 6 pages, 5 figures, accepted for Design Automation Conference (DAC) 2021

arXiv:2104.10712 [pdf, other]

Neuromorphic Algorithm-hardware Codesign for Temporal Pattern Learning

Authors: Haowen Fang, Brady Taylor, Ziru Li, Zaidao Mei, Hai Li, Qinru Qiu

Abstract: Neuromorphic computing and spiking neural networks (SNN) mimic the behavior of biological systems and have drawn interest for their potential to perform cognitive tasks with high energy efficiency. However, some factors such as temporal dynamics and spike timings prove critical for information processing but are often ignored by existing works, limiting the performance and applications of neuromorphic computing. On one hand, due to the lack of effective SNN training algorithms, it is difficult to utilize the temporal neural dynamics. Many existing algorithms still treat neuron activation statistically. On the other hand, utilizing temporal neural dynamics also poses challenges to hardware design. Synapses exhibit temporal dynamics, serving as memory units that hold historical information, but are often simplified as a connection with weight. Most current models integrate synaptic activations in some storage medium to represent membrane potential and institute a hard reset of membrane potential after the neuron emits a spike. This is done for its simplicity in hardware, requiring only a "clear" signal to wipe the storage medium, but destroys temporal information stored in the neuron. In this work, we derive an efficient training algorithm for Leaky Integrate and Fire neurons, which is capable of training an SNN to learn complex spatial-temporal patterns. We achieved competitive accuracy on two complex datasets. We also demonstrate the advantage of our model by a novel temporal pattern association task. Codesigned with this algorithm, we have developed a CMOS circuit implementation for a memristor-based network of neurons and synapses which retains critical neural dynamics with reduced complexity. This circuit implementation of the neuron model is simulated to demonstrate its ability to react to temporal spiking patterns with an adaptive threshold.

Submitted 6 May, 2021; v1 submitted 21 April, 2021; originally announced April 2021.
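The abstract above notes that a hard reset of the membrane potential discards temporal information. The following is a minimal illustrative sketch of that effect, not the authors' model or training algorithm: a discrete-time leaky integrate-and-fire update where the `leak` factor, the `threshold` value, the function name `lif_spikes`, and the "soft" reset variant (subtracting the threshold instead of clearing the potential) are all assumptions chosen for illustration.

```python
def lif_spikes(inputs, leak=1.0, threshold=1.0, hard_reset=True):
    """Simulate one leaky integrate-and-fire neuron over a list of input currents.

    Hypothetical toy model: v <- leak * v + input at each step, and the neuron
    spikes when v >= threshold. With hard_reset=True the potential is cleared
    to 0 after a spike; otherwise the threshold is subtracted, so any charge
    above the threshold carries over to later steps.
    """
    v = 0.0
    spikes = []
    for current in inputs:
        v = leak * v + current
        if v >= threshold:
            spikes.append(1)
            v = 0.0 if hard_reset else v - threshold
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    stimulus = [0.5, 0.7, 0.9, 0.0]
    print("hard reset:", lif_spikes(stimulus, hard_reset=True))
    print("soft reset:", lif_spikes(stimulus, hard_reset=False))
```

With this stimulus the two variants produce different spike trains, because the charge left over after the first spike is wiped by the hard reset but retained by the subtractive one.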

arXiv:2102.00502 [pdf, other]

A Machine Learning Approach to Optimal Inverse Discrete Cosine Transform (IDCT) Design

Authors: Yifan Wang, Zhanxuan Mei, Chia-Yang Tsai, Ioannis Katsavounidis, C. -C. Jay Kuo

Abstract: The design of the optimal inverse discrete cosine transform (IDCT) to compensate the quantization error is proposed for effective lossy image compression in this work. The forward and inverse DCTs are designed in pairs in current image/video coding standards without taking the quantization effect into account. Yet, the distribution of quantized DCT coefficients deviates from that of original DCT coefficients. This is particularly obvious when the quality factor of JPEG compressed images is small. To address this problem, we first use a set of training images to learn the compound effect of forward DCT, quantization and dequantization in cascade. Then, a new IDCT kernel is learned to reverse the effect of such a pipeline. Experiments are conducted to demonstrate the advantage of the new method, which has a gain of 0.11-0.30 dB over the standard JPEG over a wide range of quality factors.

Submitted 31 January, 2021; originally announced February 2021.

Comments: conference

arXiv:2004.05340 [pdf, ps, other]

DNN-aided Read-voltage Threshold Optimization for MLC Flash Memory with Finite Block Length

Authors: Cheng Wang, Kang Wei, Lingjun Kong, Long Shi, Zhen Mei, Jun Li, Kui Cai

Abstract: The error correcting performance of multi-level-cell (MLC) NAND flash memory is closely related to the block length of error correcting codes (ECCs) and log-likelihood-ratios (LLRs) of the read-voltage thresholds. Driven by this issue, this paper optimizes the read-voltage thresholds for MLC flash memory to improve the decoding performance of ECCs with finite block length. First, through the analysis of channel coding rate (CCR) and decoding error probability under finite block length, we formulate the optimization problem of read-voltage thresholds to minimize the maximum decoding error probability. Second, we develop a cross iterative search (CIS) algorithm to optimize read-voltage thresholds under the perfect knowledge of flash memory channel. However, it is challenging to analytically characterize the voltage distribution under the effect of data retention noise (DRN), since the data retention time (DRT) is hard to record for flash memory in reality. To address this problem, we develop a deep neural network (DNN) aided optimization strategy to optimize the read-voltage thresholds, where a multi-layer perceptron (MLP) network is employed to learn the relationship between voltage distribution and read-voltage thresholds. Simulation results show that, compared with the existing schemes, the proposed DNN-aided read-voltage threshold optimization strategy with a well-designed LDPC code can not only improve the program-and-erase (PE) endurance but also reduce the read latency.

Submitted 11 April, 2020; originally announced April 2020.

arXiv:1907.03938 [pdf, ps, other]

Deep Learning-Aided Dynamic Read Thresholds Design For Multi-Level-Cell Flash Memories

Authors: Zhen Mei, Kui Cai, Xuan He

Abstract: Practical NAND flash memory suffers from various non-stationary noises that are difficult to predict. Furthermore, the data retention noise induced channel offset is unknown during the readback process. This severely affects the data recovery from the memory cell. In this paper, we first propose a novel recurrent neural network (RNN)-based detector to effectively detect the data symbols stored in the multi-level-cell (MLC) flash memory without any prior knowledge of the channel. However, compared with the conventional threshold detector, the proposed RNN detector introduces much longer read latency and more power consumption. To tackle this problem, we further propose an RNN-aided (RNNA) dynamic threshold detector, whose detection thresholds can be derived based on the outputs of the RNN detector. We thus only need to activate the RNN detector periodically when the system is idle. Moreover, to enable soft-decision decoding of error-correction codes, we first show how to obtain more read thresholds based on the hard-decision read thresholds derived from the RNN detector. We then propose integer-based reliability mappings based on the designed read thresholds, which can generate the soft information of the channel. Finally, we propose to apply density evolution (DE) combined with differential evolution algorithm to optimize the read thresholds for LDPC coded flash memory channels. Computer simulation results demonstrate the effectiveness of our RNNA dynamic read thresholds design, for both the uncoded and LDPC-coded flash memory channels, without any prior knowledge of the channel.

Submitted 8 July, 2019; originally announced July 2019.

arXiv:1907.02944 [pdf]

Proceedings of the 11th Asia-Europe Workshop on Concepts in Information Theory

Authors: A. J. Han Vinck, Kees A. Schouhamer Immink, Tadashi Wadayama, Van Khu Vu, Akiko Manada, Kui Cai, Shunsuke Horii, Yoshiki Abe, Mitsugu Iwamoto, Kazuo Ohta, Xingwei Zhong, Zhen Mei, Renfei Bu, J. H. Weber, Vitaly Skachek, Hiroyoshi Morita, N. Hovhannisyan, Hiroshi Kamabe, Shan Lu, Hirosuke Yamamoto, Kengo Hasimoto, O. Ytrehus, Shigeaki Kuzuoaka, Mikihiko Nishiara, Han Mao Kiah, et al. (2 additional authors not shown)

Abstract: This year, 2019, we celebrate 30 years of friendship between Asian and European scientists at the AEW11 in Rotterdam, the Netherlands. Many of the 1989 participants are also present at the 2019 event. This year we have many participants from different parts of Asia and Europe, which shows the importance of this event. It is a good tradition to pay tribute to a special lecturer in our community. This year we selected Hiroyoshi Morita, who is a well-known information theorist with many original contributions.

Submitted 26 June, 2019; originally announced July 2019.

arXiv:1904.13245 [pdf, ps, other]

Design of Protograph Codes for Additive White Symmetric Alpha-Stable Noise Channels

Authors: Xingwei Zhong, Kui Cai, **** Chen, Zhen Mei

Abstract: The protograph low-density parity-check (LDPC) codes possess many attractive properties, such as the low encoding/decoding complexity and better error floor performance, and hence have been successfully applied to different types of communication and data storage channels. In this paper, we design protograph LDPC codes for communication systems corrupted by impulsive noise, which are modeled as additive white symmetric alpha-stable noise (AWSaSN) channels. We start by presenting a novel simulation-based protograph extrinsic information transfer (P-EXIT) analysis to derive the iterative decoding threshold of the protograph codes. By further applying the asymptotic weight distribution (AWD) analysis, we design new protograph codes for the AWSaSN channel. Both theoretical analysis and simulation results demonstrate that the proposed protograph codes can provide better error rate performance than the prior art AR4JA code, the irregular codes optimized for the AWGN channel, as well as the irregular codes optimized for the AWSaSN channel.

Submitted 30 April, 2019; originally announced April 2019.

arXiv:1904.06666 [pdf, other]

Mutual Information-Maximizing Quantized Belief Propagation Decoding of Regular LDPC Codes

Authors: Xuan He, Kui Cai, Zhen Mei, Peng Kang, Xiaohu Tang

Abstract: In this paper, we propose a class of finite alphabet iterative decoder (FAID), called mutual information-maximizing quantized belief propagation (MIM-QBP) decoder, for decoding regular low-density parity-check (LDPC) codes. Our decoder follows the reconstruction-calculation-quantization (RCQ) decoding architecture that is widely used in FAIDs. We present the first complete and systematic design framework for the RCQ parameters, and prove that our design with sufficient precision at node update is able to maximize the mutual information between coded bits and exchanged messages. Simulation results show that the MIM-QBP decoder can always considerably outperform the state-of-the-art mutual information-maximizing FAIDs that adopt two-input single-output lookup tables for decoding. Furthermore, with only 3 bits being used for each exchanged message, the MIM-QBP decoder can outperform the floating-point belief propagation decoder at the high signal-to-noise ratio regions when testing on high-rate LDPC codes with a maximum of 10 and 30 iterations.

Submitted 16 December, 2022; v1 submitted 14 April, 2019; originally announced April 2019.

arXiv:1902.06289 [pdf, ps, other]

Neural Network-Based Dynamic Threshold Detection for Non-Volatile Memories

Authors: Zhen Mei, Kui Cai, Xingwei Zhong

Abstract: The memory physics induced unknown offset of the channel is a critical and difficult issue to tackle for many non-volatile memories (NVMs). In this paper, we first propose novel neural network (NN) detectors by using the multilayer perceptron (MLP) network and the recurrent neural network (RNN), which can effectively tackle the unknown offset of the channel. However, compared with the conventional threshold detector, the NN detectors will incur a significant increase in read latency and more power consumption. Therefore, we further propose a novel dynamic threshold detector (DTD), whose detection threshold can be derived based on the outputs of the proposed NN detectors. In this way, the NN-based detection only needs to be invoked when the error correction code (ECC) decoder fails, or periodically when the system is in the idle state. Thereafter, the threshold detector will still be adopted by using the adjusted detection threshold derived based on the outputs of the NN detector, until a further adjustment of the detection threshold is needed. Simulation results demonstrate that the proposed DTD based on the RNN detection can achieve the error performance of the optimum detector, without the prior knowledge of the channel.

Submitted 17 February, 2019; originally announced February 2019.

Comments: A six-page version of this paper has been accepted by ICC 2019

arXiv:1901.01659 [pdf, other]

Dynamic Programming for Sequential Deterministic Quantization of Discrete Memoryless Channels

Authors: Xuan He, Kui Cai, Wentu Song, Zhen Mei

Abstract: In this paper, under a general cost function $C$, we present a dynamic programming (DP) method to obtain an optimal sequential deterministic quantizer (SDQ) for $q$-ary input discrete memoryless channel (DMC). The DP method has complexity $O(q (N-M)^2 M)$, where $N$ and $M$ are the alphabet sizes of the DMC output and quantizer output, respectively. Then, starting from the quadrangle inequality, two techniques are applied to reduce the DP method's complexity. One technique makes use of the Shor-Moran-Aggarwal-Wilber-Klawe (SMAWK) algorithm and achieves complexity $O(q (N-M) M)$. The other technique is much easier to implement and achieves complexity $O(q (N^2 - M^2))$. We further derive a sufficient condition under which the optimal SDQ is optimal among all quantizers and the two techniques are applicable. This generalizes the results in the literature for binary-input DMC. Next, we show that the cost function of $α$-mutual information ($α$-MI)-maximizing quantizer belongs to the category of $C$. We further prove that under a weaker condition than the sufficient condition we derived, the aforementioned two techniques are applicable to the design of $α$-MI-maximizing quantizer. Finally, we illustrate the particular application of our design method to practical pulse-amplitude modulation systems.

Submitted 23 February, 2021; v1 submitted 6 January, 2019; originally announced January 2019.

Comments: 14 pages, 3 figures, accepted by TCOM
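The dynamic programming idea in the abstract above can be sketched for one concrete cost, ordinary mutual information. This is a plain textbook-style DP over contiguous groupings of the channel outputs, written as an assumption-laden illustration: it is not the paper's algorithm, it does not use the SMAWK speed-up or the general cost function $C$, and the names `sdq_dp` and `group_mi` as well as the joint-distribution input format are hypothetical.

```python
import math


def group_mi(prefix, px, a, b):
    """Mutual-information contribution (bits) of merging outputs a..b-1 into one symbol."""
    joint = [row[b] - row[a] for row in prefix]   # P(x, z) for the merged symbol z
    pz = sum(joint)
    total = 0.0
    for x, pxz in enumerate(joint):
        if pxz > 0:
            total += pxz * math.log2(pxz / (px[x] * pz))
    return total


def sdq_dp(joint, M):
    """Split N ordered outputs into M contiguous groups maximizing I(X; Z).

    joint[x][y] is the joint probability P(X = x, Y = y) (hypothetical input
    format). Returns (best mutual information, group boundaries), where the
    boundaries b_0 = 0 < b_1 < ... < b_M = N delimit the groups.
    """
    q, N = len(joint), len(joint[0])
    # Prefix sums over y let any contiguous group be evaluated in O(q) time.
    prefix = [[0.0] * (N + 1) for _ in range(q)]
    for x in range(q):
        for y in range(N):
            prefix[x][y + 1] = prefix[x][y] + joint[x][y]
    px = [prefix[x][N] for x in range(q)]

    neg_inf = float("-inf")
    best = [[neg_inf] * (N + 1) for _ in range(M + 1)]
    cut = [[0] * (N + 1) for _ in range(M + 1)]
    best[0][0] = 0.0
    for m in range(1, M + 1):
        # The first m groups must cover at least m outputs and leave M - m for the rest.
        for n in range(m, N - (M - m) + 1):
            for k in range(m - 1, n):
                if best[m - 1][k] == neg_inf:
                    continue
                value = best[m - 1][k] + group_mi(prefix, px, k, n)
                if value > best[m][n]:
                    best[m][n] = value
                    cut[m][n] = k

    # Walk the stored cut points backwards to recover the boundaries.
    bounds = [N]
    n = N
    for m in range(M, 0, -1):
        n = cut[m][n]
        bounds.append(n)
    bounds.reverse()
    return best[M][N], bounds


if __name__ == "__main__":
    # Toy binary-input channel with four ordered outputs (made-up numbers).
    toy = [[0.4, 0.08, 0.015, 0.005],
           [0.005, 0.015, 0.08, 0.4]]
    print(sdq_dp(toy, 2))
```

The triple loop in this sketch costs more than the complexities quoted in the abstract; it is meant only to make the recurrence concrete.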

arXiv:1811.03832 [pdf, ps, other]

doi 10.1109/ITW.2018.8613421

Information Theoretic Bounds Based Channel Quantization Design for Emerging Memories

Authors: Zhen Mei, Kui Cai, Long Shi

Abstract: Channel output quantization plays a vital role in high-speed emerging memories such as the spin-torque transfer magnetic random access memory (STT-MRAM), where high-precision analog-to-digital converters (ADCs) are not applicable. In this paper, we investigate the design of the 1-bit quantizer which is highly suitable for practical applications. We first propose a quantized channel model for STT-MRAM. We then analyze various information theoretic bounds for the quantized channel, including the channel capacity, cutoff rate, and the Polyanskiy-Poor-Verdú (PPV) finite-length performance bound. By using these channel measurements as criteria, we design and optimize the 1-bit quantizer numerically for the STT-MRAM channel. Simulation results show that the proposed quantizers significantly outperform the conventional minimum mean-squared error (MMSE) based Lloyd-Max quantizer, and can approach the performance of the 1-bit quantizer optimized by error rate simulations.

Submitted 9 November, 2018; originally announced November 2018.

Comments: This paper is accepted by ITW 2018

arXiv:1712.00983 [pdf, ps, other]

Design of Polar Codes with Single and Multi-Carrier Modulation on Impulsive Noise Channels using Density Evolution

Authors: Zhen Mei, Bin Dai, Martin Johnston, Rolando Carrasco

Abstract: In this paper, density evolution-based construction methods to design good polar codes on impulsive noise channels for single-carrier and multi-carrier systems are proposed and evaluated. For a single-carrier system, the tight bound of the block error probability (BLEP) is derived by applying density evolution and the performance of the proposed construction methods is compared. For the multi-carrier system employing orthogonal frequency-division multiplexing, the accurate BLEP estimation is not feasible, so a tight lower bound on the BLEP for polar codes is derived by assuming the noise on each sub-carrier is Gaussian. The results show that the lower bound becomes tighter as the number of carriers increases.

Submitted 4 December, 2017; originally announced December 2017.

Comments: 5 pages, 3 figures

Showing 1–26 of 26 results for author: Mei, Z