Search | arXiv e-print repository

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation. Compared with DeepSeek 67B, DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality and multi-source corpus consisting of 8.1T tokens, and further perform Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock its potential. Evaluation results show that, even with only 21B activated parameters, DeepSeek-V2 and its chat versions still achieve top-tier performance among open-source models.

Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

arXiv:2403.14482 [pdf, other]

Assessing exchange-correlation functionals for heterogeneous catalysis of nitrogen species

Authors: Honghui Kim, Neung-Kyung Yu, Nianhan Tian, Andrew J. Medford

Abstract: Increasing interest in sustainable synthesis of ammonia, nitrates, and urea has led to an increase in studies of catalytic conversion between nitrogen-containing compounds using heterogeneous catalysts. Density functional theory (DFT) is commonly employed to obtain molecular-scale insight into these reactions, but there have been relatively few assessments of the exchange-correlation functionals that are best suited for heterogeneous catalysis of nitrogen compounds. Here, we assess a range of functionals ranging from the generalized gradient approximation (GGA) to the random phase approximation (RPA) for the formation energies of gas-phase nitrogen species, the lattice constants of representative solids from several common classes of catalysts (metals, oxides, and metal-organic frameworks (MOFs)), and the adsorption energies of a range of nitrogen-containing intermediates on these materials. The results reveal that the choice of exchange-correlation functional and van der Waals correction can have a surprisingly large effect and that increasing the level of theory does not always improve the accuracy for nitrogen-containing compounds. This suggests that the selection of functionals should be carefully evaluated on the basis of the specific reaction and material being studied.

Submitted 20 June, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: 44 pages, 20 figures. Figure 4 (MIL-125) data is changed. Relevant contents (texts, tables, figures, SI) are changed. VASP data is shared with accessible Zenodo link

arXiv:2403.10494 [pdf, other]

Lifelong LERF: Local 3D Semantic Inventory Monitoring Using FogROS2

Authors: Adam Rashid, Chung Min Kim, Justin Kerr, Letian Fu, Kush Hari, Ayah Ahmad, Kaiyuan Chen, Huang Huang, Marcus Gualtieri, Michael Wang, Christian Juette, Nan Tian, Liu Ren, Ken Goldberg

Abstract: Inventory monitoring in homes, factories, and retail stores relies on maintaining data despite objects being swapped, added, removed, or moved. We introduce Lifelong LERF, a method that allows a mobile robot with minimal compute to jointly optimize a dense language and geometric representation of its surroundings. Lifelong LERF maintains this representation over time by detecting semantic changes and selectively updating these regions of the environment, avoiding the need to exhaustively remap. Human users can query inventory by providing natural language queries and receiving a 3D heatmap of potential object locations. To manage the computational load, we use Fog-ROS2, a cloud robotics platform, to offload resource-intensive tasks. Lifelong LERF obtains poses from a monocular RGBD SLAM backend, and uses these poses to progressively optimize a Language Embedded Radiance Field (LERF) for semantic monitoring. Experiments with 3-5 objects arranged on a tabletop and a Turtlebot with a RealSense camera suggest that Lifelong LERF can persistently adapt to changes in objects with up to 91% accuracy.

Submitted 15 March, 2024; originally announced March 2024.

Comments: See project webpage at: https://sites.google.com/berkeley.edu/lifelonglerf/home

arXiv:2307.04103 [pdf]

CA-CentripetalNet: A novel anchor-free deep learning framework for hardhat wearing detection

Authors: Zhijian Liu, Nian Cai, Wensheng Ouyang, Chengbin Zhang, Nili Tian, Han Wang

Abstract: Automatic hardhat wearing detection can strengthen the safety management in construction sites, which is still challenging due to complicated video surveillance scenes. To deal with the poor generalization of previous deep learning based methods, a novel anchor-free deep learning framework called CA-CentripetalNet is proposed for hardhat wearing detection. Two novel schemes are proposed to improve the feature extraction and utilization ability of CA-CentripetalNet, which are vertical-horizontal corner pooling and bounding constrained center attention. The former is designed to realize the comprehensive utilization of marginal features and internal features. The latter is designed to enforce the backbone to pay attention to internal features, which is only used during the training rather than during the detection. Experimental results indicate that the CA-CentripetalNet achieves better performance with the 86.63% mAP (mean Average Precision) with less memory consumption at a reasonable speed than the existing deep learning based methods, especially in case of small-scale hardhats and non-worn-hardhats.

Submitted 9 July, 2023; originally announced July 2023.

Comments: It has been accepted for the journal of Signal, Image and Video Processing, which is a complete version. It is noted that it has been deleted for future publishing

Journal ref: Signal, Image and Video Processing, 2023

arXiv:2211.08114 [pdf]

Metal to Mott Insulator Transition in Two-dimensional 1T-TaSe$_2$

Authors: Ning Tian, Zhe Huang, Bo Gyu Jang, Shuaifei Guo, Ya-Jun Yan, Jingjing Gao, Yijun Yu, Jinwoong Hwang, Meixiao Wang, Xuan Luo, Yu Ping Sun, Zhongkai Liu, Dong-Lai Feng, Xianhui Chen, Sung-Kwan Mo, Minjae Kim, Young-Woo Son, Dawei Shen, Wei Ruan, Yuanbo Zhang

Abstract: When electron-electron interaction dominates over other electronic energy scales, exotic, collective phenomena often emerge out of seemingly ordinary matter. The strongly correlated phenomena, such as quantum spin liquid and unconventional superconductivity, represent a major research frontier and a constant source of inspiration. Central to strongly correlated physics is the concept of Mott insulator, from which various other correlated phases derive. The advent of two-dimensional (2D) materials brings unprecedented opportunities to the study of strongly correlated physics in the 2D limit. In particular, the enhanced correlation and extreme tunability of 2D materials enables exploring strongly correlated systems across uncharted parameter space. Here, we discover an intriguing metal to Mott insulator transition in 1T-TaSe$_2$ as the material is thinned down to atomic thicknesses. Specifically, we discover, for the first time, that the bulk metallicity of 1T-TaSe$_2$ arises from a band crossing Fermi level. Reducing the dimensionality effectively quenches the kinetic energy of the initially itinerant electrons and drives the material into a Mott insulating state. The dimensionality-driven Metal to Mott insulator transition resolves the long-standing dichotomy between metallic bulk and insulating surface of 1T-TaSe$_2$. Our results additionally establish 1T-TaSe$_2$ as an ideal variable system for exploring various strongly correlated phenomena.

Submitted 15 November, 2022; originally announced November 2022.

arXiv:2004.14321 [pdf, other]

doi 10.1109/TII.2020.2983176

Real-Time Optimal Lithium-Ion Battery Charging Based on Explicit Model Predictive Control

Authors: Ning Tian, Huazhen Fang, Yebin Wang

Abstract: The rapidly growing use of lithium-ion batteries across various industries highlights the pressing issue of optimal charging control, as charging plays a crucial role in the health, safety and life of batteries. The literature increasingly adopts model predictive control (MPC) to address this issue, taking advantage of its capability of performing optimization under constraints. However, the computationally complex online constrained optimization intrinsic to MPC often hinders real-time implementation. This paper is thus proposed to develop a framework for real-time charging control based on explicit MPC (eMPC), exploiting its advantage in characterizing an explicit solution to an MPC problem, to enable real-time charging control. The study begins with the formulation of MPC charging based on a nonlinear equivalent circuit model. Then, multi-segment linearization is conducted to the original model, and applying the eMPC design to the obtained linear models leads to a charging control algorithm. The proposed algorithm shifts the constrained optimization to offline by precomputing explicit solutions to the charging problem and expressing the charging law as piecewise affine functions. This drastically reduces not only the online computational costs in the control run but also the difficulty of coding. Extensive numerical simulation and experimental results verify the effectiveness of the proposed eMPC charging control framework and algorithm. The research results can potentially meet the needs for real-time battery management running on embedded hardware.

Submitted 29 April, 2020; originally announced April 2020.

arXiv:2003.06504 [pdf, other]

doi 10.1016/j.est.2020.101282

One-Shot Parameter Identification of the Thevenin's Model for Batteries: Methods and Validation

Authors: Ning Tian, Yebin Wang, Jian Chen, Huazhen Fang

Abstract: Parameter estimation is of foundational importance for various model-based battery management tasks, including charging control, state-of-charge estimation and aging assessment. However, it remains a challenging issue as the existing methods generally depend on cumbersome and time-consuming procedures to extract battery parameters from data. Departing from the literature, this paper sets the unique aim of identifying all the parameters offline in a one-shot procedure, including the resistance and capacitance parameters and the parameters in the parameterized function mapping from the state-of-charge to the open-circuit voltage. Considering the well-known Thevenin's battery model, the study begins with the parameter identifiability analysis, showing that all the parameters are locally identifiable. Then, it formulates the parameter identification problem in a prediction-error-minimization framework. As the non-convexity intrinsic to the problem may lead to physically meaningless estimates, two methods are developed to overcome this issue. The first one is to constrain the parameter search within a reasonable space by setting parameter bounds, and the other adopts regularization of the cost function using prior parameter guess. The proposed identifiability analysis and identification methods are extensively validated through simulations and experiments.

Submitted 13 March, 2020; originally announced March 2020.

arXiv:1906.04150 [pdf, other]

doi 10.1109/TCST.2020.2976036

Nonlinear Double-Capacitor Model for Rechargeable Batteries: Modeling, Identification and Validation

Authors: Ning Tian, Huazhen Fang, Jian Chen, Yebin Wang

Abstract: This paper proposes a new equivalent circuit model for rechargeable batteries by modifying a double-capacitor model proposed in [1]. It is known that the original model can address the rate capacity effect and energy recovery effect inherent to batteries better than other models. However, it is a purely linear model and includes no representation of a battery's nonlinear phenomena. Hence, this work transforms the original model by introducing a nonlinear-mapping-based voltage source and a serial RC circuit. The modification is justified by an analogy with the single-particle model. Two parameter estimation approaches, termed 1.0 and 2.0, are designed for the new model to deal with the scenarios of constant-current and variable-current charging/discharging, respectively. In particular, the 2.0 approach proposes the notion of Wiener system identification based on maximum a posteriori estimation, which allows all the parameters to be estimated in one shot while overcoming the nonconvexity or local minima issue to obtain physically reasonable estimates. An extensive experimental evaluation shows that the proposed model offers excellent accuracy and predictive capability. A comparison against the Rint and Thevenin models further points to its superiority. With high fidelity and low mathematical complexity, this model is beneficial for various real-time battery management applications.

Submitted 12 March, 2020; v1 submitted 10 June, 2019; originally announced June 2019.

Journal ref: IEEE Transactions on Control Systems Technology, 2020

arXiv:1810.09849 [pdf, other]

DropFilter: Dropout for Convolutions

Authors: Zhengsu Chen, Jianwei Niu, Qi Tian

Abstract: Using a large number of parameters, deep neural networks have achieved remarkable performance on computer vision and natural language processing tasks. However, the networks usually suffer from overfitting because they use too many parameters. Dropout is a widely used method to deal with overfitting. Although dropout can significantly regularize densely connected layers in neural networks, it leads to suboptimal results when used for convolutional layers. To tackle this problem, we propose DropFilter, a new dropout method for convolutional layers. DropFilter randomly suppresses the outputs of some filters, because it is observed that co-adaptations are more likely to occur between filters than within filters in convolutional layers. Using DropFilter, we remarkably improve the performance of convolutional networks on CIFAR and ImageNet.

Submitted 23 October, 2018; originally announced October 2018.
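
The DropFilter abstract above describes randomly suppressing the outputs of whole filters rather than individual activations. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. The function name `drop_filter`, the nested-list layout of the feature maps, and the rescaling of surviving channels by 1 / (1 - drop_prob) (the usual inverted-dropout convention) are all assumptions made for illustration.

```python
import random


def drop_filter(feature_maps, drop_prob, rng=None, training=True):
    """Zero out entire filter outputs (channels) with probability drop_prob.

    feature_maps: list of channels, each channel a 2D list of floats.
    Surviving channels are scaled by 1 / (1 - drop_prob); this rescaling
    is an assumption borrowed from standard inverted dropout.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        # At inference time every filter output passes through unchanged.
        return [[row[:] for row in channel] for channel in feature_maps]

    rng = rng or random.Random()
    scale = 1.0 / (1.0 - drop_prob)
    output = []
    for channel in feature_maps:
        # One random decision per filter, shared by all spatial positions.
        keep = rng.random() >= drop_prob
        factor = scale if keep else 0.0
        output.append([[value * factor for value in row] for row in channel])
    return output
```

The point of the sketch is the granularity of the mask: one keep-or-drop decision per channel, so a dropped filter contributes nothing at any spatial position.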

arXiv:1809.06716 [pdf, other]

A Fog Robotic System for Dynamic Visual Servoing

Authors: Nan Tian, Jinfa Chen, Mas Ma, Robert Zhang, Bill Huang, Ken Goldberg, Somayeh Sojoudi

Abstract: Cloud Robotics is a paradigm where distributed robots are connected to cloud services via networks to access unlimited computation power, at the cost of network communication. However, due to limitations such as network latency and variability, it is difficult to control dynamic, human compliant service robots directly from the cloud. In this work, by leveraging asynchronous protocol with a heartbeat signal, we combine cloud robotics with a smart edge device to build a Fog Robotic system. We use the system to enable robust teleoperation of a dynamic self-balancing robot from the cloud. We first use the system to pick up boxes from static locations, a task commonly performed in warehouse logistics. To make cloud teleoperation more efficient, we deploy image based visual servoing (IBVS) to perform box pickups automatically. Visual feedbacks, including apriltag recognition and tracking, are performed in the cloud to emulate a Fog Robotic object recognition system for IBVS. We demonstrate the feasibility of real-time dynamic automation system using this cloud-edge hybrid, which opens up possibilities of deploying dynamic robotic control with deep-learning recognition systems in Fog Robotics. Finally, we show that Fog Robotics enables the self-balancing service robot to pick up a box automatically from a person under unstructured environments.

Submitted 16 September, 2018; originally announced September 2018.

Comments: 7 pages, 5 figures, ICRA 2019 (submitted, under review)

arXiv:1712.01406 [pdf, other]

Nonlinear Bayesian Estimation: From Kalman Filtering to a Broader Horizon

Authors: Huazhen Fang, Ning Tian, Yebin Wang, MengChu Zhou, Mulugeta A. Haile

Abstract: This article presents an up-to-date tutorial review of nonlinear Bayesian estimation. State estimation for nonlinear systems has been a challenge encountered in a wide range of engineering fields, attracting decades of research effort. To date, one of the most promising and popular approaches is to view and address the problem from a Bayesian probabilistic perspective, which enables estimation of the unknown state variables by tracking their probabilistic distribution or statistics (e.g., mean and covariance) conditioned on the system's measurement data. This article offers a systematic introduction of the Bayesian state estimation framework and reviews various Kalman filtering (KF) techniques, progressively from the standard KF for linear systems to extended KF, unscented KF and ensemble KF for nonlinear systems. It also overviews other prominent or emerging Bayesian estimation methods including the Gaussian filtering, Gaussian-sum filtering, particle filtering and moving horizon estimation and extends the discussion of state estimation forward to more complicated problems such as simultaneous state and parameter/input estimation.

Submitted 14 December, 2017; v1 submitted 4 December, 2017; originally announced December 2017.
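
The starting point named in the abstract above, the standard KF for linear systems, tracks the mean and variance of the state conditioned on the measurements. A minimal scalar sketch follows, assuming the textbook model x_k = a·x_{k-1} + w_k and z_k = h·x_k + v_k with process-noise variance q and measurement-noise variance r. The function name `kalman_step` and the default parameter values are illustrative assumptions, not taken from the article.

```python
def kalman_step(x_est, p_est, z, a=1.0, h=1.0, q=1e-4, r=0.1):
    """One predict/update cycle of a scalar linear Kalman filter.

    x_est, p_est: previous state estimate and its variance.
    z: new measurement.
    a, h, q, r: assumed model parameters (state transition, measurement
    gain, process-noise variance, measurement-noise variance).
    """
    # Predict: propagate the mean and variance through the linear model.
    x_pred = a * x_est
    p_pred = a * p_est * a + q

    # Update: weigh the measurement residual by the Kalman gain.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (z - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


if __name__ == "__main__":
    # Estimate a constant value from repeated identical measurements.
    x, p = 0.0, 1.0
    for _ in range(50):
        x, p = kalman_step(x, p, z=5.0)
    print(x, p)
```

The extended, unscented, and ensemble variants listed in the abstract keep this predict/update structure and differ in how the mean and covariance are propagated through nonlinear models.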

arXiv:1709.08819 [pdf, other]

Three-dimensional Temperature Field Reconstruction for A Lithium-Ion Battery Pack: A Distributed Kalman Filtering Approach

Authors: Ning Tian, Huazhen Fang, Yebin Wang

Abstract: Despite the ever-increasing use across different sectors, the lithium-ion batteries (LiBs) have continually seen serious concerns over their thermal vulnerability. The LiB operation is associated with the heat generation and buildup effect, which manifests itself more strongly, in the form of highly uneven thermal distribution, for a LiB pack consisting of multiple cells. If not well monitored and managed, the heating may accelerate aging and cause unwanted side reactions. In extreme cases, it will even cause fires and explosions, as evidenced by a series of well-publicized incidents in recent years. To address this threat, this paper, for the first time, seeks to reconstruct the three-dimensional temperature field of a LiB pack in real time. The major challenge lies in how to acquire a high-fidelity reconstruction with constrained computation time. In this study, a three-dimensional thermal model is established first for a LiB pack configured in series. Although spatially resolved, this model captures spatial thermal behavior with a combination of high integrity and low complexity. Given the model, the standard Kalman filter is then distributed to attain temperature field estimation at substantially reduced computational complexity. The arithmetic operation analysis and numerical simulation illustrate that the proposed distributed estimation achieves a comparable accuracy as the centralized approach but with much less computation. This work can potentially contribute to the safer operation of the LiB packs in various systems dependent on LiB-based energy storage, potentially widening the access of this technology to a broader range of engineering areas.

Submitted 9 August, 2017; originally announced September 2017.

arXiv:1610.06469 [pdf, other]

Fast and guaranteed blind multichannel deconvolution under a bilinear system model

Authors: Kiryung Lee, Ning Tian, Justin Romberg

Abstract: We consider the multichannel blind deconvolution problem where we observe the output of multiple channels that are all excited with the same unknown input. From these observations, we wish to estimate the impulse responses of each of the channels. We show that this problem is well-posed if the channels follow a bilinear model where the ensemble of channel responses is modeled as lying in a low-dimensional subspace but with each channel modulated by an independent gain. Under this model, we show how the channel estimates can be found by minimizing a quadratic functional over a non-convex set. We analyze two methods for solving this non-convex program, and provide performance guarantees for each. The first is a method of alternating eigenvectors that breaks the program down into a series of eigenvalue problems. The second is a truncated power iteration, which can roughly be interpreted as a method for finding the largest eigenvector of a symmetric matrix with the additional constraint that it adheres to our bilinear model. As with most non-convex optimization algorithms, the performance of both of these algorithms is highly dependent on having a good starting point. We show how such a starting point can be constructed from the channel measurements. Our performance guarantees are non-asymptotic, and provide a sufficient condition on the number of samples observed per channel in order to guarantee channel estimates of a certain accuracy. Our analysis uses a model with a "generic" subspace that is drawn at random, and we show the performance bounds hold with high probability. Mathematically, the key estimates are derived by quantifying how well the eigenvectors of certain random matrices approximate the eigenvectors of their mean. We also present a series of numerical results demonstrating that the empirical performance is consistent with the presented theory.

Submitted 2 October, 2017; v1 submitted 20 October, 2016; originally announced October 2016.

arXiv:1609.09398

Analytic Solution to Geodesic Equations in Lemaître-Tolman-Bondi Metric

Authors: Nieng Tian

Abstract: In this paper, we use the Taylor expansion method to solve the LTB metric.

Submitted 19 May, 2017; v1 submitted 29 September, 2016; originally announced September 2016.

Comments: The conclusion is wrong

Showing 1–14 of 14 results for author: Tian, N