
Showing 1–28 of 28 results for author: Ruan, C

Searching in archive cs.
  1. arXiv:2406.11931  [pdf, other]

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2405.14333  [pdf, other]

    cs.AI

    DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

    Authors: Huajian Xin, Daya Guo, Zhihong Shao, Zhizhou Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, Xiaodan Liang

    Abstract: Proof assistants like Lean have revolutionized mathematical proof verification, ensuring high accuracy and reliability. Although large language models (LLMs) show promise in mathematical reasoning, their advancement in formal theorem proving is hindered by a lack of training data. To address this issue, we introduce an approach to generate extensive Lean 4 proof data derived from high-school and u…

    Submitted 23 May, 2024; originally announced May 2024.

  3. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  4. arXiv:2404.09151  [pdf, other]

    cs.SE cs.LG

    Emerging Platforms Meet Emerging LLMs: A Year-Long Journey of Top-Down Development

    Authors: Siyuan Feng, Jiawei Liu, Ruihang Lai, Charlie F. Ruan, Yong Yu, Lingming Zhang, Tianqi Chen

    Abstract: Deploying machine learning (ML) on diverse computing platforms is crucial to accelerate and broaden their applications. However, it presents significant software engineering challenges due to the fast evolution of models, especially the recent Large Language Models (LLMs), and the emergence of new computing platforms. Current ML frameworks are primarily engineered for CPU and CUDA platforms, leavi…

    Submitted 16 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

  5. arXiv:2403.05525  [pdf, other]

    cs.AI

    DeepSeek-VL: Towards Real-World Vision-Language Understanding

    Authors: Haoyu Lu, Wen Liu, Bo Zhang, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan

    Abstract: We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. Our approach is structured around three key dimensions: We strive to ensure our data is diverse, scalable, and extensively covers real-world scenarios including web screenshots, PDFs, OCR, charts, and knowledge-based content, aiming for a comprehensive represe…

    Submitted 11 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: https://github.com/deepseek-ai/DeepSeek-VL

  6. arXiv:2401.06066  [pdf, other]

    cs.CL

    DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

    Authors: Damai Dai, Chengqi Deng, Chenggang Zhao, R. X. Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Y. Wu, Zhenda Xie, Y. K. Li, Panpan Huang, Fuli Luo, Chong Ruan, Zhifang Sui, Wenfeng Liang

    Abstract: In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for managing computational costs when scaling up model parameters. However, conventional MoE architectures like GShard, which activate the top-$K$ out of $N$ experts, face challenges in ensuring expert specialization, i.e. each expert acquires non-overlapping and focused knowledge. In response, we propose the…

    Submitted 11 January, 2024; originally announced January 2024.

  7. arXiv:2401.02954  [pdf, other]

    cs.CL cs.AI cs.LG

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Authors: DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, et al. (63 additional authors not shown)

    Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B…

    Submitted 5 January, 2024; originally announced January 2024.

  8. arXiv:2309.06123  [pdf, ps, other]

    cs.CV

    Dynamic Visual Prompt Tuning for Parameter Efficient Transfer Learning

    Authors: Chunqing Ruan, Hongjian Wang

    Abstract: Parameter efficient transfer learning (PETL) is an emerging research spot that aims to adapt large-scale pre-trained models to downstream tasks. Recent advances have achieved great success in saving storage and computation costs. However, these methods do not take into account instance-specific visual clues for visual tasks. In this paper, we propose a Dynamic Visual Prompt Tuning framework (DVPT)…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: accepted by 2023 PRCV

  9. arXiv:2302.00845  [pdf, other]

    cs.LG cs.DC math.OC

    Coordinating Distributed Example Orders for Provably Accelerated Training

    Authors: A. Feder Cooper, Wentao Guo, Khiem Pham, Tiancheng Yuan, Charlie F. Ruan, Yucheng Lu, Christopher De Sa

    Abstract: Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples -- achieving a provably faster convergence rate than RR. However, GraB is limited by design: whil…

    Submitted 21 December, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023

  10. arXiv:2211.15886  [pdf, other]

    cs.LG cs.MA math.OC

    Approximating Martingale Process for Variance Reduction in Deep Reinforcement Learning with Large State Space

    Authors: Charlie Ruan

    Abstract: Approximating Martingale Process (AMP) is proven to be effective for variance reduction in reinforcement learning (RL) in specific cases such as Multiclass Queueing Networks. However, in the already proven cases, the state space is relatively small and all possible state transitions can be iterated through. In this paper, we consider systems in which state space is large and have uncertainties whe…

    Submitted 28 November, 2022; originally announced November 2022.

  11. arXiv:2211.14420   

    cs.CV cs.AI cs.LG

    Photo Rater: Photographs Auto-Selector with Deep Learning

    Authors: Wentao Guo, Charlie Ruan, Claire Zhou

    Abstract: Photo Rater is a computer vision project that uses neural networks to help photographers select the best photo among those that are taken based on the same scene. This process is usually referred to as "culling" in photography, and it can be tedious and time-consuming if done manually. Photo Rater utilizes three separate neural networks to complete such a task: one for general image quality assess…

    Submitted 16 October, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: The authors discovered issues in the code that produced figures 8 and 9

  12. arXiv:2210.06575  [pdf, other]

    cs.RO cs.CV

    GraspNeRF: Multiview-based 6-DoF Grasp Detection for Transparent and Specular Objects Using Generalizable NeRF

    Authors: Qiyu Dai, Yan Zhu, Yiran Geng, Ciyu Ruan, Jiazhao Zhang, He Wang

    Abstract: In this work, we tackle 6-DoF grasp detection for transparent and specular objects, which is an important yet challenging problem in vision-based robotic systems, due to the failure of depth cameras in sensing their geometry. We, for the first time, propose a multiview RGB-based 6-DoF grasp detection network, GraspNeRF, that leverages the generalizable neural radiance field (NeRF) to achieve mater…

    Submitted 15 March, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: IEEE International Conference on Robotics and Automation (ICRA), 2023

  13. Tutorial: Modern Theoretical Tools for Understanding and Designing Next-generation Information Retrieval System

    Authors: Da Xu, Chuanwei Ruan

    Abstract: In the relatively short history of machine learning, the subtle balance between engineering and theoretical progress has been proved critical at various stages. The most recent wave of AI has brought to the IR community powerful techniques, particularly for pattern recognition. While many benefits from the burst of ideas as numerous tasks become algorithmically feasible, the balance is tilting tow…

    Submitted 25 March, 2022; originally announced March 2022.

  14. arXiv:2203.13956  [pdf, other]

    cs.IR

    From Intervention to Domain Transportation: A Novel Perspective to Optimize Recommendation

    Authors: Da Xu, Yuting Ye, Chuanwei Ruan

    Abstract: The interventional nature of recommendation has attracted increasing attention in recent years. It particularly motivates researchers to formulate learning and evaluating recommendation as causal inference and data missing-not-at-random problems. However, few take seriously the consequence of violating the critical assumption of overlapping, which we prove can significantly threaten the validity a…

    Submitted 25 March, 2022; originally announced March 2022.

  15. arXiv:2202.13337  [pdf, other]

    cs.LG cs.IR

    Towards Robust Off-policy Learning for Runtime Uncertainty

    Authors: Da Xu, Yuting Ye, Chuanwei Ruan, Bo Yang

    Abstract: Off-policy learning plays a pivotal role in optimizing and evaluating policies prior to the online deployment. However, during the real-time serving, we observe varieties of interventions and constraints that cause inconsistency between the online and offline settings, which we summarize and term as runtime uncertainty. Such uncertainty cannot be learned from the logged data due to its abnormality…

    Submitted 27 February, 2022; originally announced February 2022.

    Comments: 21 pages, 9 figures, 2 tables; accepted by AAAI 2022

  16. arXiv:2201.10671  [pdf, other]

    cs.RO cs.HC

    Using Design Metaphors to Understand User Expectations of Socially Interactive Robot Embodiments

    Authors: Nathaniel Dennler, Changxiao Ruan, Jessica Hadiwijoyo, Brenna Chen, Stefanos Nikolaidis, Maja Mataric

    Abstract: The physical design of a robot suggests expectations of that robot's functionality for human users and collaborators. When those expectations align with the true capabilities of the robot, interaction with the robot is enhanced. However, misalignment of those expectations can result in an unsatisfying interaction. This paper uses Mechanical Turk to evaluate user expectation through the use of desi…

    Submitted 25 January, 2022; originally announced January 2022.

    Comments: 33 pages, 16 figures, 6 tables

    Journal ref: J. Hum.-Robot Interact. 12, 2, Article 21 (June 2023), 41 pages

  17. arXiv:2110.12141  [pdf, other]

    cs.IR cs.LG

    Rethinking Neural vs. Matrix-Factorization Collaborative Filtering: the Theoretical Perspectives

    Authors: Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan

    Abstract: The recent work by Rendle et al. (2020), based on empirical observations, argues that matrix-factorization collaborative filtering (MCF) compares favorably to neural collaborative filtering (NCF), and conjectures the dot product's superiority over the feed-forward neural network as similarity function. In this paper, we address the comparison rigorously by answering the following questions: 1. wha…

    Submitted 23 October, 2021; originally announced October 2021.

  18. arXiv:2110.12132  [pdf, other]

    cs.IR cs.LG stat.ML

    Towards the D-Optimal Online Experiment Design for Recommender Selection

    Authors: Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan

    Abstract: Selecting the optimal recommender via online exploration-exploitation is catching increasing attention where the traditional A/B testing can be slow and costly, and offline evaluations are prone to the bias of history data. Finding the optimal online experiment is nontrivial since both the users and displayed recommendations carry contextual features that are informative to the reward. While the p…

    Submitted 25 March, 2022; v1 submitted 23 October, 2021; originally announced October 2021.

  19. arXiv:2103.15213  [pdf, other]

    cs.LG

    A Temporal Kernel Approach for Deep Learning with Continuous-time Information

    Authors: Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan

    Abstract: Sequential deep learning models such as RNN, causal CNN and attention mechanism do not readily consume continuous-time information. Discretizing the temporal data, as we show, causes inconsistency even for simple continuous-time processes. Current approaches often handle time in a heuristic manner to be consistent with the existing deep learning architectures and implementations. In this paper, we…

    Submitted 28 March, 2021; originally announced March 2021.

  20. arXiv:2103.15209  [pdf, other]

    cs.LG

    Understanding the role of importance weighting for deep learning

    Authors: Da Xu, Yuting Ye, Chuanwei Ruan

    Abstract: The recent paper by Byrd & Lipton (2019), based on empirical observations, raises a major concern on the impact of importance weighting for the over-parameterized deep learning models. They observe that as long as the model can separate the training data, the impact of importance weighting diminishes as the training proceeds. Nevertheless, there lacks a rigorous characterization of this phenomenon…

    Submitted 28 March, 2021; originally announced March 2021.

  21. arXiv:2102.12029  [pdf, other]

    cs.LG cs.IR

    Theoretical Understandings of Product Embedding for E-commerce Machine Learning

    Authors: Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan

    Abstract: Product embeddings have been heavily investigated in the past few years, serving as the cornerstone for a broad range of machine learning applications in e-commerce. Despite the empirical success of product embeddings, little is known on how and why they work from the theoretical standpoint. Analogous results from the natural language processing (NLP) often rely on domain-specific properties that…

    Submitted 23 February, 2021; originally announced February 2021.

  22. arXiv:2012.02295  [pdf, other]

    cs.IR cs.LG stat.ML

    Adversarial Counterfactual Learning and Evaluation for Recommender System

    Authors: Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan

    Abstract: The feedback data of recommender systems are often subject to what was exposed to the users; however, most learning and evaluation methods do not account for the underlying exposure mechanism. We first show in theory that applying supervised learning to detect user preferences may end up with inconsistent results in the absence of exposure information. The counterfactual propensity-weighting appro…

    Submitted 7 November, 2020; originally announced December 2020.

  23. arXiv:2002.07962  [pdf, other]

    cs.LG stat.ML

    Inductive Representation Learning on Temporal Graphs

    Authors: Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan

    Abstract: Inductive representation learning on temporal graphs is an important step toward scalable machine learning on real-world dynamic networks. The evolving nature of temporal dynamic graphs requires handling new nodes as well as capturing temporal patterns. The node embeddings, which are now functions of time, should represent both the static node features and the evolving topological structures. Moreo…

    Submitted 18 February, 2020; originally announced February 2020.

  24. arXiv:1911.12864  [pdf, other]

    cs.LG stat.ML

    Self-attention with Functional Time Representation Learning

    Authors: Da Xu, Chuanwei Ruan, Sushant Kumar, Evren Korpeoglu, Kannan Achan

    Abstract: Sequential modelling with self-attention has achieved cutting edge performances in natural language processing. With advantages in model flexibility, computation complexity and interpretability, self-attention is gradually becoming a key component in event sequence models. However, like most other sequence models, self-attention does not account for the time span between events and thus captures s…

    Submitted 28 November, 2019; originally announced November 2019.

  25. arXiv:1911.12481  [pdf, other]

    cs.LG cs.IR stat.ML

    Product Knowledge Graph Embedding for E-commerce

    Authors: Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, Kannan Achan

    Abstract: In this paper, we propose a new product knowledge graph (PKG) embedding approach for learning the intrinsic product relations as product knowledge for e-commerce. We define the key entities and summarize the pivotal product relations that are critical for general e-commerce applications including marketing, advertisement, search ranking and recommendation. We first provide a comprehensive comparis…

    Submitted 27 November, 2019; originally announced November 2019.

  26. arXiv:1904.12574  [pdf, other]

    cs.IR cs.LG stat.ML

    Knowledge-aware Complementary Product Representation Learning

    Authors: Da Xu, Chuanwei Ruan, Jason Cho, Evren Korpeoglu, Sushant Kumar, Kannan Achan

    Abstract: Learning product representations that reflect complementary relationship plays a central role in e-commerce recommender system. In the absence of the product relationships graph, which existing methods rely on, there is a need to detect the complementary relationships directly from noisy and sparse customer purchase activities. Furthermore, unlike simple relationships such as similarity, complemen…

    Submitted 28 November, 2019; v1 submitted 15 March, 2019; originally announced April 2019.

  27. arXiv:1903.02640  [pdf, other]

    cs.LG cs.SI stat.ML

    Generative Graph Convolutional Network for Growing Graphs

    Authors: Da Xu, Chuanwei Ruan, Kamiya Motwani, Evren Korpeoglu, Sushant Kumar, Kannan Achan

    Abstract: Modeling generative process of growing graphs has wide applications in social networks and recommendation systems, where cold start problem leads to new nodes isolated from existing graph. Despite the emerging literature in learning graph representation and graph generation, most of them can not handle isolated new nodes without nontrivial modifications. The challenge arises due to the fact that l…

    Submitted 6 March, 2019; originally announced March 2019.

  28. arXiv:1602.03563  [pdf]

    cs.NI

    QoS Evaluation of Heterogeneous Networks: Application-based Approach

    Authors: Farnaz Farid, Seyed Shahrestani, Chun Ruan

    Abstract: In this paper, an application-based QoS evaluation approach for heterogeneous networks is proposed. It is possible to expand the network capacity and coverage in a dynamic fashion by applying heterogeneous wireless network architecture. However, the Quality of Service (QoS) evaluation of this type of network architecture is very challenging due to the presence of different communication technologie…

    Submitted 10 February, 2016; originally announced February 2016.

    Journal ref: International Journal of Computer Networks & Communications (IJCNC) Vol.8, No.1, January 2016
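Result 6 (DeepSeekMoE) contrasts its design with conventional MoE architectures such as GShard, which activate the top-K out of N experts for each token. A minimal sketch of that conventional top-K routing, in plain Python, is given here for orientation only. It assumes a softmax over N per-expert affinity logits (passed in directly rather than computed from a token's hidden state) and keeps the top-K softmax scores as gate weights without renormalizing them; some implementations do renormalize the kept weights. The function names and the toy experts are hypothetical, and this is not DeepSeekMoE's own proposed scheme.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over the N expert affinity logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert_index, gate_value) for the k highest-scoring experts.

    Gate values are the softmax scores of the selected experts (assumption:
    no renormalization); every other expert gets zero weight, i.e. it is
    not activated for this token.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must satisfy 1 <= k <= N (number of experts)")
    scores = softmax(logits)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


def moe_layer(x: Vector, experts: Sequence[Callable[[Vector], Vector]],
              logits: Sequence[float], k: int) -> Vector:
    """Sum the outputs of the k routed experts, each weighted by its gate value."""
    out = [0.0] * len(x)
    for idx, gate in top_k_gate(logits, k):
        y = experts[idx](x)  # only the selected experts are evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    # Hypothetical toy experts: each one scales its input by a constant.
    experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
    logits = [0.1, 2.0, -1.0, 1.5]  # hypothetical affinities for one token, N = 4
    print([i for i, _ in top_k_gate(logits, k=2)])  # prints [1, 3]
    print(moe_layer([1.0, 1.0], experts, logits, k=2))
```

With K = 2 and N = 4, only experts 1 and 3 run for this token, which is the compute saving MoE layers rely on; DeepSeekMoE's abstract argues that this coarse top-K routing makes it hard for each expert to acquire non-overlapping, focused knowledge.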