Showing 1–50 of 60 results for author: Dong, K

  1. arXiv:2406.11931  [pdf, other

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2406.06777  [pdf, other

    cs.CV cs.AI

    MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension

    Authors: Khiem Le, Zhichun Guo, Kaiwen Dong, Xiaobao Huang, Bozhao Nan, Roshni Iyer, Xiangliang Zhang, Olaf Wiest, Wei Wang, Nitesh V. Chawla

    Abstract: Recently, Large Language Models (LLMs) with their strong task-handling capabilities have shown remarkable advancements across a spectrum of fields, moving beyond natural language understanding. However, their proficiency within the chemistry domain remains restricted, especially in solving professional molecule-related tasks. This challenge is attributed to their inherent limitations in comprehend… ▽ More

    Submitted 27 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  3. arXiv:2405.18727  [pdf, other

    cs.CL cs.AI cs.IR

    CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control

    Authors: Huanshuo Liu, Hao Zhang, Zhijiang Guo, Kuicai Dong, Xiangyang Li, Yi Quan Lee, Cong Zhang, Yong Liu

    Abstract: Retrieval-augmented generation (RAG) has emerged as a promising solution for mitigating hallucinations of large language models (LLMs) with retrieved external knowledge. Adaptive RAG enhances this approach by dynamically assessing the retrieval necessity, aiming to balance external and internal knowledge usage. However, existing adaptive RAG methods primarily realize retrieval on demand by relying… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 28 pages, 7 figures, 9 tables

  4. arXiv:2405.04434  [pdf, other

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference… ▽ More

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  5. arXiv:2404.15103  [pdf, other

    cs.CL

    Multi-view Content-aware Indexing for Long Document Retrieval

    Authors: Kuicai Dong, Derrick Goh Xin Deik, Yi Quan Lee, Hao Zhang, Xiangyang Li, Cong Zhang, Yong Liu

    Abstract: Long document question answering (DocQA) aims to answer questions from long documents over 10k words. They usually contain content structures such as sections, sub-sections, and paragraph demarcations. However, the indexing methods of long documents remain under-explored, while existing systems generally employ fixed-length chunking. As they do not consider content structures, the resultant chunks… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

  6. arXiv:2404.13600  [pdf, other

    cs.RO

    Are We Ready for Planetary Exploration Robots? The TAIL-Plus Dataset for SLAM in Granular Environments

    Authors: Zirui Wang, Chen Yao, Yangtao Ge, Guowei Shi, Ningbo Yang, Zheng Zhu, Kewei Dong, Hexiang Wei, Zhenzhong Jia, Jing Wu

    Abstract: So far, planetary surface exploration depends on various mobile robot platforms. The autonomous navigation and decision-making of these mobile robots in complex terrains largely rely on their terrain-aware perception, localization and mapping capabilities. In this paper we release the TAIL-Plus dataset, a new challenging dataset in deformable granular environments for planetary exploration robots,… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted to the IEEE ICRA Workshop on Field Robotics 2024

  7. arXiv:2404.11032  [pdf, other

    cs.LG cs.SI

    CORE: Data Augmentation for Link Prediction via Information Bottleneck

    Authors: Kaiwen Dong, Zhichun Guo, Nitesh V. Chawla

    Abstract: Link prediction (LP) is a fundamental task in graph representation learning, with numerous applications in diverse domains. However, the generalizability of LP models is often compromised due to the presence of noisy or spurious information in graphs and the inherent incompleteness of graph data. To address these challenges, we draw inspiration from the Information Bottleneck principle and propose… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

  8. arXiv:2404.11019  [pdf, other

    cs.LG

    You do not have to train Graph Neural Networks at all on text-attributed graphs

    Authors: Kaiwen Dong, Zhichun Guo, Nitesh V. Chawla

    Abstract: Graph structured data, specifically text-attributed graphs (TAG), effectively represent relationships among varied entities. Such graphs are essential for semi-supervised node classification tasks. Graph Neural Networks (GNNs) have emerged as a powerful tool for handling this graph-structured data. Although gradient descent is commonly utilized for training GNNs for node classification, this study… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: preprint

  9. arXiv:2404.10584  [pdf, other

    cs.CV

    ReWiTe: Realistic Wide-angle and Telephoto Dual Camera Fusion Dataset via Beam Splitter Camera Rig

    Authors: Chunli Peng, Xuan Dong, Tiantian Cao, Zhengqing Li, Kun Dong, Weixin Li

    Abstract: The fusion of images from dual camera systems featuring a wide-angle and a telephoto camera has become a hotspot problem recently. By integrating simultaneously captured wide-angle and telephoto images from these systems, the resulting fused image achieves a wide field of view (FOV) coupled with high-definition quality. Existing approaches are mostly deep learning methods, and predominantly rely o… ▽ More

    Submitted 29 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  10. arXiv:2404.01356  [pdf, other

    cs.LG cs.AI cs.CY

    The Double-Edged Sword of Input Perturbations to Robust Accurate Fairness

    Authors: Xuran Li, Peng Wu, Yanting Chen, Xingjun Ma, Zhen Zhang, Kaixiang Dong

    Abstract: Deep neural networks (DNNs) are known to be sensitive to adversarial input perturbations, leading to a reduction in either prediction accuracy or individual fairness. To jointly characterize the susceptibility of prediction accuracy and individual fairness to adversarial perturbations, we introduce a novel robustness definition termed robust accurate fairness. Informally, robust accurate fairness… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

  11. arXiv:2404.00702  [pdf, other

    cs.IR

    Tired of Plugins? Large Language Models Can Be End-To-End Recommenders

    Authors: Wenlin Zhang, Chuhan Wu, Xiangyang Li, Yuhao Wang, Kuicai Dong, Yichao Wang, Xinyi Dai, Xiangyu Zhao, Huifeng Guo, Ruiming Tang

    Abstract: Recommender systems aim to predict user interest based on historical behavioral data. They are mainly designed in sequential pipelines, requiring lots of data to train different sub-systems, and are hard to scale to new domains. Recently, Large Language Models (LLMs) have demonstrated remarkable generalized capabilities, enabling a singular model to tackle diverse recommendation tasks across vario… ▽ More

    Submitted 7 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

  12. arXiv:2403.16037  [pdf, other

    cs.IR

    Knowledge-aware Dual-side Attribute-enhanced Recommendation

    Authors: Taotian Pang, Xingyu Lou, Fei Zhao, Zhen Wu, Kuiyao Dong, Qiuying Peng, Yue Qi, Xinyu Dai

    Abstract: Knowledge-aware recommendation methods (KGR) based on graph neural networks (GNNs) and contrastive learning (CL) have achieved promising performance. However, they fall short in modeling fine-grained user preferences and further fail to leverage the preference-attribute connection to make predictions, leading to sub-optimal performance. To address the issue, we… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

  13. arXiv:2403.05525  [pdf, other

    cs.AI

    DeepSeek-VL: Towards Real-World Vision-Language Understanding

    Authors: Haoyu Lu, Wen Liu, Bo Zhang, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan

    Abstract: We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. Our approach is structured around three key dimensions: We strive to ensure our data is diverse, scalable, and extensively covers real-world scenarios including web screenshots, PDFs, OCR, charts, and knowledge-based content, aiming for a comprehensive represe… ▽ More

    Submitted 11 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: https://github.com/deepseek-ai/DeepSeek-VL

  14. arXiv:2402.09764  [pdf, other

    cs.AI

    Aligning Crowd Feedback via Distributional Preference Reward Modeling

    Authors: Dexun Li, Cong Zhang, Kuicai Dong, Derrick Goh Xin Deik, Ruiming Tang, Yong Liu

    Abstract: Deep Reinforcement Learning is widely used for aligning Large Language Models (LLM) with human preference. However, the conventional reward modelling is predominantly dependent on human annotations provided by a select cohort of individuals. Such dependence may unintentionally result in skewed models that reflect the inclinations of these annotators, thereby failing to adequately represent the wid… ▽ More

    Submitted 30 May, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  15. arXiv:2402.09711  [pdf, other

    cs.LG cs.SI

    Node Duplication Improves Cold-start Link Prediction

    Authors: Zhichun Guo, Tong Zhao, Yozen Liu, Kaiwen Dong, William Shiao, Neil Shah, Nitesh V. Chawla

    Abstract: Graph Neural Networks (GNNs) are prominent in graph machine learning and have shown state-of-the-art performance in Link Prediction (LP) tasks. Nonetheless, recent studies show that GNNs struggle to produce good results on low-degree nodes despite their overall strong performance. In practical applications of LP, like recommendation systems, improving performance on low-degree nodes is critical, a… ▽ More

    Submitted 15 February, 2024; originally announced February 2024.

  16. arXiv:2402.07738  [pdf, other

    cs.LG

    Universal Link Predictor By In-Context Learning on Graphs

    Authors: Kaiwen Dong, Haitao Mao, Zhichun Guo, Nitesh V. Chawla

    Abstract: Link prediction is a crucial task in graph machine learning, where the goal is to infer missing or future links within a graph. Traditional approaches leverage heuristic methods based on widely observed connectivity patterns, offering broad applicability and generalizability without the need for model training. Despite their utility, these methods are limited by their reliance on human-derived heu… ▽ More

    Submitted 15 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Preprint

  17. arXiv:2402.01777  [pdf

    cs.CL cs.AI cs.HC

    On the Psychology of GPT-4: Moderately anxious, slightly masculine, honest, and humble

    Authors: Adrita Barua, Gary Brase, Ke Dong, Pascal Hitzler, Eugene Vasserman

    Abstract: We subject GPT-4 to a number of rigorous psychometric tests and analyze the results. We find that, compared to the average human, GPT-4 tends to show more honesty and humility, and less machiavellianism and narcissism. It sometimes exhibits ambivalent sexism, leans slightly toward masculinity, is moderately anxious but mostly not depressive (but not always). It shows human-average numerical litera… ▽ More

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 16 pages, 8 tables, 1 code repository

  18. arXiv:2401.14196  [pdf, other

    cs.SE cs.CL cs.LG

    DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

    Authors: Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y. Wu, Y. K. Li, Fuli Luo, Yingfei Xiong, Wenfeng Liang

    Abstract: The rapid development of large language models has revolutionized code intelligence in software development. However, the predominance of closed-source models has restricted extensive research and development. To address this, we introduce the DeepSeek-Coder series, a range of open-source code models with sizes from 1.3B to 33B, trained from scratch on 2 trillion tokens. These models are pre-train… ▽ More

    Submitted 26 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  19. arXiv:2401.02954  [pdf, other

    cs.CL cs.AI cs.LG

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Authors: DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, et al. (63 additional authors not shown)

    Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B… ▽ More

    Submitted 5 January, 2024; originally announced January 2024.

  20. arXiv:2312.10743  [pdf, other

    cs.IR

    A Unified Framework for Multi-Domain CTR Prediction via Large Language Models

    Authors: Zichuan Fu, Xiangyang Li, Chuhan Wu, Yichao Wang, Kuicai Dong, Xiangyu Zhao, Mengchen Zhao, Huifeng Guo, Ruiming Tang

    Abstract: Click-Through Rate (CTR) prediction is a crucial task in online recommendation platforms as it involves estimating the probability of user engagement with advertisements or items by clicking on them. Given the availability of various services like online shopping, ride-sharing, food delivery, and professional services on commercial platforms, recommendation systems in these platforms are required… ▽ More

    Submitted 23 February, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Submitted to TOIS

  21. arXiv:2311.09711  [pdf, other

    cs.IT

    Second-order Rate Analysis of a Two-user Gaussian Interference Channel with Heterogeneous Blocklength Constraints

    Authors: Kailun Dong, Pin-Hsun Lin, Marcel Mross, Eduard A. Jorswieck

    Abstract: We consider a two-user Gaussian interference channel with heterogeneous blocklength constraints (HB-GIC), strong interference, and two private messages. We propose to apply the successive interference cancellation with early decoding, i.e., decoding a message with a number of received symbols less than the blocklength at the receiver. We determine the necessary number of received symbols to achiev… ▽ More

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 4 figures

  22. arXiv:2309.00976  [pdf, other

    cs.LG cs.IR cs.SI

    Pure Message Passing Can Estimate Common Neighbor for Link Prediction

    Authors: Kaiwen Dong, Zhichun Guo, Nitesh V. Chawla

    Abstract: Message Passing Neural Networks (MPNNs) have emerged as the de facto standard in graph representation learning. However, when it comes to link prediction, they often struggle, surpassed by simple heuristics such as Common Neighbor (CN). This discrepancy stems from a fundamental limitation: while MPNNs excel in node-level representation, they stumble with encoding the joint structural feature… ▽ More

    Submitted 23 January, 2024; v1 submitted 2 September, 2023; originally announced September 2023.

    Comments: preprint

  23. FoodSAM: Any Food Segmentation

    Authors: Xing Lan, Jiayi Lyu, Hanyu Jiang, Kun Dong, Zehai Niu, Yi Zhang, Jian Xue

    Abstract: In this paper, we explore the zero-shot capability of the Segment Anything Model (SAM) for food image segmentation. To address the lack of class-specific information in SAM-generated masks, we propose a novel framework, called FoodSAM. This innovative approach integrates the coarse semantic mask with SAM-generated masks to enhance semantic segmentation quality. Besides, we recognize that the ingre… ▽ More

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: Code is available at https://github.com/jamesjg/FoodSAM

  24. arXiv:2308.00187  [pdf, ps, other

    cs.RO cs.CV eess.SP

    Detecting the Anomalies in LiDAR Pointcloud

    Authors: Chiyu Zhang, Ji Han, Yao Zou, Kexin Dong, Yujia Li, Junchun Ding, Xiaoling Han

    Abstract: LiDAR sensors play an important role in the perception stack of modern autonomous driving systems. Adverse weather conditions such as rain, fog and dust, as well as some (occasional) LiDAR hardware fault may cause the LiDAR to produce pointcloud with abnormal patterns such as scattered noise points and uncommon intensity values. In this paper, we propose a novel approach to detect whether a LiDAR… ▽ More

    Submitted 31 July, 2023; originally announced August 2023.

  25. arXiv:2306.16361  [pdf, ps, other

    cs.LG

    Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time

    Authors: Arvind Mahankali, Jeff Z. Haochen, Kefan Dong, Margalit Glasgow, Tengyu Ma

    Abstract: Despite recent theoretical progress on the non-convex optimization of two-layer neural networks, it is still an open question whether gradient descent on neural networks without unnatural modifications can achieve better sample complexity than kernel methods. This paper provides a clean mean-field analysis of projected gradient flow on polynomial-width two-layer neural networks. Different from pri… ▽ More

    Submitted 7 October, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

    Comments: Added result on projected gradient descent with inverse-polynomial learning rate

  26. arXiv:2306.08373  [pdf, other

    cs.CL cs.AI

    A semantically enhanced dual encoder for aspect sentiment triplet extraction

    Authors: Baoxing Jiang, Shehui Liang, Peiyu Liu, Kaifang Dong, Hongye Li

    Abstract: Aspect sentiment triplet extraction (ASTE) is a crucial subtask of aspect-based sentiment analysis (ABSA) that aims to comprehensively identify sentiment triplets. Previous research has focused on enhancing ASTE through innovative table-filling strategies. However, these approaches often overlook the multi-perspective nature of language expressions, resulting in a loss of valuable interaction info… ▽ More

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: 25 pages, 4 figures

  27. arXiv:2306.04234  [pdf, other

    cs.IR cs.CY

    Set-to-Sequence Ranking-based Concept-aware Learning Path Recommendation

    Authors: Xianyu Chen, Jian Shen, Wei Xia, Jiarui Jin, Yakun Song, Weinan Zhang, Weiwen Liu, Menghui Zhu, Ruiming Tang, Kai Dong, Dingyin Xia, Yong Yu

    Abstract: With the development of the online education system, personalized education recommendation has played an essential role. In this paper, we focus on developing path recommendation systems that aim to generating and recommending an entire learning path to the given user in each session. Noticing that existing approaches fail to consider the correlations of concepts in the path, we propose a novel fr… ▽ More

    Submitted 7 June, 2023; originally announced June 2023.

  28. arXiv:2305.10906  [pdf, other

    cs.LG cs.AI cs.CY

    RobustFair: Adversarial Evaluation through Fairness Confusion Directed Gradient Search

    Authors: Xuran Li, Peng Wu, Kaixiang Dong, Zhen Zhang, Yanting Chen

    Abstract: Deep neural networks (DNNs) often face challenges due to their vulnerability to various adversarial perturbations, including false perturbations that undermine prediction accuracy and biased perturbations that cause biased predictions for similar inputs. This paper introduces a novel approach, RobustFair, to evaluate the accurate fairness of DNNs when subjected to these false or biased perturbatio… ▽ More

    Submitted 8 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

  29. arXiv:2305.04181  [pdf, other

    cs.CL cs.AI

    Shall We Trust All Relational Tuples by Open Information Extraction? A Study on Speculation Detection

    Authors: Kuicai Dong, Aixin Sun, Jung-Jae Kim, Xiaoli Li

    Abstract: Open Information Extraction (OIE) aims to extract factual relational tuples from open-domain sentences. Downstream tasks use the extracted OIE tuples as facts, without examining the certainty of these facts. However, uncertainty/speculation is a common linguistic phenomenon. Existing studies on speculation detection are defined at sentence level, but even if a sentence is determined to be speculat… ▽ More

    Submitted 6 May, 2023; originally announced May 2023.

  30. arXiv:2305.03299  [pdf, other

    cs.CL cs.AI

    Open Information Extraction via Chunks

    Authors: Kuicai Dong, Aixin Sun, Jung-Jae Kim, Xiaoli Li

    Abstract: Open Information Extraction (OIE) aims to extract relational tuples from open-domain sentences. Existing OIE systems split a sentence into tokens and recognize token spans as tuple relations and arguments. We instead propose Sentence as Chunk sequence (SaC) and recognize chunk spans as tuple relations and arguments. We argue that SaC has better quantitative and qualitative properties for OIE than… ▽ More

    Submitted 5 May, 2023; originally announced May 2023.

  31. arXiv:2305.00322  [pdf, ps, other

    cs.LG

    Toward $L_\infty$-recovery of Nonlinear Functions: A Polynomial Sample Complexity Bound for Gaussian Random Fields

    Authors: Kefan Dong, Tengyu Ma

    Abstract: Many machine learning applications require learning a function with a small worst-case error over the entire input domain, that is, the $L_\infty$-error, whereas most existing theoretical works only guarantee recovery in average errors such as the $L_2$-error. $L_\infty$-recovery from polynomial samples is even impossible for seemingly simple function classes such as constant-norm infinite-width t… ▽ More

    Submitted 29 April, 2023; originally announced May 2023.

    Comments: 39 pages

  32. arXiv:2301.11426  [pdf, other

    cs.LG

    Model-based Offline Reinforcement Learning with Local Misspecification

    Authors: Kefan Dong, Yannis Flet-Berliac, Allen Nie, Emma Brunskill

    Abstract: We present a model-based offline reinforcement learning policy performance lower bound that explicitly captures dynamics model misspecification and distribution mismatch and we propose an empirical algorithm for optimal offline policy selection. Theoretically, we prove a novel safe policy improvement theorem by establishing pessimism approximations to the value function. Our key insight is to join… ▽ More

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: Accepted by AAAI-23

  33. arXiv:2212.02068  [pdf, other

    cs.CL cs.AI

    Syntactic Multi-view Learning for Open Information Extraction

    Authors: Kuicai Dong, Aixin Sun, Jung-Jae Kim, Xiaoli Li

    Abstract: Open Information Extraction (OpenIE) aims to extract relational tuples from open-domain sentences. Traditional rule-based or statistical models have been developed based on syntactic structures of sentences, identified by syntactic parsers. However, previous neural OpenIE models under-explore the useful syntactic information. In this paper, we model both constituency and dependency trees into word… ▽ More

    Submitted 5 December, 2022; originally announced December 2022.

    Comments: To appear in EMNLP 2022

    Journal ref: EMNLP 2022

  34. arXiv:2211.15899  [pdf, other

    cs.LG cs.SI stat.ML

    FakeEdge: Alleviate Dataset Shift in Link Prediction

    Authors: Kaiwen Dong, Yijun Tian, Zhichun Guo, Yang Yang, Nitesh V. Chawla

    Abstract: Link prediction is a crucial problem in graph-structured data. Due to the recent success of graph neural networks (GNNs), a variety of GNN-based models were proposed to tackle the link prediction task. Specifically, GNNs leverage the message passing paradigm to obtain node representation, which relies on link connectivity. However, in a link prediction task, links in the training set are always pr… ▽ More

    Submitted 3 December, 2022; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: Accepted to Learning on Graph

  35. arXiv:2211.11719  [pdf, other

    cs.LG stat.ML

    First Steps Toward Understanding the Extrapolation of Nonlinear Models to Unseen Domains

    Authors: Kefan Dong, Tengyu Ma

    Abstract: Real-world machine learning applications often involve deploying neural networks to domains that are not seen in the training time. Hence, we need to understand the extrapolation of nonlinear models -- under what conditions on the distributions and function class, models can be guaranteed to extrapolate to new test distributions. The question is very challenging because even two-layer neural netwo… ▽ More

    Submitted 1 December, 2022; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: added citations and fixed typos

  36. arXiv:2208.09957  [pdf, other

    cs.LG

    Heterogeneous Graph Masked Autoencoders

    Authors: Yijun Tian, Kaiwen Dong, Chunhui Zhang, Chuxu Zhang, Nitesh V. Chawla

    Abstract: Generative self-supervised learning (SSL), especially masked autoencoders, has become one of the most exciting learning paradigms and has shown great potential in handling graph data. However, real-world graphs are always heterogeneous, which poses three critical challenges that existing methods ignore: 1) how to capture complex graph structure? 2) how to incorporate various node attributes? and 3… ▽ More

    Submitted 9 February, 2023; v1 submitted 21 August, 2022; originally announced August 2022.

    Comments: Accepted by AAAI 2023 (Oral)

  37. arXiv:2208.00843  [pdf, other

    cs.RO cs.AI

    Relay Hindsight Experience Replay: Self-Guided Continual Reinforcement Learning for Sequential Object Manipulation Tasks with Sparse Rewards

    Authors: Yongle Luo, Yuxin Wang, Kun Dong, Qiang Zhang, Erkang Cheng, Zhiyong Sun, Bo Song

    Abstract: Exploration with sparse rewards remains a challenging research problem in reinforcement learning (RL). Especially for sequential object manipulation tasks, the RL agent always receives negative rewards until completing all sub-tasks, which results in low exploration efficiency. To solve these tasks efficiently, we propose a novel self-guided continual RL framework, RelayHER (RHER). RHER first deco… ▽ More

    Submitted 3 November, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

    Comments: 10 pages, 15 figures. https://github.com/kaixindelele/RHER

  38. Bio-inspired Intelligence with Applications to Robotics: A Survey

    Authors: Junfei Li, Zhe Xu, Danjie Zhu, Kevin Dong, Tao Yan, Zhu Zeng, Simon X. Yang

    Abstract: In the past decades, considerable attention has been paid to bio-inspired intelligence and its applications to robotics. This paper provides a comprehensive survey of bio-inspired intelligence, with a focus on neurodynamics approaches, to various robotic applications, particularly to path planning and control of autonomous robotic systems. Firstly, the bio-inspired shunting model and its variants… ▽ More

    Submitted 17 June, 2022; originally announced June 2022.

  39. arXiv:2206.02326  [pdf, ps, other

    cs.LG

    Asymptotic Instance-Optimal Algorithms for Interactive Decision Making

    Authors: Kefan Dong, Tengyu Ma

    Abstract: Past research on interactive decision making problems (bandits, reinforcement learning, etc.) mostly focuses on the minimax regret that measures the algorithm's performance on the hardest instance. However, an ideal algorithm should adapt to the complexity of a particular problem instance and incur smaller regrets on easy instances than worst-case instances. In this paper, we design the first asym… ▽ More

    Submitted 11 June, 2023; v1 submitted 5 June, 2022; originally announced June 2022.

    Comments: Accepted by ICLR 2023

  40. An Interpretable MRI Reconstruction Network with Two-grid-cycle Correction and Geometric Prior Distillation

    Authors: Xiaohong Fan, Yin Yang, Ke Chen, Jianping Zhang, Ke Dong

    Abstract: Although existing deep learning compressed-sensing-based Magnetic Resonance Imaging (CS-MRI) methods have achieved considerably impressive performance, explainability and generalizability continue to be challenging for such methods since the transition from mathematical analysis to network design not always natural enough, often most of them are not flexible enough to handle multi-sampling-ratio r… ▽ More

    Submitted 5 March, 2023; v1 submitted 14 May, 2022; originally announced May 2022.

    Comments: 14 pages, accepted to Biomedical Signal Processing and Control,March, 2023

    Journal ref: Biomedical Signal Processing and Control, vol 84, 2023

  41. arXiv:2112.01660  [pdf, other

    cs.CL cs.AI

    The Influence of Data Pre-processing and Post-processing on Long Document Summarization

    Authors: Xinwei Du, Kailun Dong, Yuchen Zhang, Yongsheng Li, Ruei-Yu Tsay

    Abstract: Long document summarization is an important and hard task in the field of natural language processing. A good performance of the long document summarization reveals the model has a decent understanding of the human language. Currently, most researches focus on how to modify the attention mechanism of the transformer to achieve a higher ROUGE score. The study of data pre-processing and post-process… ▽ More

    Submitted 2 December, 2021; originally announced December 2021.

  42. arXiv:2111.06580  [pdf, other

    cs.CL cs.AI cs.LG

    On-the-Fly Rectification for Robust Large-Vocabulary Topic Inference

    Authors: Moontae Lee, Sungjun Cho, Kun Dong, David Mimno, David Bindel

    Abstract: Across many data domains, co-occurrence statistics about the joint appearance of objects are powerfully informative. By transforming unsupervised learning problems into decompositions of co-occurrence statistics, spectral algorithms provide transparent and efficient algorithms for posterior inference such as latent topic analysis and community detection. As object vocabularies grow, however, it be… ▽ More

    Submitted 12 November, 2021; originally announced November 2021.

  43. arXiv:2107.09912  [pdf, other]

    cs.LG stat.ML

    Design of Experiments for Stochastic Contextual Linear Bandits

    Authors: Andrea Zanette, Kefan Dong, Jonathan Lee, Emma Brunskill

    Abstract: In the stochastic linear contextual bandit setting there exist several minimax procedures for exploration with policies that are reactive to the data being acquired. In practice, there can be a significant engineering overhead to deploy these algorithms, especially when the dataset is collected in a distributed fashion or when a human in the loop is needed to implement a different policy. Explorin…

    Submitted 22 July, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

    Comments: Initial submission

  44. Fastening the Initial Access in 5G NR Sidelink for 6G V2X Networks

    Authors: Marouan Mizmizi, Francesco Linsalata, Mattia Brambilla, Filippo Morandi, Kai Dong, Maurizio Magarini, Monica Nicoli, Majid Nasiri Khormuji, Peng Wang, Renaud Alexandre Pitaval, Umberto Spagnolini

    Abstract: The ever-increasing demand for intelligent, automated, and connected mobility solutions pushes for the development of an innovative sixth Generation (6G) of cellular networks. A radical transformation on the physical layer of vehicular communications is planned, with a paradigm shift towards beam-based millimeter Waves or sub-Terahertz communications, which require precise beam pointing for guaran…

    Submitted 10 June, 2021; originally announced June 2021.

    Journal ref: Vehicular Communications, 2021, 100402, ISSN 2214-2096

  45. arXiv:2105.04271  [pdf, other]

    cs.CL

    DocOIE: A Document-level Context-Aware Dataset for OpenIE

    Authors: Kuicai Dong, Yilin Zhao, Aixin Sun, Jung-Jae Kim, Xiaoli Li

    Abstract: Open Information Extraction (OpenIE) aims to extract structured relational tuples (subject, relation, object) from sentences and plays critical roles for many downstream NLP applications. Existing solutions perform extraction at sentence level, without referring to any additional contextual information. In reality, however, a sentence typically exists as part of a document rather than standalone;…

    Submitted 10 May, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Comments: To appear in Findings of ACL 2021

  46. The direct force correction based framework for general co-rotational analysis

    Authors: Ziyun Kan, Kaijun Dong, Biaosong Chen, Haijun Peng, Xueguan Song

    Abstract: The use of nonlinear projection matrix in co-rotational (CR) analysis was pioneered by Rankin and Nour-Omid in the 1990s (Computers & Structures, 30 (1988) 257-267; Comput. Methods Appl. Mech. Engrg., 93 (1991) 353-384), and has almost become a standard manner for CR formulations deduction over the past thirty years. This matrix however relies heavily on a hysterical and sophisticated derivation of th…

    Submitted 16 April, 2021; originally announced April 2021.

    Comments: 47 pages

  47. arXiv:2102.04168  [pdf, other]

    cs.LG

    Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature

    Authors: Kefan Dong, Jiaqi Yang, Tengyu Ma

    Abstract: This paper studies model-based bandit and reinforcement learning (RL) with nonlinear function approximations. We propose to study convergence to approximate local maxima because we show that global convergence is statistically intractable even for one-layer neural net bandit with a deterministic reward. For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent wi…

    Submitted 2 August, 2022; v1 submitted 8 February, 2021; originally announced February 2021.

    Comments: Updated Figure 1 and its caption

  48. arXiv:2008.09251  [pdf, ps, other]

    cs.LG stat.ML

    Refined Analysis of FPL for Adversarial Markov Decision Processes

    Authors: Yuanhao Wang, Kefan Dong

    Abstract: We consider the adversarial Markov Decision Process (MDP) problem, where the rewards for the MDP can be adversarially chosen, and the transition function can be either known or unknown. In both settings, Follow-the-Perturbed-Leader (FPL) based algorithms have been proposed in previous literature. However, the established regret bounds for FPL based algorithms are worse than algorithms based on mirr…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: 11 pages

  49. arXiv:2007.04876  [pdf, ps, other]

    cs.LG stat.ML

    Multinomial Logit Bandit with Low Switching Cost

    Authors: Kefan Dong, Yingkai Li, Qin Zhang, Yuan Zhou

    Abstract: We study multinomial logit bandit with limited adaptivity, where the algorithms change their exploration actions as infrequently as possible when achieving almost optimal minimax regret. We propose two measures of adaptivity: the assortment switching cost and the more fine-grained item switching cost. We present an anytime algorithm (AT-DUCB) with $O(N \log T)$ assortment switches, almost matching…

    Submitted 9 July, 2020; originally announced July 2020.

    Comments: Accepted for presentation at the International Conference on Machine Learning (ICML) 2020

  50. arXiv:2003.07489  [pdf, other]

    cs.RO cs.LG

    Catch the Ball: Accurate High-Speed Motions for Mobile Manipulators via Inverse Dynamics Learning

    Authors: Ke Dong, Karime Pereida, Florian Shkurti, Angela P. Schoellig

    Abstract: Mobile manipulators consist of a mobile platform equipped with one or more robot arms and are of interest for a wide array of challenging tasks because of their extended workspace and dexterity. Typically, mobile manipulators are deployed in slow-motion collaborative robot scenarios. In this paper, we consider scenarios where accurate high-speed motions are required. We introduce a framework for t…

    Submitted 16 March, 2020; originally announced March 2020.

    Comments: Paper manuscript submitted to IROS 2020