Showing 1–50 of 97 results for author: Cao, K

  1. arXiv:2406.11200  [pdf, other]

    cs.LG cs.CL

    AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval

    Authors: Shirley Wu, Shiyu Zhao, Qian Huang, Kexin Huang, Michihiro Yasunaga, Kaidi Cao, Vassilis N. Ioannidis, Karthik Subbian, Jure Leskovec, James Zou

    Abstract: Large language model (LLM) agents have demonstrated impressive capability in utilizing external tools and knowledge to boost accuracy and reduce hallucinations. However, developing the prompting techniques that make LLM agents able to effectively use external tools and knowledge is a heuristic and laborious task. Here, we introduce AvaTaR, a novel and automatic framework that optimizes an LLM agen…

    Submitted 17 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: 19 pages, 8 figures, 6 tables

  2. arXiv:2404.18434  [pdf, ps, other]

    cs.IT

    The augmented codes of a family of linear codes with locality 2

    Authors: Ziling Heng, Keqing Cao

    Abstract: In this paper, we first generalize the class of linear codes by Ding and Ding (IEEE TIT, 61(11), pp. 5835-5842, 2015). Then we mainly study the augmented codes of this generalized class of linear codes. For one thing, we use Gaussian sums to determine the parameters and weight distributions of the augmented codes in some cases. It is shown that the augmented codes are self-orthogonal and have only…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 25 pages

  3. arXiv:2404.15275  [pdf, other]

    cs.CV

    ID-Animator: Zero-Shot Identity-Preserving Human Video Generation

    Authors: Xuanhua He, Quande Liu, Shengju Qian, Xin Wang, Tao Hu, Ke Cao, Keyu Yan, Jie Zhang

    Abstract: Generating high-fidelity human video with specified identities has attracted significant attention in the content generation community. However, existing techniques struggle to strike a balance between training efficiency and identity preservation, either requiring tedious case-by-case fine-tuning or usually missing identity details in the video generation process. In this study, we present \textb…

    Submitted 25 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Project Page: https://id-animator.github.io/

  4. arXiv:2404.13207  [pdf, other]

    cs.IR cs.LG

    STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases

    Authors: Shirley Wu, Shiyu Zhao, Michihiro Yasunaga, Kexin Huang, Kaidi Cao, Qian Huang, Vassilis N. Ioannidis, Karthik Subbian, James Zou, Jure Leskovec

    Abstract: Answering real-world complex queries, such as complex product search, often requires accurate retrieval from semi-structured knowledge bases that involve blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information. However, previous works have mostly studied textual and relational retrieval tasks as separate topics. To address the…

    Submitted 20 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: 26 pages, 6 figures

  5. arXiv:2404.00776  [pdf, other]

    cs.LG cs.DB stat.ML

    PyTorch Frame: A Modular Framework for Multi-Modal Tabular Learning

    Authors: Weihua Hu, Yiwen Yuan, Zecheng Zhang, Akihiro Nitta, Kaidi Cao, Vid Kocijan, Jure Leskovec, Matthias Fey

    Abstract: We present PyTorch Frame, a PyTorch-based framework for deep learning over multi-modal tabular data. PyTorch Frame makes tabular deep learning easy by providing a PyTorch-based data structure to handle complex tabular data, introducing a model abstraction to enable modular implementation of tabular models, and allowing external foundation models to be incorporated to handle complex columns (e.g.,…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: https://github.com/pyg-team/pytorch-frame

  6. arXiv:2403.14173  [pdf, other]

    cs.RO

    HCTO: Optimality-Aware LiDAR Inertial Odometry with Hybrid Continuous Time Optimization for Compact Wearable Mapping System

    Authors: Jianping Li, Shenghai Yuan, Muqing Cao, Thien-Minh Nguyen, Kun Cao, Lihua Xie

    Abstract: Compact wearable mapping system (WMS) has gained significant attention due to their convenience in various applications. Specifically, it provides an efficient way to collect prior maps for 3D structure inspection and robot-based "last-mile delivery" in complex environments. However, vibrations in human motion and the uneven distribution of point cloud features in complex environments often lead t…

    Submitted 21 March, 2024; originally announced March 2024.

  7. arXiv:2403.06265  [pdf, other]

    cs.CL cs.AI cs.LG

    Unpacking Tokenization: Evaluating Text Compression and its Correlation with Model Performance

    Authors: Omer Goldman, Avi Caciularu, Matan Eyal, Kris Cao, Idan Szpektor, Reut Tsarfaty

    Abstract: Despite it being the cornerstone of BPE, the most common tokenization algorithm, the importance of compression in the tokenization process is still unclear. In this paper, we argue for the theoretical importance of compression, that can be viewed as 0-gram language modeling where equal probability is assigned to all tokens. We also demonstrate the empirical importance of compression for downstream…

    Submitted 22 June, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

    Comments: EMNLP 2024, Findings

  8. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  9. arXiv:2402.16009  [pdf, other]

    cs.DL cs.CL

    PST-Bench: Tracing and Benchmarking the Source of Publications

    Authors: Fanjin Zhang, Kun Cao, Yukuo Cen, Jifan Yu, Da Yin, Jie Tang

    Abstract: Tracing the source of research papers is a fundamental yet challenging task for researchers. The billion-scale citation relations between papers hinder researchers from understanding the evolution of science efficiently. To date, there is still a lack of an accurate and scalable dataset constructed by professional researchers to identify the direct source of their studied papers, based on which au…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: 8 pages, 3 appendix pages

  10. arXiv:2402.15810  [pdf, other]

    cs.DL cs.CL cs.LG

    OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining

    Authors: Fanjin Zhang, Shijie Shi, Yifan Zhu, Bo Chen, Yukuo Cen, Jifan Yu, Yelin Chen, Lulu Wang, Qingfei Zhao, Yuqing Cheng, Tianyi Han, Yuwei An, Dan Zhang, Weng Lam Tam, Kun Cao, Yunhe Pang, Xinyu Guan, Huihui Yuan, Jian Song, Xiaoyan Li, Yuxiao Dong, Jie Tang

    Abstract: With the rapid proliferation of scientific literature, versatile academic knowledge services increasingly rely on comprehensive academic graph mining. Despite the availability of public academic graphs, benchmarks, and datasets, these resources often fall short in multi-aspect and fine-grained annotations, are constrained to specific task types and domains, or lack underlying real academic graphs.…

    Submitted 20 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: KDD'24, 9 pages, 5 appendix pages

    Journal ref: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24), August 25--29, 2024, Barcelona, Spain

  11. arXiv:2402.12192  [pdf, other]

    cs.CV

    Pan-Mamba: Effective pan-sharpening with State Space Model

    Authors: Xuanhua He, Ke Cao, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou

    Abstract: Pan-sharpening involves integrating information from low-resolution multi-spectral and high-resolution panchromatic images to generate high-resolution multi-spectral counterparts. While recent advancements in the state space model, particularly the efficient long-range dependency modeling achieved by Mamba, have revolutionized computer vision community, its untapped potential in pan-sharpening mot…

    Submitted 8 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  12. arXiv:2401.16923  [pdf, other]

    cs.CV cs.RO eess.IV

    Fourier Prompt Tuning for Modality-Incomplete Scene Segmentation

    Authors: Ruiping Liu, Jiaming Zhang, Kunyu Peng, Yufan Chen, Ke Cao, Junwei Zheng, M. Saquib Sarfraz, Kailun Yang, Rainer Stiefelhagen

    Abstract: Integrating information from multiple modalities enhances the robustness of scene perception systems in autonomous vehicles, providing a more comprehensive and reliable sensory framework. However, the modality incompleteness in multi-modal segmentation remains under-explored. In this work, we establish a task called Modality-Incomplete Scene Segmentation (MISS), which encompasses both system-level…

    Submitted 10 April, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Accepted to IEEE IV 2024. The source code is publicly available at https://github.com/RuipingL/MISS

  13. arXiv:2401.10685  [pdf, other]

    cs.LG cs.AI eess.SP

    Towards End-to-End GPS Localization with Neural Pseudorange Correction

    Authors: Xu Weng, KV Ling, Haochen Liu, Kun Cao

    Abstract: Pseudorange errors are the root cause of localization inaccuracy in GPS. Previous data-driven methods regress and eliminate pseudorange errors using handcrafted intermediate labels. Unlike them, we propose an end-to-end GPS localization framework, E2E-PrNet, to train a neural network for pseudorange correction (PrNet) directly using the final task loss calculated with the ground truth of GPS recei…

    Submitted 19 January, 2024; originally announced January 2024.

  14. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  15. arXiv:2312.04693  [pdf, other]

    cs.LG

    GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts

    Authors: Shirley Wu, Kaidi Cao, Bruno Ribeiro, James Zou, Jure Leskovec

    Abstract: Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to complex non-synthetic distributional shifts naturally occurring in the real world. Here we develop GraphMETRO, a Graph Neural Network architecture, that reliably models natural diversity and cap…

    Submitted 5 February, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: Graph Neural Networks, Mixture-of-experts, Distribution Shifts, Generalization

  16. arXiv:2309.14097  [pdf, other]

    cs.OH

    How do users design scientific workflows? The Case of Snakemake

    Authors: Sebastian Pohl, Nourhan Elfaramawy, Kedi Cao, Birte Kehr, Matthias Weidlich

    Abstract: Scientific workflows automate the analysis of large-scale scientific data, fostering the reuse of data processing operators as well as the reproducibility and traceability of analysis results. In exploratory research, however, workflows are continuously adapted, utilizing a wide range of tools and software libraries, to test scientific hypotheses. Script-based workflow engines cater to the require…

    Submitted 25 September, 2023; originally announced September 2023.

  17. arXiv:2309.13035  [pdf, other]

    cs.RO

    PyPose v0.6: The Imperative Programming Interface for Robotics

    Authors: Zitong Zhan, Xiangfu Li, Qihang Li, Haonan He, Abhinav Pandey, Haitao Xiao, Yangmengfei Xu, Xiangyu Chen, Kuan Xu, Kun Cao, Zhipeng Zhao, Zihan Wang, Huan Xu, Zihang Fang, Yutian Chen, Wentao Wang, Xu Fang, Yi Du, Tianhao Wu, Xiao Lin, Yuheng Qiu, Fan Yang, Jingnan Shi, Shaoshu Su, Yiren Lu, et al. (11 additional authors not shown)

    Abstract: PyPose is an open-source library for robot learning. It combines a learning-based approach with physics-based optimization, which enables seamless end-to-end robot learning. It has been used in many tasks due to its meticulously designed application programming interface (API) and efficient implementation. From its initial launch in early 2022, PyPose has experienced significant enhancements, inco…

    Submitted 22 September, 2023; originally announced September 2023.

  18. arXiv:2308.13490  [pdf, other]

    cs.LG cs.AR cs.SI

    TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs

    Authors: Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Kaidi Cao, Bahare Fatemi, Mike Burrows, Charith Mendis, Bryan Perozzi

    Abstract: Precise hardware performance models play a crucial role in code optimizations. They can assist compilers in making heuristic decisions or aid autotuners in identifying the optimal configuration for a given program. For example, the autotuner for XLA, a machine learning compiler, discovered 10-20% speedup on state-of-the-art models serving substantial production traffic at Google. Although there ex…

    Submitted 5 December, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

  19. arXiv:2308.03209  [pdf, other]

    cs.LG

    Communication-Free Distributed GNN Training with Vertex Cut

    Authors: Kaidi Cao, Rui Deng, Shirley Wu, Edward W Huang, Karthik Subbian, Jure Leskovec

    Abstract: Training Graph Neural Networks (GNNs) on real-world graphs consisting of billions of nodes and edges is quite challenging, primarily due to the substantial memory needed to store the graph and its intermediate node and edge features, and there is a pressing need to speed up the training process. A common approach to achieve speed up is to divide the graph into many smaller subgraphs, which are the…

    Submitted 6 August, 2023; originally announced August 2023.

  20. arXiv:2307.07763  [pdf, other]

    cs.RO cs.CV eess.IV

    Tightly-Coupled LiDAR-Visual SLAM Based on Geometric Features for Mobile Agents

    Authors: Ke Cao, Ruiping Liu, Ze Wang, Kunyu Peng, Jiaming Zhang, Junwei Zheng, Zhifeng Teng, Kailun Yang, Rainer Stiefelhagen

    Abstract: The mobile robot relies on SLAM (Simultaneous Localization and Mapping) to provide autonomous navigation and task execution in complex and unknown environments. However, it is hard to develop a dedicated algorithm for mobile robots due to dynamic and challenging situations, such as poor lighting conditions and motion blur. To tackle this issue, we propose a tightly-coupled LiDAR-visual SLAM based…

    Submitted 25 December, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

    Comments: Accepted to ROBIO 2023

  21. arXiv:2307.07757  [pdf, other]

    cs.CV cs.HC cs.RO eess.IV

    Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments

    Authors: Ruiping Liu, Jiaming Zhang, Kunyu Peng, Junwei Zheng, Ke Cao, Yufan Chen, Kailun Yang, Rainer Stiefelhagen

    Abstract: Grounded Situation Recognition (GSR) is capable of recognizing and interpreting visual scenes in a contextually intuitive way, yielding salient activities (verbs) and the involved entities (roles) depicted in images. In this work, we focus on the application of GSR in assisting people with visual impairments (PVI). However, precise localization information of detected objects is often required to…

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: Code will be available at https://github.com/RuipingL/OpenSU

  22. arXiv:2305.12322  [pdf, other]

    cs.LG cs.SI

    Learning Large Graph Property Prediction via Graph Segment Training

    Authors: Kaidi Cao, Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Dustin Zelle, Yanqi Zhou, Charith Mendis, Jure Leskovec, Bryan Perozzi

    Abstract: Learning to predict properties of large graphs is challenging because each prediction requires the knowledge of an entire graph, while the amount of memory available during training is bounded. Here we propose Graph Segment Training (GST), a general framework that utilizes a divide-and-conquer approach to allow learning large graph property prediction with a constant memory footprint. GST first di…

    Submitted 5 November, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

  23. arXiv:2305.05461  [pdf, other]

    cs.CL

    What is the best recipe for character-level encoder-only modelling?

    Authors: Kris Cao

    Abstract: This paper aims to benchmark recent progress in language understanding models that output contextualised representations at the character level. Many such modelling architectures and methods to train those architectures have been proposed, but it is currently unclear what the relative contributions of the architecture vs. the pretraining objective are to final model performance. We explore the des…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: accepted at ACL 2023

  24. arXiv:2305.00271  [pdf, other]

    cs.RO

    Path Planning for Multiple Tethered Robots Using Topological Braids

    Authors: Muqing Cao, Kun Cao, Shenghai Yuan, Kangcheng Liu, Yan Loi Wong, Lihua Xie

    Abstract: Path planning for multiple tethered robots is a challenging problem due to the complex interactions among the cables and the possibility of severe entanglements. Previous works on this problem either consider idealistic cable models or provide no guarantee for entanglement-free paths. In this work, we present a new approach to address this problem using the theory of braids. By establishing a topo…

    Submitted 15 June, 2023; v1 submitted 29 April, 2023; originally announced May 2023.

    Comments: Accepted for presentation in Robotics: Science and Systems 2023

  25. arXiv:2304.07511  [pdf, other]

    cs.HC

    Pilgrimage to Pureland: Art, Perception and the Wutai Mural VR Reconstruction

    Authors: Rongxuan Mu, Yuhe Nie, Kent Cao, Ruoxin You, Yinzong Wei, Xin Tong

    Abstract: Virtual reality (VR) supports audiences to engage with cultural heritage proactively. We designed an easy-to-access and guided Pilgrimage To Pureland VR reconstruction of Dunhuang Mogao Grottoes to offer the general public an accessible and engaging way to explore the Dunhuang murals. We put forward an immersive VR reconstruction paradigm that can efficiently convert complex 2D artwork into a VR e…

    Submitted 15 April, 2023; originally announced April 2023.

  26. arXiv:2304.03854  [pdf, other]

    cs.LG

    Revisiting Deep Learning for Variable Type Recovery

    Authors: Kevin Cao, Kevin Leach

    Abstract: Compiled binary executables are often the only available artifact in reverse engineering, malware analysis, and software systems maintenance. Unfortunately, the lack of semantic information like variable types makes comprehending binaries difficult. In efforts to improve the comprehensibility of binaries, researchers have recently used machine learning techniques to predict semantic information co…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: In The 31st International Conference on Program Comprehension (ICPC 2023 RENE)

  27. arXiv:2303.11910  [pdf, other]

    cs.CV

    360BEV: Panoramic Semantic Mapping for Indoor Bird's-Eye View

    Authors: Zhifeng Teng, Jiaming Zhang, Kailun Yang, Kunyu Peng, Hao Shi, Simon Reiß, Ke Cao, Rainer Stiefelhagen

    Abstract: Seeing only a tiny part of the whole is not knowing the full circumstance. Bird's-eye-view (BEV) perception, a process of obtaining allocentric maps from egocentric views, is restricted when using a narrow Field of View (FoV) alone. In this work, mapping from 360° panoramas to BEV semantics, the 360BEV task, is established for the first time to achieve holistic representations of indoor scenes in…

    Submitted 4 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: Code and datasets are available at the project page: https://jamycheung.github.io/360BEV.html. Accepted to WACV 2024

  28. arXiv:2303.07669  [pdf, other]

    cs.LG

    AutoTransfer: AutoML with Knowledge Transfer -- An Application to Graph Neural Networks

    Authors: Kaidi Cao, Jiaxuan You, Jiaju Liu, Jure Leskovec

    Abstract: AutoML has demonstrated remarkable success in finding an effective neural architecture for a given machine learning task defined by a specific dataset and an evaluation metric. However, most present AutoML techniques consider each task independently from scratch, which requires exploring many architectures, leading to high computational cost. Here we propose AutoTransfer, an AutoML solution that i…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: ICLR 2023

  29. arXiv:2303.07666  [pdf, other]

    cs.LG

    Relational Multi-Task Learning: Modeling Relations between Data and Tasks

    Authors: Kaidi Cao, Jiaxuan You, Jure Leskovec

    Abstract: A key assumption in multi-task learning is that at the inference time the multi-task model only has access to a given data point but not to the data point's labels from other tasks. This presents an opportunity to extend multi-task learning to utilize data point's labels from other auxiliary tasks, and this way improves performance on the new task. Here we introduce a novel relational multi-task l…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: ICLR 2022 Spotlight

  30. arXiv:2303.05075  [pdf, other]

    cs.RO

    DoubleBee: A Hybrid Aerial-Ground Robot with Two Active Wheels

    Authors: Muqing Cao, Xinhang Xu, Shenghai Yuan, Kun Cao, Kangcheng Liu, Lihua Xie

    Abstract: We present the dynamic model and control of DoubleBee, a novel hybrid aerial-ground vehicle consisting of two propellers mounted on tilting servo motors and two motor-driven wheels. DoubleBee exploits the high energy efficiency of a bicopter configuration in aerial mode, and enjoys the low power consumption of a two-wheel self-balancing robot on the ground. Furthermore, the propeller thrusts act a…

    Submitted 20 March, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  31. arXiv:2212.01536  [pdf, other]

    cs.RO

    NEPTUNE: Nonentangling Trajectory Planning for Multiple Tethered Unmanned Vehicles

    Authors: Muqing Cao, Kun Cao, Shenghai Yuan, Thien-Minh Nguyen, Lihua Xie

    Abstract: Despite recent progress on trajectory planning of multiple robots and path planning of a single tethered robot, planning of multiple tethered robots to reach their individual targets without entanglements remains a challenging problem. In this paper, we present a complete approach to address this problem. Firstly, we propose a multi-robot tether-aware representation of homotopy, using which we can…

    Submitted 25 April, 2023; v1 submitted 3 December, 2022; originally announced December 2022.

    Comments: Accepted for publication in IEEE Transaction on Robotics

  32. arXiv:2210.14843  [pdf, other]

    stat.ML cs.AI cs.LG

    TuneUp: A Simple Improved Training Strategy for Graph Neural Networks

    Authors: Weihua Hu, Kaidi Cao, Kexin Huang, Edward W Huang, Karthik Subbian, Kenji Kawaguchi, Jure Leskovec

    Abstract: Despite recent advances in Graph Neural Networks (GNNs), their training strategies remain largely under-explored. The conventional training strategy learns over all nodes in the original graph(s) equally, which can be sub-optimal as certain nodes are often more difficult to learn than others. Here we present TuneUp, a simple curriculum-based training strategy for improving the predictive performan…

    Submitted 26 August, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

  33. arXiv:2210.05102  [pdf, other]

    cs.SE cs.LG

    Pre-Training Representations of Binary Code Using Contrastive Learning

    Authors: Yifan Zhang, Chen Huang, Yueke Zhang, Kevin Cao, Scott Thomas Andersen, Huajie Shao, Kevin Leach, Yu Huang

    Abstract: Compiled software is delivered as executable binary code. Developers write source code to express the software semantics, but the compiler converts it to a binary format that the CPU can directly execute. Therefore, binary code analysis is critical to applications in reverse engineering and computer security tasks where source code is not available. However, unlike source code and natural language…

    Submitted 30 August, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

  34. arXiv:2209.15428  [pdf, other]

    cs.RO

    PyPose: A Library for Robot Learning with Physics-based Optimization

    Authors: Chen Wang, Dasong Gao, Kuan Xu, Junyi Geng, Yaoyu Hu, Yuheng Qiu, Bowen Li, Fan Yang, Brady Moon, Abhinav Pandey, Aryan, Jiahe Xu, Tianhao Wu, Haonan He, Daning Huang, Zhongqiang Ren, Shibo Zhao, Taimeng Fu, Pranay Reddy, Xiao Lin, Wenshan Wang, Jingnan Shi, Rajat Talak, Kun Cao, Yi Du, et al. (12 additional authors not shown)

    Abstract: Deep learning has had remarkable success in robotic perception, but its data-centric nature suffers when it comes to generalizing to ever-changing environments. By contrast, physics-based optimization generalizes better, but it does not perform as well in complicated tasks due to the lack of high-level semantic information and reliance on manual parametric tuning. To take advantage of these two co…

    Submitted 24 March, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: Project Website: https://pypose.org Documentation: https://pypose.org/docs/ Tutorial: https://pypose.org/tutorials/ Source code: https://github.com/pypose/pypose

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

  35. Large-scale Multi-granular Concept Extraction Based on Machine Reading Comprehension

    Authors: Siyu Yuan, Deqing Yang, Jiaqing Liang, Jilun Sun, Jingyue Huang, Kaiyan Cao, Yanghua Xiao, Rui Xie

    Abstract: The concepts in knowledge graphs (KGs) enable machines to understand natural language, and thus play an indispensable role in many applications. However, existing KGs have the poor coverage of concepts, especially fine-grained concepts. In order to supply existing KGs with more fine-grained and new concepts, we propose a novel concept extraction framework, namely MRC-CE, to extract large-scale mul…

    Submitted 30 August, 2022; originally announced August 2022.

    Journal ref: ISWC2021

  36. arXiv:2206.07680  [pdf, other]

    cs.LG physics.geo-ph

    Learning Large-scale Subsurface Simulations with a Hybrid Graph Network Simulator

    Authors: Tailin Wu, Qinchen Wang, Yinan Zhang, Rex Ying, Kaidi Cao, Rok Sosič, Ridwan Jalali, Hassan Hamam, Marko Maucec, Jure Leskovec

    Abstract: Subsurface simulations use computational models to predict the flow of fluids (e.g., oil, water, gas) through porous media. These simulations are pivotal in industrial applications such as petroleum production, where fast and accurate models are needed for high-stake decision making, for example, for well placement optimization and field development planning. Classical finite difference numerical…

    Submitted 15 June, 2022; originally announced June 2022.

    Comments: SIGKDD 2022; 11 pages, 6 figures

  37. arXiv:2206.03040  [pdf, other]

    stat.ML cs.IR cs.LG

    Learning Backward Compatible Embeddings

    Authors: Weihua Hu, Rajas Bansal, Kaidi Cao, Nikhil Rao, Karthik Subbian, Jure Leskovec

    Abstract: Embeddings, low-dimensional vector representation of objects, are fundamental in building modern machine learning systems. In industrial settings, there is usually an embedding team that trains an embedding model to solve intended tasks (e.g., product recommendation). The produced embeddings are then widely consumed by consumer teams to solve their unintended tasks (e.g., fraud detection). However…

    Submitted 7 June, 2022; originally announced June 2022.

    Comments: KDD 2022, Applied Data Science Track

  38. arXiv:2205.03809  [pdf, other]

    cs.CV

    Fingerprint Template Invertibility: Minutiae vs. Deep Templates

    Authors: Kanishka P. Wijewardena, Steven A. Grosz, Kai Cao, Anil K. Jain

    Abstract: Much of the success of fingerprint recognition is attributed to minutiae-based fingerprint representation. It was believed that minutiae templates could not be inverted to obtain a high fidelity fingerprint image, but this assumption has been shown to be false. The success of deep learning has resulted in alternative fingerprint representations (embeddings), in the hope that they might offer bette…

    Submitted 8 May, 2022; originally announced May 2022.

  39. arXiv:2202.01709  [pdf, other]

    cs.CL cs.LG

    Towards Coherent and Consistent Use of Entities in Narrative Generation

    Authors: Pinelopi Papalampidi, Kris Cao, Tomas Kocisky

    Abstract: Large pre-trained language models (LMs) have demonstrated impressive capabilities in generating long, fluent text; however, there is little to no analysis on their ability to maintain entity coherence and consistency. In this work, we focus on the end task of narrative generation and systematically analyse the long-range entity coherence and consistency in generated stories. First, we propose a se…

    Submitted 3 February, 2022; originally announced February 2022.

  40. arXiv:2110.08329  [pdf, other]

    cs.CL cs.AI cs.LG

    Control Prefixes for Parameter-Efficient Text Generation

    Authors: Jordan Clive, Kris Cao, Marek Rei

    Abstract: Prefix-tuning is a powerful lightweight technique for adapting a large pre-trained language model to a downstream application. However, it uses the same dataset-level tuned prompt for all examples in the dataset. We extend this idea and propose a dynamic method, Control Prefixes, which allows for the inclusion of conditional input-dependent information, combining the benefits of prompt tuning and…

    Submitted 10 May, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

  41. arXiv:2109.04686  [pdf, other]

    math.OC cs.RO

    DIRECT: A Differential Dynamic Programming Based Framework for Trajectory Generation

    Authors: Kun Cao, Muqing Cao, Shenghai Yuan, Lihua Xie

    Abstract: This paper introduces a differential dynamic programming (DDP) based framework for polynomial trajectory generation for differentially flat systems. In particular, instead of using a linear equation with increasing size to represent multiple polynomial segments as in literature, we take a new perspective from state-space representation such that the linear equation reduces to a finite horizon cont…

    Submitted 10 September, 2021; originally announced September 2021.

    Comments: 8 pages, 5 figures

  42. arXiv:2109.02550  [pdf, other]

    cs.CL

    You should evaluate your language model on marginal likelihood over tokenisations

    Authors: Kris Cao, Laura Rimell

    Abstract: Neural language models typically tokenise input text into sub-word units to achieve an open vocabulary. The standard approach is to use a single canonical tokenisation at both train and test time. We suggest that this approach is unsatisfactory and may bottleneck our evaluation of language model performance. Using only the one-best tokenisation ignores tokeniser uncertainty over alternative tokeni…

    Submitted 21 September, 2021; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: accepted at EMNLP 2021

  43. arXiv:2104.11364  [pdf]

    q-bio.OT cs.CY

    A field guide to cultivating computational biology

    Authors: Anne E Carpenter, Casey S Greene, Piero Carnici, Benilton S Carvalho, Michiel de Hoon, Stacey Finley, Kim-Anh Le Cao, Jerry SH Lee, Luigi Marchionni, Suzanne Sindi, Fabian J Theis, Gregory P Way, Jean YH Yang, Elana J Fertig

    Abstract: Biomedical research centers can empower basic discovery and novel therapeutic strategies by leveraging their large-scale datasets from experiments and patients. This data, together with new technologies to create and analyze it, has ushered in an era of data-driven discovery which requires moving beyond the traditional individual, single-discipline investigator research model. This interdisciplina…

    Submitted 22 April, 2021; originally announced April 2021.

  44. arXiv:2104.09587  [pdf, other]

    cs.CV

    ASFM-Net: Asymmetrical Siamese Feature Matching Network for Point Completion

    Authors: Yaqi Xia, Yan Xia, Wei Li, Rui Song, Kailang Cao, Uwe Stilla

    Abstract: We tackle the problem of object completion from point clouds and propose a novel point cloud completion network employing an Asymmetrical Siamese Feature Matching strategy, termed as ASFM-Net. Specifically, the Siamese auto-encoder neural network is adopted to map the partial and complete input point cloud into a shared latent space, which can capture detailed shape prior. Then we design an iterat…

    Submitted 4 August, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: Accepted by ACM MM2021. This work achieves the 1st place in the leaderboard of Completion3D

  45. arXiv:2104.04450  [pdf, other]

    cs.LG cs.CV

    Unsupervised Class-Incremental Learning Through Confusion

    Authors: Shivam Khare, Kun Cao, James Rehg

    Abstract: While many works on Continual Learning have shown promising results for mitigating catastrophic forgetting, they have relied on supervised training. To successfully learn in a label-agnostic incremental setting, a model must distinguish between learned and novel classes to properly include samples for training. We introduce a novelty detection method that leverages network confusion caused by trai…

    Submitted 8 December, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

  46. arXiv:2102.03526  [pdf, other]

    cs.LG cs.CV

    Open-World Semi-Supervised Learning

    Authors: Kaidi Cao, Maria Brbic, Jure Leskovec

    Abstract: A fundamental limitation of applying semi-supervised learning in real-world settings is the assumption that unlabeled test data contains only classes previously encountered in the labeled training data. However, this assumption rarely holds for data in-the-wild, where instances belonging to novel classes may appear at testing time. Here, we introduce a novel open-world semi-supervised learning set…

    Submitted 25 January, 2022; v1 submitted 6 February, 2021; originally announced February 2021.

  47. arXiv:2102.01951  [pdf, other]

    cs.CL cs.AI

    Mind the Gap: Assessing Temporal Generalization in Neural Language Models

    Authors: Angeliki Lazaridou, Adhiguna Kuncoro, Elena Gribovskaya, Devang Agrawal, Adam Liska, Tayfun Terzi, Mai Gimenez, Cyprien de Masson d'Autume, Tomas Kocisky, Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom

    Abstract: Our world is open-ended, non-stationary, and constantly evolving; thus what we talk about and how we talk about it change over time. This inherent dynamic nature of language contrasts with the current static language modelling paradigm, which trains and evaluates models on utterances from overlapping time periods. Despite impressive recent progress, we demonstrate that Transformer-XL language mode…

    Submitted 26 October, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

    Comments: To appear as a Spotlight at NeurIPS 2021

  48. arXiv:2102.00826  [pdf, other]

    cs.SE cs.AI cs.IR

    Automated Query Reformulation for Efficient Search based on Query Logs From Stack Overflow

    Authors: Kaibo Cao, Chunyang Chen, Sebastian Baltes, Christoph Treude, Xiang Chen

    Abstract: As a popular Q&A site for programming, Stack Overflow is a treasure for developers. However, the amount of questions and answers on Stack Overflow make it difficult for developers to efficiently locate the information they are looking for. There are two gaps leading to poor search results: the gap between the user's intention and the textual query, and the semantic gap between the query and the po…

    Submitted 10 February, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: 13 pages, 6 figures, accepted in ICSE'21: 43rd IEEE/ACM International Conference on Software Engineering

    ACM Class: D.2.2; I.2.7

  49. arXiv:2012.04701  [pdf, other]

    eess.IV cs.CV

    3D Graph Anatomy Geometry-Integrated Network for Pancreatic Mass Segmentation, Diagnosis, and Quantitative Patient Management

    Authors: Tianyi Zhao, Kai Cao, Jiawen Yao, Isabella Nogues, Le Lu, Lingyun Huang, Jing Xiao, Zhaozheng Yin, Ling Zhang

    Abstract: The pancreatic disease taxonomy includes ten types of masses (tumors or cysts)[20,8]. Previous work focuses on developing segmentation or classification methods only for certain mass types. Differential diagnosis of all mass types is clinically highly desirable [20] but has not been investigated using an automated image understanding approach. We exploit the feasibility to distinguish pancreatic d…

    Submitted 8 December, 2020; originally announced December 2020.

  50. arXiv:2011.09192  [pdf, other]

    cs.AI cs.GT cs.MA

    Game Plan: What AI can do for Football, and What Football can do for AI

    Authors: Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adria Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Perolat, Bart De Vylder, et al. (11 additional authors not shown)

    Abstract: The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis. More recently, AI techniques have been applied to football, due to a huge increase in data collection by professional teams, increased computational power, and advances in machine learning, with t…

    Submitted 18 November, 2020; originally announced November 2020.