Showing 1–50 of 234 results for author: Gong, M

Searching in archive cs.
  1. arXiv:2406.14434  [pdf, other]

    cs.CL

    Towards Truthful Multilingual Large Language Models: Benchmarking and Alignment Strategies

    Authors: Weihao Liu, Ning Wu, Wenbiao Ding, Shining Liang, Ming Gong, Dongmei Zhang

    Abstract: In the era of large language models (LLMs), building multilingual large language models (MLLMs) that can serve users worldwide holds great significance. However, existing research seldom focuses on the truthfulness of MLLMs. Meanwhile, contemporary multilingual aligning technologies struggle to balance massive languages and often exhibit serious truthfulness gaps across different languages, especi…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 15 pages

  2. arXiv:2406.14275  [pdf, other]

    cs.CL cs.AI

    Step-Back Profiling: Distilling User History for Personalized Scientific Writing

    Authors: Xiangru Tang, Xingyao Zhang, Yanjun Shao, Jie Wu, Yilun Zhao, Arman Cohan, Ming Gong, Dongmei Zhang, Mark Gerstein

    Abstract: Large language models (LLMs) excel at a variety of natural language processing tasks, yet they struggle to generate personalized content for individuals, particularly in real-world scenarios like scientific writing. Addressing this challenge, we introduce Step-Back Profiling to personalize LLMs by distilling user history into concise profiles, including essential traits and preferences of users. R…

    Submitted 20 June, 2024; originally announced June 2024.

  3. arXiv:2406.13327  [pdf, other]

    cs.CV

    Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition

    Authors: Anqi Zhu, Qiuhong Ke, Mingming Gong, James Bailey

    Abstract: While remarkable progress has been made on supervised skeleton-based action recognition, the challenge of zero-shot recognition remains relatively unexplored. In this paper, we argue that relying solely on aligning label-level semantics and global skeleton features is insufficient to effectively transfer locally consistent visual knowledge from seen to unseen classes. To address this limitation, w…

    Submitted 19 June, 2024; originally announced June 2024.

  4. arXiv:2406.13272  [pdf, other]

    cs.CV

    AniFaceDiff: High-Fidelity Face Reenactment via Facial Parametric Conditioned Diffusion Models

    Authors: Ken Chen, Sachith Seneviratne, Wei Wang, Dongting Hu, Sanjay Saha, Md. Tarek Hasan, Sanka Rasnayaka, Tamasha Malepathirana, Mingming Gong, Saman Halgamuge

    Abstract: Face reenactment refers to the process of transferring the pose and facial expressions from a reference (driving) video onto a static facial (source) image while maintaining the original identity of the source image. Previous research in this domain has made significant progress by training controllable deep generative models to generate faces based on specific identity, pose and expression condit…

    Submitted 19 June, 2024; originally announced June 2024.

  5. arXiv:2406.09383  [pdf, other]

    cs.CV

    Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

    Authors: Yiming Li, Zhiheng Li, Nuo Chen, Moonjun Gong, Zonglin Lyu, Zehong Wang, Peili Jiang, Chen Feng

    Abstract: Large-scale datasets have fueled recent advancements in AI-based autonomous vehicle research. However, these datasets are usually collected from a single vehicle's one-time pass of a certain location, lacking multiagent interactions or repeated traversals of the same place. Such information could lead to transformative enhancements in autonomous vehicles' perception, prediction, and planning capab…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR 2024

  6. arXiv:2406.05855  [pdf, other]

    cs.LG cs.AI stat.ML

    Self-Distilled Disentangled Learning for Counterfactual Prediction

    Authors: Xinshu Li, Mingming Gong, Lina Yao

    Abstract: The advancements in disentangled representation learning significantly enhance the accuracy of counterfactual predictions by granting precise control over instrumental variables, confounders, and adjustable variables. An appealing method for achieving the independent separation of these factors is mutual information minimization, a task that presents challenges in numerous machine learning scenari…

    Submitted 14 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  7. arXiv:2406.05485  [pdf, other]

    cs.CV

    Training-Free Robust Interactive Video Object Segmentation

    Authors: Xiaoli Wei, Zhaoqing Wang, Yandong Guo, Chunxia Zhang, Tongliang Liu, Mingming Gong

    Abstract: Interactive video object segmentation is a crucial video task, having various applications from video editing to data annotating. However, current approaches struggle to accurately segment objects across diverse domains. Recently, Segment Anything Model (SAM) introduces interactive visual prompts and demonstrates impressive performance across different domains. In this paper, we propose a training…

    Submitted 8 June, 2024; originally announced June 2024.

  8. arXiv:2406.02430  [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  9. arXiv:2406.02191  [pdf, other]

    stat.ML cs.LG

    On the Recoverability of Causal Relations from Temporally Aggregated I.I.D. Data

    Authors: Shunxing Fan, Mingming Gong, Kun Zhang

    Abstract: We consider the effect of temporal aggregation on instantaneous (non-temporal) causal discovery in general setting. This is motivated by the observation that the true causal time lag is often considerably shorter than the observational interval. This discrepancy leads to high aggregation, causing time-delay causality to vanish and instantaneous dependence to manifest. Although we expect such insta…

    Submitted 11 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  10. arXiv:2405.15325  [pdf, other]

    cs.LG stat.ML

    On the Identification of Temporally Causal Representation with Instantaneous Dependence

    Authors: Zijian Li, Yifan Shen, Kaitao Zheng, Ruichu Cai, Xiangchen Song, Mingming Gong, Zhengmao Zhu, Guangyi Chen, Kun Zhang

    Abstract: Temporally causal representation learning aims to identify the latent causal process from time series observations, but most methods require the assumption that the latent causal processes do not have instantaneous relations. Although some recent methods achieve identifiability in the instantaneous causality case, they require either interventions on the latent variables or grouping of the observa…

    Submitted 7 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  11. arXiv:2405.03711  [pdf, other]

    cs.LG cs.AI cs.NE eess.SY

    Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning

    Authors: Xiao Hu, Tianshu Wang, Min Gong, Shaoshi Yang

    Abstract: Guidance commands of flight vehicles are a series of data sets with fixed time intervals, thus guidance design constitutes a sequential decision problem and satisfies the basic conditions for using deep reinforcement learning (DRL). In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on DRL and the pursuit flight vehicle (PFV) generates g…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 13 pages, 13 figures, accepted to appear on IEEE Access, Mar. 2024

    Journal ref: IEEE Access, vol. 12, pp. 48210-48222, Mar. 2024

  12. arXiv:2404.00362  [pdf, other]

    cs.CV eess.IV

    STBA: Towards Evaluating the Robustness of DNNs for Query-Limited Black-box Scenario

    Authors: Renyang Liu, Kwok-Yan Lam, Wei Zhou, Sixing Wu, Jun Zhao, Dongting Hu, Mingming Gong

    Abstract: Many attack techniques have been proposed to explore the vulnerability of DNNs and further help to improve their robustness. Despite the significant progress made recently, existing black-box attack methods still suffer from unsatisfactory performance due to the vast number of queries needed to optimize desired perturbations. Besides, the other critical challenge is that adversarial examples built…

    Submitted 30 March, 2024; originally announced April 2024.

  13. arXiv:2403.18038  [pdf]

    cs.CV

    TGGLinesPlus: A robust topological graph-guided computer vision algorithm for line detection from images

    Authors: Liping Yang, Joshua Driscol, Ming Gong, Shujie Wang, Catherine G. Potts

    Abstract: Line detection is a classic and essential problem in image processing, computer vision and machine intelligence. Line detection has many important applications, including image vectorization (e.g., document recognition and art design), indoor mapping, and important societal challenges (e.g., sea ice fracture line extraction from satellite imagery). Many line detection algorithms and methods have b…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Our TGGLinesPlus Python implementation is open source. 27 pages, 8 figures and 4 tables

  14. arXiv:2403.16502  [pdf, other]

    cs.CV

    Medical Image Registration and Its Application in Retinal Images: A Review

    Authors: Qiushi Nie, Xiaoqing Zhang, Yan Hu, Mingdao Gong, Jiang Liu

    Abstract: Medical image registration is vital for disease diagnosis and treatment with its ability to merge diverse information of images, which may be captured under different times, angles, or modalities. Although several surveys have reviewed the development of medical image registration, these surveys have not systematically summarized methodologies of existing medical image registration methods. To thi…

    Submitted 25 March, 2024; originally announced March 2024.

  15. arXiv:2403.15711  [pdf, other]

    cs.LG stat.ME stat.ML

    Identifiable Latent Neural Causal Models

    Authors: Yuhang Liu, Zhen Zhang, Dong Gong, Mingming Gong, Biwei Huang, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi

    Abstract: Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data. It is particularly good at predictions under unseen distribution shifts, because these shifts can generally be interpreted as consequences of interventions. Hence leveraging seen distribution shifts becomes a natural strategy to help identifying causal representations, which in…

    Submitted 23 March, 2024; originally announced March 2024.

  16. arXiv:2403.01698  [pdf, other]

    cs.CL cs.AI

    Hypertext Entity Extraction in Webpage

    Authors: Yifei Yang, Tianqiao Liu, Bo Shao, Hai Zhao, Linjun Shou, Ming Gong, Daxin Jiang

    Abstract: Webpage entity extraction is a fundamental natural language processing task in both research and applications. Nowadays, the majority of webpage entity extraction models are trained on structured datasets which strive to retain textual content and its structure information. However, existing datasets all overlook the rich hypertext features (e.g., font color, font size) which show their effectiven…

    Submitted 3 March, 2024; originally announced March 2024.

  17. arXiv:2403.00782  [pdf, other]

    q-fin.ST cs.AI cs.CL

    Ploutos: Towards interpretable stock movement prediction with financial large language model

    Authors: Hanshuang Tong, Jun Li, Ning Wu, Ming Gong, Dongmei Zhang, Qi Zhang

    Abstract: Recent advancements in large language models (LLMs) have opened new pathways for many domains. However, the full potential of LLMs in financial investments remains largely untapped. There are two main challenges for typical deep learning-based methods for quantitative finance. First, they struggle to fuse textual and numerical information flexibly for stock movement prediction. Second, traditional…

    Submitted 18 February, 2024; originally announced March 2024.

    Comments: 8 pages, 4 figures

  18. arXiv:2402.19014  [pdf, other]

    cs.CV

    Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

    Authors: Xin Li, Yunfei Wu, Xinghua Jiang, Zhihao Guo, Mingming Gong, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun

    Abstract: Recently, the advent of Large Visual-Language Models (LVLMs) has received increasing attention across various domains, particularly in the field of visual document understanding (VDU). Different from conventional vision-language tasks, VDU is specifically concerned with text-rich scenarios containing abundant document elements. Nevertheless, the importance of fine-grained features remains largely…

    Submitted 29 February, 2024; originally announced February 2024.

  19. arXiv:2402.18695  [pdf, other]

    cs.CV cs.CL

    Grounding Language Models for Visual Entity Recognition

    Authors: Zilin Xiao, Ming Gong, Paola Cascante-Bonilla, Xingyao Zhang, Jie Wu, Vicente Ordonez

    Abstract: We introduce AutoVER, an Autoregressive model for Visual Entity Recognition. Our model extends an autoregressive Multi-modal Large Language Model by employing retrieval augmented constrained generation. It mitigates low performance on out-of-domain entities while excelling in queries that require visually-situated reasoning. Our method learns to distinguish similar entities within a vast label spa…

    Submitted 28 February, 2024; originally announced February 2024.

  20. Enhancing Hyperspectral Images via Diffusion Model and Group-Autoencoder Super-resolution Network

    Authors: Zhaoyang Wang, Dongyang Li, Mingyang Zhang, Hao Luo, Maoguo Gong

    Abstract: Existing hyperspectral image (HSI) super-resolution (SR) methods struggle to effectively capture the complex spectral-spatial relationships and low-level details, while diffusion models represent a promising generative model known for their exceptional performance in modeling complex relations and learning high and low-level visual features. The direct application of diffusion models to HSI SR is…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted by AAAI2024

    Report number: Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 5794-5804

  21. arXiv:2402.14401  [pdf, other]

    cs.CV cs.LG eess.IV

    Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment

    Authors: Zhaoyang Wang, Bo Hu, Mingyang Zhang, Jie Li, Leida Li, Maoguo Gong, Xinbo Gao

    Abstract: Existing free-energy guided No-Reference Image Quality Assessment (NR-IQA) methods still suffer from finding a balance between learning feature information at the pixel level of the image and capturing high-level feature information and the efficient utilization of the obtained high-level feature information remains a challenge. As a novel class of state-of-the-art (SOTA) generative model, the dif…

    Submitted 22 February, 2024; originally announced February 2024.

  22. arXiv:2402.13510  [pdf, other]

    cs.CV

    SealD-NeRF: Interactive Pixel-Level Editing for Dynamic Scenes by Neural Radiance Fields

    Authors: Zhentao Huang, Yukun Shi, Neil Bruce, Minglun Gong

    Abstract: The widespread adoption of implicit neural representations, especially Neural Radiance Fields (NeRF), highlights a growing need for editing capabilities in implicit 3D models, essential for tasks like scene post-processing and 3D content creation. Despite previous efforts in NeRF editing, challenges remain due to limitations in editing flexibility and quality. The key issue is developing a neural…

    Submitted 20 February, 2024; originally announced February 2024.

    MSC Class: 68T45

  23. arXiv:2402.08960  [pdf, other]

    cs.CV cs.AI

    Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision

    Authors: Zhaoqing Wang, Xiaobo Xia, Ziye Chen, Xiao He, Yandong Guo, Mingming Gong, Tongliang Liu

    Abstract: Current state-of-the-art open-vocabulary segmentation methods typically rely on image-mask-text triplet annotations for supervision. However, acquiring such detailed annotations is labour-intensive and poses scalability challenges in complex real-world scenarios. While existing weakly-supervised approaches leverage image-text pairs to reduce the expansive annotation cost, the lack of mask supervis…

    Submitted 11 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 27 pages, 18 figures, 10 tables

  24. arXiv:2402.06223  [pdf, other]

    cs.LG cs.CV stat.ML

    Revealing Multimodal Contrastive Representation Learning through Latent Partial Causal Models

    Authors: Yuhang Liu, Zhen Zhang, Dong Gong, Biwei Huang, Mingming Gong, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi

    Abstract: Multimodal contrastive representation learning methods have proven successful across a range of domains, partly due to their ability to generate meaningful shared representations of complex phenomena. To enhance the depth of analysis and understanding of these acquired representations, we introduce a unified causal model specifically designed for multimodal data. By examining this model, we show t…

    Submitted 9 February, 2024; originally announced February 2024.

  25. arXiv:2402.05394  [pdf, other]

    cs.CV

    Enhancing Zero-shot Counting via Language-guided Exemplar Learning

    Authors: Mingjie Wang, Jun Zhou, Yong Dai, Eric Buys, Minglun Gong

    Abstract: Recently, Class-Agnostic Counting (CAC) problem has garnered increasing attention owing to its intriguing generality and superior efficiency compared to Category-Specific Counting (CSC). This paper proposes a novel ExpressCount to enhance zero-shot object counting by delving deeply into language-guided exemplar learning. Specifically, the ExpressCount is comprised of an innovative Language-oriente…

    Submitted 7 February, 2024; originally announced February 2024.

  26. arXiv:2402.03941  [pdf, other]

    cs.LG cs.AI stat.ME

    Discovery of the Hidden World with Large Language Models

    Authors: Chenxi Liu, Yongqiang Chen, Tongliang Liu, Mingming Gong, James Cheng, Bo Han, Kun Zhang

    Abstract: Science originates with discovering new causal knowledge from a combination of known facts and observations. Traditional causal discovery approaches mainly rely on high-quality measured variables, usually given by human experts, to find causal relations. However, the causal variables are usually unavailable in a wide range of real-world applications. The rise of large language models (LLMs) that a…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Preliminary version of an ongoing project; Chenxi and Yongqiang contributed equally; 26 pages, 41 figures; Project page: https://causalcoat.github.io/

  27. arXiv:2401.10632  [pdf, other]

    cs.LG

    Interventional Fairness on Partially Known Causal Graphs: A Constrained Optimization Approach

    Authors: Aoqi Zuo, Yiqing Li, Susan Wei, Mingming Gong

    Abstract: Fair machine learning aims to prevent discrimination against individuals or sub-populations based on sensitive attributes such as gender and race. In recent years, causal inference methods have been increasingly used in fair machine learning to measure unfairness by causal effects. However, current methods assume that the true causal graph is given, which is often not true in real-world applicatio…

    Submitted 8 March, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR24

  28. arXiv:2401.03476  [pdf, other]

    cs.MM cs.AI cs.HC cs.SD eess.AS

    Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness

    Authors: Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu

    Abstract: Current talking avatars mostly generate co-speech gestures based on audio and text of the utterance, without considering the non-speaking motion of the speaker. Furthermore, previous works on co-speech gesture generation have designed network structures based on individual gesture datasets, which results in limited data volume, compromised generalizability, and restricted speaker movements. To tac…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 6 pages, 3 figures, ICASSP 2024

  29. arXiv:2401.02566  [pdf]

    cs.SD cs.LG cs.MM eess.AS

    Siamese Residual Neural Network for Musical Shape Evaluation in Piano Performance Assessment

    Authors: Xiaoquan Li, Stephan Weiss, Yijun Yan, Yinhe Li, Jinchang Ren, John Soraghan, Ming Gong

    Abstract: Understanding and identifying musical shape plays an important role in music education and performance assessment. To simplify the otherwise time- and cost-intensive musical shape evaluation, in this paper we explore how artificial intelligence (AI) driven models can be applied. Considering musical shape evaluation as a classification problem, a light-weight Siamese residual neural network (S-ResN…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: X.Li, S.Weiss, Y.Yan, Y.Li, J.Ren, J.Soraghan, M.Gong,"Siamese residual neural network for musical shape evaluation in piano performance assessment" in Proc. of the 31st European Signal Processing Conference, Helsinki, Finland

  30. arXiv:2401.01510  [pdf, other]

    cs.CV

    Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering

    Authors: Haopeng Li, Qiuhong Ke, Mingming Gong, Tom Drummond

    Abstract: While significant advancements have been made in video question answering (VideoQA), the potential benefits of enhancing model generalization through tailored difficulty scheduling have been largely overlooked in existing research. This paper seeks to bridge that gap by incorporating VideoQA into a curriculum learning (CL) framework that progressively trains models from simpler to more complex dat…

    Submitted 2 January, 2024; originally announced January 2024.

  31. arXiv:2312.12227  [pdf, other]

    cs.CV cs.AI

    HuTuMotion: Human-Tuned Navigation of Latent Motion Diffusion Models with Minimal Feedback

    Authors: Gaoge Han, Shaoli Huang, Mingming Gong, Jinglei Tang

    Abstract: We introduce HuTuMotion, an innovative approach for generating natural human motions that navigates latent motion diffusion models by leveraging few-shot human feedback. Unlike existing approaches that sample latent variables from a standard normal prior distribution, our method adapts the prior distribution to better suit the characteristics of the data, as indicated by human feedback, thus enhan…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024 Main Track

  32. arXiv:2312.11112  [pdf, other]

    cs.CV

    ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding

    Authors: Lunhao Duan, Shanshan Zhao, Nan Xue, Mingming Gong, Gui-Song Xia, Dacheng Tao

    Abstract: Transformers have been recently explored for 3D point cloud understanding with impressive progress achieved. A large number of points, over 0.1 million, make the global self-attention infeasible for point cloud data. Thus, most methods propose to apply the transformer in a local region, e.g., spherical or cubic window. However, it still contains a large number of Query-Key pairs, which requires hi…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023. Code: https://github.com/LHDuan/ConDaFormer

  33. arXiv:2312.09498  [pdf, other]

    cs.LG cs.AI

    Neural Gaussian Similarity Modeling for Differential Graph Structure Learning

    Authors: Xiaolong Fan, Maoguo Gong, Yue Wu, Zedong Tang, Jieyi Liu

    Abstract: Graph Structure Learning (GSL) has demonstrated considerable potential in the analysis of graph-unknown non-Euclidean data across a wide range of domains. However, constructing an end-to-end graph structure learning model poses a challenge due to the impediment of gradient flow caused by the nearest neighbor sampling strategy. In this paper, we construct a differential graph structure learning mod…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  34. arXiv:2312.06117  [pdf, other]

    cs.CV

    M3SOT: Multi-frame, Multi-field, Multi-space 3D Single Object Tracking

    Authors: Jiaming Liu, Yue Wu, Maoguo Gong, Qiguang Miao, Wenping Ma, Can Qin

    Abstract: 3D Single Object Tracking (SOT) stands a forefront task of computer vision, proving essential for applications like autonomous driving. Sparse and occluded data in scene point clouds introduce variations in the appearance of tracked objects, adding complexity to the task. In this research, we unveil M3SOT, a novel 3D SOT framework, which synergizes multiple input frames (template sets), multiple r…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 12 pages, 10 figures, 10 tables, AAAI 2024

    Journal ref: AAAI 2024

  35. arXiv:2312.06063  [pdf, other]

    cs.CV cs.AI

    PCRDiffusion: Diffusion Probabilistic Models for Point Cloud Registration

    Authors: Yue Wu, Yongzhe Yuan, Xiaolong Fan, Xiaoshui Huang, Maoguo Gong, Qiguang Miao

    Abstract: We propose a new framework that formulates point cloud registration as a denoising diffusion process from noisy transformation to object transformation. During training stage, object transformation diffuses from ground-truth transformation to random distribution, and the model learns to reverse this noising process. In sampling stage, the model refines randomly generated transformation to the outp…

    Submitted 10 December, 2023; originally announced December 2023.

  36. arXiv:2312.04333  [pdf, other]

    cs.CL

    Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers

    Authors: Nuo Chen, Ning Wu, Shining Liang, Ming Gong, Linjun Shou, Dongmei Zhang, Jia Li

    Abstract: This paper presents an in-depth analysis of Large Language Models (LLMs), focusing on LLaMA, a prominent open-source foundational model in natural language processing. Instead of assessing LLaMA through its generative output, we design multiple-choice tasks to probe its intrinsic understanding in high-order tasks such as reasoning and computation. We examine the model horizontally, comparing diffe…

    Submitted 9 January, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: 15 pages

  37. arXiv:2311.09233  [pdf, other]

    cs.LG cs.GR cs.RO

    Neural Packing: from Visual Sensing to Reinforcement Learning

    Authors: Juzhan Xu, Minglun Gong, Hao Zhang, Hui Huang, Ruizhen Hu

    Abstract: We present a novel learning framework to solve the transport-and-packing (TAP) problem in 3D. It constitutes a full solution pipeline from partial observations of input objects via RGBD sensing and recognition to final box placement, via robotic motion planning, to arrive at a compact packing in a target container. The technical core of our method is a neural network for TAP, trained via reinforce…

    Submitted 16 October, 2023; originally announced November 2023.

  38. arXiv:2311.03253  [pdf, other]

    cs.CL cs.AI

    Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency

    Authors: Zilin Xiao, Linjun Shou, Xingyao Zhang, Jie Wu, Ming Gong, Jian Pei, Daxin Jiang

    Abstract: Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities using length-limited encoders. However, these methods often struggle to capture explicit discourse-level dependencies, resulting in incoherent predictions at the abstract level (e.g. topic or category). We propose CoherentED,…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP 2023 Findings

  39. arXiv:2311.03250  [pdf, other]

    cs.CL cs.AI

    Instructed Language Models with Retrievers Are Powerful Entity Linkers

    Authors: Zilin Xiao, Ming Gong, Jie Wu, Xingyao Zhang, Linjun Shou, Jian Pei, Daxin Jiang

    Abstract: Generative approaches powered by large language models (LLMs) have demonstrated emergent abilities in tasks that require complex reasoning abilities. Yet the generative nature still makes the generated content suffer from hallucinations, thus unsuitable for entity-centric tasks like entity linking (EL) requiring precise entity predictions over a large knowledge base. We present Instructed Generati…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP 2023 Main

  40. arXiv:2310.20246  [pdf, other]

    cs.CL cs.AI

    Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations

    Authors: Nuo Chen, Zinan Zheng, Ning Wu, Ming Gong, Yangqiu Song, Dongmei Zhang, Jia Li

    Abstract: Existing research predominantly focuses on developing powerful language learning models (LLMs) for mathematical reasoning within monolingual languages, with few explorations in preserving efficacy in a multilingual context. To bridge this gap, this paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs. Firstly, by utilizing translation, we construct the first multil…

    Submitted 28 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: Work in Progress

  41. arXiv:2310.19491  [pdf, ps, other]

    math.ST cs.LG stat.ML

    Generator Identification for Linear SDEs with Additive and Multiplicative Noise

    Authors: Yuanyuan Wang, Xi Geng, Wei Huang, Biwei Huang, Mingming Gong

    Abstract: In this paper, we present conditions for identifying the generator of a linear stochastic differential equation (SDE) from the distribution of its solution process with a given fixed initial state. These identifiability conditions are crucial in causal inference using linear SDEs as they enable the identification of the post-intervention distributions from its observational distribution. Specifica…

    Submitted 21 January, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

  42. arXiv:2310.15580  [pdf, other]

    cs.LG

    Identifiable Latent Polynomial Causal Models Through the Lens of Change

    Authors: Yuhang Liu, Zhen Zhang, Dong Gong, Mingming Gong, Biwei Huang, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi

    Abstract: Causal representation learning aims to unveil latent high-level causal representations from observed low-level data. One of its primary tasks is to provide reliable assurance of identifying these latent causal models, known as identifiability. A recent breakthrough explores identifiability by leveraging the change of causal influences among latent causal variables across multiple environments \cit…

    Submitted 24 October, 2023; originally announced October 2023.

  43. arXiv:2310.11239

    cs.CV cs.RO

    LiDAR-based 4D Occupancy Completion and Forecasting

    Authors: Xinhao Liu, Moonjun Gong, Qi Fang, Haoyu Xie, Yiming Li, Hang Zhao, Chen Feng

    Abstract: Scene completion and forecasting are two popular perception problems in research for mobile agents like autonomous vehicles. Existing approaches treat the two problems in isolation, resulting in a separate perception of the two aspects. In this paper, we introduce a novel LiDAR perception task of Occupancy Completion and Forecasting (OCF) in the context of autonomous driving to unify these aspects…

    Submitted 17 October, 2023; originally announced October 2023.

  44. RUEL: Retrieval-Augmented User Representation with Edge Browser Logs for Sequential Recommendation

    Authors: Ning Wu, Ming Gong, Linjun Shou, Jian Pei, Daxin Jiang

    Abstract: Online recommender systems (RS) aim to match user needs with the vast amount of resources available on various platforms. A key challenge is to model user preferences accurately under the condition of data sparsity. To address this challenge, some methods have leveraged external user behavior data from multiple platforms to enrich user representation. However, all of these methods require a consis…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: CIKM 2023 ADS

  45. arXiv:2309.10279

    cs.CV cs.GR

    360° Reconstruction From a Single Image Using Space Carved Outpainting

    Authors: Nuri Ryu, Minsu Gong, Geonung Kim, Joo-Haeng Lee, Sunghyun Cho

    Abstract: We introduce POP3D, a novel framework that creates a full 360°-view 3D model from a single image. POP3D resolves two prominent issues that limit the single-view reconstruction. Firstly, POP3D offers substantial generalizability to arbitrary categories, a trait that previous methods struggle to achieve. Secondly, POP3D further improves reconstruction fidelity and naturalness, a crucial aspec…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted to SIGGRAPH Asia 2023 (Conference Track). For the project page, see http://cg.postech.ac.kr/research/POP3D For the supplementary document, see http://cg.postech.ac.kr/papers/2023_SIGAsia_Ryu_Supp.pdf

  46. arXiv:2309.07407

    cs.DC

    Deep Reinforcement Learning-based Scheduling for Optimizing System Load and Response Time in Edge and Fog Computing Environments

    Authors: Zhiyu Wang, Mohammad Goudarzi, Mingming Gong, Rajkumar Buyya

    Abstract: Edge/fog computing, as a distributed computing paradigm, satisfies the low-latency requirements of an ever-increasing number of IoT applications and has become the mainstream computing paradigm behind IoT applications. However, because a large number of IoT applications require execution on the edge/fog resources, the servers may be overloaded. Hence, it may disrupt the edge/fog servers and also negati…

    Submitted 22 October, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

  47. arXiv:2308.04696

    cs.AI cs.LG

    Explainable AI in Orthopedics: Challenges, Opportunities, and Prospects

    Authors: Soheyla Amirian, Luke A. Carlson, Matthew F. Gong, Ines Lohse, Kurt R. Weiss, Johannes F. Plate, Ahmad P. Tafti

    Abstract: While artificial intelligence (AI) has made many successful applications in various domains, its adoption in healthcare lags a little bit behind other high-stakes settings. Several factors contribute to this slower uptake, including regulatory frameworks, patient privacy concerns, and data heterogeneity. However, one significant challenge that impedes the implementation of AI in healthcare, partic…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: This paper was accepted at The 2023 World Congress in Computer Science, Computer Engineering, and Applied Computing (CSCE'23)

  48. arXiv:2308.04356

    cs.CV cs.AI

    Learning Unbiased Image Segmentation: A Case Study with Plain Knee Radiographs

    Authors: Nickolas Littlefield, Johannes F. Plate, Kurt R. Weiss, Ines Lohse, Avani Chhabra, Ismaeel A. Siddiqui, Zoe Menezes, George Mastorakos, Sakshi Mehul Thakar, Mehrnaz Abedian, Matthew F. Gong, Luke A. Carlson, Hamidreza Moradi, Soheyla Amirian, Ahmad P. Tafti

    Abstract: Automatic segmentation of knee bony anatomy is essential in orthopedics, and it has been around for several years in both pre-operative and post-operative settings. While deep learning algorithms have demonstrated exceptional performance in medical image analysis, the assessment of fairness and potential biases within these models remains limited. This study aims to revisit deep learning-powered k…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: This paper has been accepted by IEEE BHI 2023

  49. A Cyber-Physical Routing Protocol Exploiting Trajectory Dynamics for Mission-Oriented Flying Ad Hoc Networks

    Authors: Die Hu, Shaoshi Yang, Min Gong, Zhiyong Feng, Xuejun Zhu

    Abstract: As a special type of mobile ad hoc network (MANET), the flying ad hoc network (FANET) has the potential to enable a variety of emerging applications in both civilian wireless communications (e.g., 5G and 6G) and the defense industry. The routing protocol plays a pivotal role in FANET. However, when designing the routing protocol for FANET, it is conventionally assumed that the aerial nodes move ra…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: 12 pages, 24 figures, accepted to appear in Engineering in Dec. 2022 (ISSN 2095-8099)

    Journal ref: Engineering, Volume 19, Pages 217-227, Dec. 2022

  50. arXiv:2307.16405

    cs.LG stat.ME stat.ML

    Causal-learn: Causal Discovery in Python

    Authors: Yujia Zheng, Biwei Huang, Wei Chen, Joseph Ramsey, Mingming Gong, Ruichu Cai, Shohei Shimizu, Peter Spirtes, Kun Zhang

    Abstract: Causal discovery aims at revealing causal relations from observational data, which is a fundamental task in science and engineering. We describe causal-learn, an open-source Python library for causal discovery. This library focuses on bringing a comprehensive collection of causal discovery methods to both practitioners and researchers. It provides easy-to-use APIs for non-specialists, m…

    Submitted 31 July, 2023; originally announced July 2023.

    Journal ref: Journal of Machine Learning Research 25 (2024)