Showing 1–50 of 332 results for author: Wei, J

Searching in archive cs.
  1. arXiv:2407.01937  [pdf, other]

    cs.CL

    Efficient-Empathy: Towards Efficient and Effective Selection of Empathy Data

    Authors: Linzhuang Sun, Hao Liang, **gxuan Wei, Linkun Sun, Bihui Yu, Bin Cui, Wentao Zhang

    Abstract: In recent years, with the rapid advancements in large language models (LLMs), achieving excellent empathetic response capability has become a crucial prerequisite. Consequently, managing and understanding large-scale video datasets has gained increasing importance. However, empathetic data are typically trained without any quality selection, leading to inefficient data usage and wasted computation…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.00088  [pdf, other]

    cs.DC cs.AI

    T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge

    Authors: Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang

    Abstract: The deployment of Large Language Models (LLMs) on edge devices is increasingly important to enhance on-device intelligence. Weight quantization is crucial for reducing the memory footprint of LLMs on devices. However, low-bit LLMs necessitate mixed precision matrix multiplication (mpGEMM) of low precision weights and high precision activations during inference. Existing systems, lacking native sup…

    Submitted 25 June, 2024; originally announced July 2024.

  3. arXiv:2407.00005  [pdf, other]

    cs.DC

    Dual-pronged deep learning preprocessing on heterogeneous platforms with CPU, GPU and CSD

    Authors: Jia Wei, Xingjun Zhang, Witold Pedrycz, Longxiang Wang, Jie Zhao

    Abstract: Most existing data preprocessing is done at the CPU. Although some studies use techniques such as multi-processing and double buffering to accelerate CPU preprocessing, CPU computational speed and storage bandwidth still limit the processing speed. Other studies try to use intelligent data storage devices, such as computational storage devices, to complete data preprocessing instead of CPUs. The c…

    Submitted 17 April, 2024; originally announced July 2024.

  4. arXiv:2406.19966  [pdf, other]

    cs.CL

    Simulating Financial Market via Large Language Model based Agents

    Authors: Shen Gao, Yuntao Wen, Minghang Zhu, Jianing Wei, Yuhan Cheng, Qunzi Zhang, Shuo Shang

    Abstract: Most economic theories typically assume that financial market participants are fully rational individuals and use mathematical models to simulate human behavior in financial markets. However, human behavior is often not entirely rational and is challenging to predict accurately with mathematical models. In this paper, we propose Agent-based Simulated Financial M…

    Submitted 28 June, 2024; originally announced June 2024.

  5. arXiv:2406.13631  [pdf, other]

    cs.HC cs.AI cs.SE

    On AI-Inspired UI-Design

    Authors: Jialiang Wei, Anne-Lise Courbis, Thomas Lambolais, Gérard Dray, Walid Maalej

    Abstract: Graphical User Interface (or simply UI) is a primary means of interaction between users and their device. In this paper, we discuss three major complementary approaches on how to use Artificial Intelligence (AI) to support app designers create better, more diverse, and creative UI of mobile apps. First, designers can prompt a Large Language Model (LLM) like GPT to directly generate and adjust one o…

    Submitted 19 June, 2024; originally announced June 2024.

  6. arXiv:2406.12787  [pdf, other]

    cs.CL cs.HC

    Generating Educational Materials with Different Levels of Readability using LLMs

    Authors: Chieh-Yang Huang, **g Wei, Ting-Hao 'Kenneth' Huang

    Abstract: This study introduces the leveled-text generation task, aiming to rewrite educational materials to specific readability levels while preserving meaning. We assess the capability of GPT-3.5, LLaMA-2 70B, and Mixtral 8x7B, to generate content at various readability levels through zero-shot and few-shot prompting. Evaluating 100 processed educational materials reveals that few-shot prompting signific…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: In2Writing 2024

  7. arXiv:2406.10956  [pdf, other]

    cs.SD cs.LG eess.AS

    Robust Channel Learning for Large-Scale Radio Speaker Verification

    Authors: Wenhao Yang, Jianguo Wei, Wenhuan Lu, Lei Li, Xugang Lu

    Abstract: Recent research in speaker verification has increasingly focused on achieving robust and reliable recognition under challenging channel conditions and noisy environments. Identifying speakers in radio communications is particularly difficult due to inherent limitations such as constrained bandwidth and pervasive noise interference. To address this issue, we present a Channel Robust Speaker Learnin…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages, 11 figures

  8. DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor Search

    Authors: Jiuqi Wei, Botao Peng, Xiaodong Lee, Themis Palpanas

    Abstract: Locality-sensitive hashing (LSH) is a well-known solution for approximate nearest neighbor (ANN) search in high-dimensional spaces due to its robust theoretical guarantee on query accuracy. Traditional LSH-based methods mainly focus on improving the efficiency and accuracy of the query phase by designing different query strategies, but pay little attention to improving the efficiency of the indexi…

    Submitted 16 June, 2024; originally announced June 2024.

    Journal ref: PVLDB, 17(9): 2241 - 2254, 2024

  9. arXiv:2406.10857  [pdf, other]

    cs.SE

    An LLM-enhanced Multi-objective Evolutionary Search for Autonomous Driving Test Scenario Generation

    Authors: Haoxiang Tian, Xingshuo Han, Guoquan Wu, Yuan Zhou, Shuo Li, Jun Wei, Dan Ye, Wei Wang, Tianwei Zhang

    Abstract: The safety of Autonomous Driving Systems (ADSs) is significantly important for the implementation of autonomous vehicles (AVs). Therefore, ADSs must be evaluated thoroughly before their release and deployment to the public. How to generate diverse safety-critical test scenarios is a key task for ADS testing. This paper proposes LEADE, an LLM-enhanced scenario generation approach for ADS testing, w…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages

  10. arXiv:2406.06559  [pdf, other]

    cs.CL cs.AI cs.LG

    Harnessing Business and Media Insights with Large Language Models

    Authors: Yujia Bao, Ankit Parag Shah, Neeru Narang, Jonathan Rivers, Rajeev Maksey, Lan Guan, Louise N. Barrere, Shelley Evenson, Rahul Basole, Connie Miao, Ankit Mehta, Fabien Boulay, Su Min Park, Natalie E. Pearson, Eldhose Joy, Tiger He, Sumiran Thakur, Koustav Ghosal, Josh On, Phoebe Morrison, Tim Major, Eva Siqi Wang, Gina Escobar, Jiaheng Wei, Tharindu Cyril Weerasooriya , et al. (8 additional authors not shown)

    Abstract: This paper introduces Fortune Analytics Language Model (FALM). FALM empowers users with direct access to comprehensive business analysis, including market trends, company performance metrics, and expert insights. Unlike generic LLMs, FALM leverages a curated knowledge base built from professional journalism, enabling it to deliver precise and in-depth answers to intricate business questions. Users…

    Submitted 2 June, 2024; originally announced June 2024.

  11. arXiv:2406.05688  [pdf, other]

    cs.CL cs.AI cs.LG

    Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions

    Authors: Cheng Tan, Dongxin Lyu, Siyuan Li, Zhangyang Gao, **gxuan Wei, Siqi Ma, Zicheng Liu, Stan Z. Li

    Abstract: Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields and have shown significant potential in the academic peer-review process. However, existing applications are primarily limited to static review generation based on submitted papers, which fail to capture the dynamic and iterative nature of real-world peer reviews. In this paper, we reformulate the peer-r…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Under review

  12. arXiv:2406.03880  [pdf, other]

    cs.LG cs.AI

    Memorization in deep learning: A survey

    Authors: Jiaheng Wei, Yanjun Zhang, Leo Yu Zhang, Ming Ding, Chao Chen, Kok-Leong Ong, Jun Zhang, Yang Xiang

    Abstract: Deep Learning (DL) powered by Deep Neural Networks (DNNs) has revolutionized various domains, yet understanding the intricacies of DNN decision-making and learning processes remains a significant challenge. Recent investigations have uncovered an interesting memorization phenomenon in which DNNs tend to memorize specific details from examples rather than learning general patterns, affecting model…

    Submitted 6 June, 2024; originally announced June 2024.

  13. arXiv:2406.03799  [pdf]

    cs.CV cs.AI

    Enhanced Semantic Segmentation Pipeline for WeatherProof Dataset Challenge

    Authors: Nan Zhang, Xidan Zhang, Jianing Wei, Fangjun Wang, Zhiming Tan

    Abstract: This report describes the winning solution to the WeatherProof Dataset Challenge (CVPR 2024 UG2+ Track 3). Details regarding the challenge are available at https://cvpr2024ug2challenge.github.io/track3.html. We propose an enhanced semantic segmentation pipeline for this challenge. Firstly, we improve semantic segmentation models, using backbone pretrained with Depth Anything to improve UperNet mod…

    Submitted 6 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  14. arXiv:2406.00993  [pdf]

    eess.SP cs.HC q-bio.OT

    Detection of Acetone as a Gas Biomarker for Diabetes Based on Gas Sensor Technology

    Authors: Jiaming Wei, Tong Liu, Jipeng Huang, Xiaowei Li, Yurui Qi, Gangyin Luo

    Abstract: With the continuous development and improvement of medical services, there is a growing demand for improving diabetes diagnosis. Exhaled breath analysis, characterized by its speed, convenience, and non-invasive nature, is leading the trend in diagnostic development. Studies have shown that the acetone levels in the breath of diabetes patients are higher than normal, making acetone a basis for dia…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 9 pages, 14 figures

  15. arXiv:2405.20834  [pdf, other]

    cs.CV

    Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning

    Authors: Cheng Tan, **gxuan Wei, Linzhuang Sun, Zhangyang Gao, Siyuan Li, Bihui Yu, Ruifeng Guo, Stan Z. Li

    Abstract: Large language models equipped with retrieval-augmented generation (RAG) represent a burgeoning field aimed at enhancing answering capabilities by leveraging external knowledge bases. Although the application of RAG with language-only models has been extensively explored, its adaptation into multimodal vision-language models remains nascent. Going beyond mere answer generation, the primary goal of…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Under review

  16. arXiv:2405.19592  [pdf, other]

    cs.LG cs.AI cs.CL

    Why Larger Language Models Do In-context Learning Differently?

    Authors: Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang

    Abstract: Large language models (LLM) have emerged as a powerful tool for AI, with the key ability of in-context learning (ICL), where they can perform well on unseen tasks based on a brief series of task examples without necessitating any adjustments to the model parameters. One recent interesting mysterious observation is that models of different scales may have different ICL behaviors: larger models tend…

    Submitted 29 May, 2024; originally announced May 2024.

  17. arXiv:2405.19266  [pdf, other]

    cs.CL

    PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications

    Authors: Dingkang Yang, **jie Wei, Dongling Xiao, Shunli Wang, Tong Wu, Gang Li, Mingcheng Li, Shuaibing Wang, Jiawei Chen, Yue Jiang, Qingyao Xu, Ke Li, Peng Zhai, Lihua Zhang

    Abstract: Developing intelligent pediatric consultation systems offers promising prospects for improving diagnostic efficiency, especially in China, where healthcare resources are scarce. Despite recent advances in Large Language Models (LLMs) for Chinese medicine, their performance is sub-optimal in pediatric applications due to inadequate instruction data and vulnerable training procedures. To address the…

    Submitted 3 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: A Technical Report on a Chinese Medical Large Language Model

  18. arXiv:2405.16849  [pdf, other]

    cs.CV

    Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation

    Authors: Zhoujie Fu, Jiacheng Wei, Wenhao Shen, Chaoyue Song, Xiaofeng Yang, Fayao Liu, Xulei Yang, Guosheng Lin

    Abstract: In this work, we introduce a novel approach for creating controllable dynamics in 3D-generated Gaussians using casually captured reference videos. Our method transfers the motion of objects from reference videos to a variety of generated 3D Gaussians across different categories, ensuring precise and customizable motion transfer. We achieve this by employing blend skinning-based non-parametric shap…

    Submitted 6 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Our project page: https://sync4dphys.github.io/

  19. arXiv:2405.16191  [pdf, other]

    cs.AI

    Rocket Landing Control with Grid Fins and Path-following using MPC

    Authors: Junhao Yu, Jiarun Wei

    Abstract: In this project, we attempt to optimize a landing trajectory of a rocket. The goal is to minimize the total fuel consumption during the landing process using different techniques. Once the optimal and feasible trajectory is generated using batch approach, we attempt to follow the path using a Model Predictive Control (MPC) based algorithm, called Trajectory Optimizing Path following Estimation fro…

    Submitted 25 May, 2024; originally announced May 2024.

  20. arXiv:2405.15995  [pdf, other]

    cs.CV

    Efficient Temporal Action Segmentation via Boundary-aware Query Voting

    Authors: Peiyao Wang, Yuewei Lin, Erik Blasch, Jie Wei, Haibin Ling

    Abstract: Although the performance of Temporal Action Segmentation (TAS) has improved in recent years, achieving promising results often comes with a high computational cost due to dense inputs, complex model structures, and resource-intensive post-processing requirements. To improve the efficiency while keeping the performance, we present a novel perspective centered on per-segment classification. By harne…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 17 pages, 8 figures, 11 tables

  21. arXiv:2405.12571  [pdf, other]

    cs.RO

    iHERO: Interactive Human-oriented Exploration and Supervision Under Scarce Communication

    Authors: Zhuoli Tian, Yuyang Zhang, **sheng Wei, Meng Guo

    Abstract: Exploration of unknown scenes before human entry is essential for safety and efficiency in numerous scenarios, e.g., subterranean exploration, reconnaissance, search and rescue missions. Fleets of autonomous robots are particularly suitable for this task, via concurrent exploration, multi-sensory perception and autonomous navigation. Communication however among the robots can be severely restricte…

    Submitted 7 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: Accepted at RSS 2024

  22. arXiv:2405.12031  [pdf, other]

    cs.SD eess.AS

    Neighborhood Attention Transformer with Progressive Channel Fusion for Speaker Verification

    Authors: Nian Li, Jianguo Wei

    Abstract: Transformer-based architectures for speaker verification typically require more training data than ECAPA-TDNN. Therefore, recent work has generally been trained on VoxCeleb1&2. We propose a backbone network based on self-attention, which can achieve competitive results when trained on VoxCeleb2 alone. The network alternates between neighborhood attention and global attention to capture local and g…

    Submitted 29 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: 8 pages, 2 figures, 3 tables; added github link

  23. Treatment Effect Estimation for User Interest Exploration on Recommender Systems

    Authors: Jiaju Chen, Wenjie Wang, Chongming Gao, Peng Wu, Jianxiong Wei, Qingsong Hua

    Abstract: Recommender systems learn personalized user preferences from user feedback like clicks. However, user feedback is usually biased towards partially observed interests, leaving many users' hidden interests unexplored. Existing approaches typically mitigate the bias, increase recommendation diversity, or use bandit algorithms to balance exploration-exploitation trade-offs. Nevertheless, they fail to…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted to SIGIR 2024

  24. arXiv:2405.06841  [pdf, other]

    cs.CV cs.LG

    Bridging the Gap: Protocol Towards Fair and Consistent Affect Analysis

    Authors: Guanyu Hu, Eleni Papadopoulou, Dimitrios Kollias, Paraskevi Tzouveli, Jie Wei, Xinyu Yang

    Abstract: The increasing integration of machine learning algorithms in daily life underscores the critical need for fairness and equity in their deployment. As these technologies play a pivotal role in decision-making, addressing biases across diverse subpopulation groups, including age, gender, and race, becomes paramount. Automatic affect analysis, at the intersection of physiology, psychology, and machin…

    Submitted 16 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

    Comments: accepted at IEEE FG 2024

  25. arXiv:2405.00145  [pdf, other]

    cs.SE cs.CV

    GUing: A Mobile GUI Search Engine using a Vision-Language Model

    Authors: Jialiang Wei, Anne-Lise Courbis, Thomas Lambolais, Binbin Xu, Pierre Louis Bernard, Gérard Dray, Walid Maalej

    Abstract: App developers use the Graphical User Interface (GUI) of other apps as an important source of inspiration to design and improve their own apps. In recent years, research suggested various approaches to retrieve GUI designs that fit a certain text query from screenshot datasets acquired through automated GUI exploration. However, such text-to-GUI retrieval approaches only leverage the textual infor…

    Submitted 30 April, 2024; originally announced May 2024.

  26. arXiv:2404.19108  [pdf, other]

    cs.CV astro-ph.IM eess.IV

    Real-Time Convolutional Neural Network-Based Star Detection and Centroiding Method for CubeSat Star Tracker

    Authors: Hongrui Zhao, Michael F. Lembeck, Adrian Zhuang, Riya Shah, Jesse Wei

    Abstract: Star trackers are one of the most accurate celestial sensors used for absolute attitude determination. The devices detect stars in captured images and accurately compute their projected centroids on an imaging focal plane with subpixel precision. Traditional algorithms for star detection and centroiding often rely on threshold adjustments for star pixel detection and pixel brightness weighting for…

    Submitted 29 April, 2024; originally announced April 2024.

  27. arXiv:2404.18688  [pdf, other]

    cs.IT

    Distributed Source Coding for Parametric and Non-Parametric Regression

    Authors: Jiahui Wei, Elsa Dupraz, Philippe Mary

    Abstract: The design of communication systems dedicated to machine learning tasks is one key aspect of goal-oriented communications. In this framework, this article investigates the interplay between data reconstruction and learning from the same compressed observations, particularly focusing on the regression problem. We establish achievable rate-generalization error regions for both parametric and non-par…

    Submitted 29 April, 2024; originally announced April 2024.

  28. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, Yi** Bao, Xiao Liu, Dohyeong Kim, **seong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, **qiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora , et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  29. arXiv:2404.16385  [pdf, other]

    cs.CV

    Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Pre-trained Models

    Authors: Jiawei Chen, Dingkang Yang, Yue Jiang, Mingcheng Li, **jie Wei, Xiaolu Hou, Lihua Zhang

    Abstract: In the realm of Medical Visual Language Models (Med-VLMs), the quest for universal efficient fine-tuning mechanisms remains paramount, especially given researchers in interdisciplinary fields are often extremely short of training resources, yet largely unexplored. Given the unique challenges in the medical domain, such as limited data scope and significant domain-specific requirements, evaluating…

    Submitted 25 April, 2024; originally announced April 2024.

  30. arXiv:2404.14827  [pdf, other]

    cs.CL

    Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation

    Authors: **gxuan Wei, Linzhuang Sun, Yichong Leng, Xu Tan, Bihui Yu, Ruifeng Guo

    Abstract: Knowledge distillation, transferring knowledge from a teacher model to a student model, has emerged as a powerful technique in neural machine translation for compressing models or simplifying training targets. Knowledge distillation encompasses two primary methods: sentence-level distillation and token-level distillation. In sentence-level distillation, the student model is trained to align with t…

    Submitted 23 April, 2024; originally announced April 2024.

  31. arXiv:2404.14676  [pdf, other]

    cs.CV cs.GR

    DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance

    Authors: Linxuan Xin, Zheng Zhang, **fu Wei, Wei Gao, Duan Gao

    Abstract: Prior material creation methods had limitations in producing diverse results mainly because reconstruction-based methods relied on real-world measurements and generation-based methods were trained on relatively small material datasets. To address these challenges, we propose DreamPBR, a novel diffusion-based generative framework designed to create spatially-varying appearance properties guided by…

    Submitted 1 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 16 pages, 17 figures

    ACM Class: I.3.0; I.4.9

  32. arXiv:2404.11151  [pdf, other]

    cs.CV

    REACTO: Reconstructing Articulated Objects from a Single Video

    Authors: Chaoyue Song, Jiacheng Wei, Chuan-Sheng Foo, Guosheng Lin, Fayao Liu

    Abstract: In this paper, we address the challenge of reconstructing general articulated 3D objects from a single video. Existing works employing dynamic neural radiance fields have advanced the modeling of articulated objects like humans and animals from videos, but face challenges with piece-wise rigid general articulated objects due to limitations in their deformation models. To tackle this, we propose Qu…

    Submitted 17 April, 2024; originally announced April 2024.

  33. arXiv:2404.10352  [pdf, other]

    cs.HC

    CanvasPic: An Interactive Tool for Freely Generating Facial Images Based on Spatial Layout

    Authors: Jiafu Wei, Chia-Ming Chang, Xi Yang, Takeo Igarashi

    Abstract: In real-world usage, existing GAN image generation tools come up short due to their lack of intuitive interfaces and limited flexibility. To overcome these limitations, we developed CanvasPic, an innovative tool for flexible GAN image generation. Our tool introduces a novel 2D layout design that allows users to intuitively control image attributes based on real-world images. By interacting with th…

    Submitted 16 April, 2024; originally announced April 2024.

  34. arXiv:2404.07503  [pdf, other]

    cs.CL

    Best Practices and Lessons Learned on Synthetic Data for Language Models

    Authors: Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, **meng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai

    Abstract: The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challeng…

    Submitted 11 April, 2024; originally announced April 2024.

  35. arXiv:2404.06836  [pdf, other]

    cs.CV

    O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation

    Authors: Muer Tie, Julong Wei, Zhengjun Wang, Ke Wu, Shansuai Yuan, Kaizhao Zhang, Jie Jia, Jieru Zhao, Zhongxue Gan, Wenchao Ding

    Abstract: Online construction of open-ended language scenes is crucial for robotic applications, where open-vocabulary interactive scene understanding is required. Recently, neural implicit representation has provided a promising direction for online interactive mapping. However, implementing open-vocabulary scene understanding capability into online neural implicit mapping still faces three challenges: lac…

    Submitted 10 April, 2024; originally announced April 2024.

  36. arXiv:2404.01548  [pdf, other]

    cs.CV cs.AI

    mChartQA: A universal benchmark for multimodal Chart Question Answer based on Vision-Language Alignment and Reasoning

    Authors: **gxuan Wei, Nan Xu, Guiyong Chang, Yin Luo, BiHui Yu, Ruifeng Guo

    Abstract: In the fields of computer vision and natural language processing, multimodal chart question-answering, especially involving color, structure, and textless charts, poses significant challenges. Traditional methods, which typically involve either direct multimodal processing or a table-to-text conversion followed by language model analysis, have limitations in effectively handling these complex scen…

    Submitted 1 April, 2024; originally announced April 2024.

  37. arXiv:2403.20168  [pdf, other]

    eess.IV cs.CV

    Unsupervised Tumor-Aware Distillation for Multi-Modal Brain Image Translation

    Authors: Chuan Huang, Jia Wei, Rui Li

    Abstract: Multi-modal brain images from MRI scans are widely used in clinical diagnosis to provide complementary information from different modalities. However, obtaining fully paired multi-modal images in practice is challenging due to various factors, such as time, cost, and artifacts, resulting in modality-missing brain images. To address this problem, unsupervised multi-modal brain image translation has…

    Submitted 24 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: 8 pages, 5 figures. It has been provisionally accepted for IJCNN 2024

  38. arXiv:2403.20159  [pdf, other]

    cs.CV

    HGS-Mapping: Online Dense Mapping Using Hybrid Gaussian Representation in Urban Scenes

    Authors: Ke Wu, Kaizhao Zhang, Zhiwei Zhang, Shanshuai Yuan, Muer Tie, Julong Wei, Zijun Xu, Jieru Zhao, Zhongxue Gan, Wenchao Ding

    Abstract: Online dense mapping of urban scenes forms a fundamental cornerstone for scene understanding and navigation of autonomous vehicles. Recent advancements in mapping methods are mainly based on NeRF, whose rendering speed is too slow to meet online requirements. 3D Gaussian Splatting (3DGS), with its rendering speed hundreds of times faster than NeRF, holds greater potential in online dense mapping.…

    Submitted 29 March, 2024; originally announced March 2024.

  39. arXiv:2403.19060  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG

    Towards Human-Centered Construction Robotics: An RL-Driven Companion Robot For Contextually Assisting Carpentry Workers

    Authors: Yuning Wu, Jiaying Wei, Jean Oh, Daniel Cardoso Llach

    Abstract: In the dynamic construction industry, traditional robotic integration has primarily focused on automating specific tasks, often overlooking the complexity and variability of human aspects in construction workflows. This paper introduces a human-centered approach with a "work companion rover" designed to assist construction workers within their existing practices, aiming to enhance safety and workf…

    Submitted 28 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: 8 pages, 9 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  40. arXiv:2403.18802  [pdf, other]

    cs.CL cs.AI cs.LG

    Long-form factuality in large language models

    Authors: Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le

    Abstract: Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factua…

    Submitted 3 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  41. arXiv:2403.16519  [pdf, ps, other]

    cs.SC

    Two Algorithms for Computing Rational Univariate Representations of Zero-Dimensional Ideals with Parameters

    Authors: Dingkang Wang, **g**g Wei, Fanghui Xiao, Xiaopeng Zheng

    Abstract: Two algorithms for computing the rational univariate representation of zero-dimensional ideals with parameters are presented in the paper. Different from the rational univariate representation of zero-dimensional ideals without parameters, the number of zeros of zero-dimensional ideals with parameters under various specializations is different, which leads to choosing and checking the separating e…

    Submitted 25 March, 2024; originally announced March 2024.

  42. arXiv:2403.16516  [pdf, other]

    cs.CL cs.CV

    Visually Guided Generative Text-Layout Pre-training for Document Intelligence

    Authors: Zhiming Mao, Haoli Bai, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu, Kam-Fai Wong

    Abstract: Prior study shows that pre-training techniques can boost the performance of visual document understanding (VDU), which typically requires models to gain abilities to perceive and reason both document texts and layouts (e.g., locations of texts and table-cells). To this end, we propose visually guided generative text-layout pre-training, named ViTLP. Given a document image, the model optimizes hier…

    Submitted 27 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted to NAACL 2024 main conference. The first version of this paper was submitted to OpenReview (https://openreview.net/forum?id=ARtBIBAmNR) in June 2023

  43. arXiv:2403.15766  [pdf, other]

    cs.LG cs.AI

    BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion

    Authors: Jia Wei, Xingjun Zhang, Witold Pedrycz

    Abstract: Bagging has achieved great success in the field of machine learning by integrating multiple base classifiers to build a single strong classifier to reduce model variance. The performance improvement of bagging mainly relies on the number and diversity of base classifiers. However, traditional deep learning model training methods are expensive to train individually and difficult to train multiple m…

    Submitted 23 March, 2024; originally announced March 2024.

  44. arXiv:2403.10047  [pdf, other]

    cs.CV

    TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model

    Authors: Jiahao Lyu, ** Wei, Gangyan Zeng, Zeng Li, Enze Xie, Wei Wang, Yu Zhou

    Abstract: Existing scene text spotters are designed to locate and transcribe texts from images. However, it is challenging for a spotter to achieve precise detection and recognition of scene texts simultaneously. Inspired by the glimpse-focus spotting pipeline of human beings and impressive performances of Pre-trained Language Models (PLMs) on visual tasks, we ask: 1) "Can machines spot texts without precis…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 12 pages, 8 figures

  45. arXiv:2403.08630  [pdf, other]

    stat.ME cs.LG

    Leveraging Non-Decimated Wavelet Packet Features and Transformer Models for Time Series Forecasting

    Authors: Guy P Nason, James L. Wei

    Abstract: This article combines wavelet analysis techniques with machine learning methods for univariate time series forecasting, focusing on three main contributions. Firstly, we consider the use of Daubechies wavelets with different numbers of vanishing moments as input features to both non-temporal and temporal forecasting methods, by selecting these numbers during the cross-validation phase. Secondly, w…

    Submitted 13 March, 2024; originally announced March 2024.

    MSC Class: 62M10; 62M45

  46. arXiv:2403.06407  [pdf, other]

    cs.CV

    Can LLMs' Tuning Methods Work in Medical Multimodal Domain?

    Authors: Jiawei Chen, Yue Jiang, Dingkang Yang, Mingcheng Li, **jie Wei, Ziyun Qian, Lihua Zhang

    Abstract: While large language models (LLMs) excel in world knowledge understanding, adapting them to specific subfields requires precise adjustments. Due to the model's vast scale, traditional global fine-tuning methods for large models can be computationally expensive and impact generalization. To address this challenge, a range of innovative Parameter-Efficient Fine-Tuning (PEFT) methods have emerged an…

    Submitted 10 March, 2024; originally announced March 2024.

  47. arXiv:2403.05704  [pdf, other]

    econ.EM cs.SI stat.AP stat.ME

    Non-robustness of diffusion estimates on networks with measurement error

    Authors: Arun G. Chandrasekhar, Paul Goldsmith-Pinkham, Tyler H. McCormick, Samuel Thau, Jerry Wei

    Abstract: Network diffusion models are used to study things like disease transmission, information spread, and technology adoption. However, small amounts of mismeasurement are extremely likely in the networks constructed to operationalize these models. We show that estimates of diffusions are highly non-robust to this measurement error. First, we show that even when measurement error is vanishingly small,…

    Submitted 11 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  48. arXiv:2403.05025  [pdf, other]

    cs.AI

    Towards Multimodal Human Intention Understanding Debiasing via Subject-Deconfounding

    Authors: Dingkang Yang, Dongling Xiao, Ke Li, Yuzheng Wang, Zhaoyu Chen, **jie Wei, Lihua Zhang

    Abstract: Multimodal intention understanding (MIU) is an indispensable component of human expression analysis (e.g., sentiment or humor) from heterogeneous modalities, including visual postures, linguistic contents, and acoustic behaviors. Existing works invariably focus on designing sophisticated structures or fusion strategies to achieve impressive improvements. Unfortunately, they all suffer from the sub…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 14 pages

  49. arXiv:2403.01756  [pdf, other]

    cs.CV

    Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition

    Authors: Yutian Liu, Wenjun Ke, Jianguo Wei

    Abstract: Handwritten mathematical expression recognition (HMER) is challenging in image-to-text tasks due to the complex layouts of mathematical expressions and suffers from problems including over-parsing and under-parsing. To solve these, previous HMER methods improve the attention mechanism by utilizing historical alignment information. However, this approach has limitations in addressing under-parsing…

    Submitted 5 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  50. arXiv:2402.15017  [pdf, other]

    cs.LG cs.AI cs.CL

    Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning

    Authors: Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang

    Abstract: Foundation models have emerged as a powerful tool for many AI problems. Despite the tremendous success of foundation models, effective adaptation to new tasks, particularly those with limited labels, remains an open question and lacks theoretical understanding. An emerging solution with recent success in vision and NLP involves finetuning a foundation model on a selection of relevant tasks, before…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Published at ICLR 2024. 54 pages