Showing 1–50 of 150 results for author: Miao, Y

Searching in archive cs.
  1. arXiv:2407.00072  [pdf, other]

    cs.IR cs.CL

    Pistis-RAG: A Scalable Cascading Framework Towards Trustworthy Retrieval-Augmented Generation

    Authors: Yu Bai, Yukai Miao, Li Chen, Dan Li, Yanyu Ren, Hongtao Xie, Ce Yang, Xuhui Cai

    Abstract: In Greek mythology, Pistis symbolized good faith, trust, and reliability, echoing the core principles of RAG in LLM systems. Pistis-RAG, a scalable multi-stage framework, effectively addresses the challenges of large-scale retrieval-augmented generation (RAG). Each stage plays a distinct role: matching refines the search space, pre-ranking prioritizes semantically relevant documents, and ranking a…

    Submitted 21 June, 2024; originally announced July 2024.

  2. arXiv:2406.13233  [pdf, other]

    cs.AI

    AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models

    Authors: Zihao Zeng, Yibo Miao, Hongcheng Gao, Hao Zhang, Zhijie Deng

    Abstract: Mixture of experts (MoE) has become the standard for constructing production-level large language models (LLMs) due to its promise to boost model capacity without causing significant overheads. Nevertheless, existing MoE methods usually enforce a constant top-k routing for all tokens, which is arguably restrictive because various tokens (e.g., "<EOS>" vs. "apple") may require various numbers of ex…

    Submitted 19 June, 2024; originally announced June 2024.

  3. arXiv:2406.11519  [pdf, other]

    cs.CV eess.IV

    HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

    Authors: Di Wang, Meiqi Hu, Yao Jin, Yuchun Miao, Jiaqi Yang, Yichu Xu, Xiaolei Qin, Jiaqi Ma, Lingyu Sun, Chenxing Li, Chuan Fu, Hongruixuan Chen, Chengxi Han, Naoto Yokoya, Jing Zhang, Minqiang Xu, Lin Liu, Lefei Zhang, Chen Wu, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models (FMs) are revolutionizing the analysis and understanding of remote sensing (RS) scenes, including aerial RGB, multispectral, and SAR images. However, hyperspectral images (HSIs), which are rich in spectral information, have not seen much application of FMs, with existing methods often restricted to specific tasks and lacking generality. To fill this gap, we introduce HyperSIGMA,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: The code and models will be released at https://github.com/WHU-Sigma/HyperSIGMA

  4. arXiv:2406.07327  [pdf, other]

    cs.AI cs.CL cs.LG

    3D-Properties: Identifying Challenges in DPO and Charting a Path Forward

    Authors: Yuzi Yan, Yibo Miao, Jialian Li, Yipin Zhang, Jian Xie, Zhijie Deng, Dong Yan

    Abstract: Aligning large language models (LLMs) with human preference has recently gained tremendous attention, with the canonical yet costly RLHF-PPO and the simple and straightforward Direct Preference Optimization (DPO) as two examples. Despite the efficiency, DPO has rarely been used in the state-of-the-art production-level LLMs, implying its potential pathologies. In this work, we revisit DPO with a comp…

    Submitted 11 June, 2024; originally announced June 2024.

  5. arXiv:2406.00588  [pdf, other]

    cs.LG cs.CR math.ST

    Generalization Bound and New Algorithm for Clean-Label Backdoor Attack

    Authors: Lijia Yu, Shuang Liu, Yibo Miao, Xiao-Shan Gao, Lijun Zhang

    Abstract: The generalization bound is a crucial theoretical tool for assessing the generalizability of learning methods and there exist vast literatures on generalizability of normal learning, adversarial learning, and data poisoning. Unlike other data poison attacks, the backdoor attack has the special property that the poisoned triggers are contained in both the training set and the test set and the purpo…

    Submitted 1 June, 2024; originally announced June 2024.

  6. arXiv:2405.19098  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV stat.ML

    Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior

    Authors: Shuyu Cheng, Yibo Miao, Yinpeng Dong, Xiao Yang, Xiao-Shan Gao, Jun Zhu

    Abstract: This paper studies the challenging black-box adversarial attack that aims to generate adversarial examples against a black-box model by only using output feedback of the model to input queries. Some previous methods improve the query efficiency by incorporating the gradient of a surrogate white-box model into query-based attacks due to the adversarial transferability. However, the localized gradie…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  7. arXiv:2405.18524  [pdf, other]

    cs.CV

    Aligning in a Compact Space: Contrastive Knowledge Distillation between Heterogeneous Architectures

    Authors: Hongjun Wu, Li Xiao, Xingkuo Zhang, Yining Miao

    Abstract: Knowledge distillation is commonly employed to compress neural networks, reducing the inference costs and memory footprint. In the scenario of homogenous architecture, feature-based methods have been widely validated for their effectiveness. However, in scenarios where the teacher and student models are of heterogeneous architectures, the inherent differences in feature representation significantl…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 12 pages, 3 figures, conference paper

  8. arXiv:2405.15130  [pdf, other]

    cs.SE cs.CL cs.LG

    OptLLM: Optimal Assignment of Queries to Large Language Models

    Authors: Yueyue Liu, Hongyu Zhang, Yuantian Miao, Van-Hoang Le, Zhiqiang Li

    Abstract: Large Language Models (LLMs) have garnered considerable attention owing to their remarkable capabilities, leading to an increasing number of companies offering LLMs as services. Different LLMs achieve different performance at different costs. A challenge for users lies in choosing the LLMs that best fit their needs, balancing cost and performance. In this paper, we propose a framework for addressi…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: This paper is accepted by ICWS 2024

  9. arXiv:2405.01333  [pdf, other]

    cs.RO cs.CV

    NeRF in Robotics: A Survey

    Authors: Guangming Wang, Lei Pan, Songyou Peng, Shaohui Liu, Chenfeng Xu, Yanzi Miao, Wei Zhan, Masayoshi Tomizuka, Marc Pollefeys, Hesheng Wang

    Abstract: Meticulous 3D environment representations have been a longstanding goal in computer vision and robotics fields. The recent emergence of neural implicit representations has introduced radical innovation to this field as implicit representations enable numerous capabilities. Among these, the Neural Radiance Field (NeRF) has sparked a trend because of the huge representational advantages, such as sim…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 21 pages, 19 figures

  10. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang , et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  11. arXiv:2404.03037  [pdf, other]

    cs.LG cs.AI

    Model-based Reinforcement Learning for Parameterized Action Spaces

    Authors: Renhao Zhang, Haotian Fu, Yilin Miao, George Konidaris

    Abstract: We propose a novel model-based reinforcement learning algorithm -- Dynamics Learning and predictive control with Parameterized Actions (DLPA) -- for Parameterized Action Markov Decision Processes (PAMDPs). The agent learns a parameterized-action-conditioned dynamics model and plans with a modified Model Predictive Path Integral control. We theoretically quantify the difference between the generate…

    Submitted 23 May, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  12. arXiv:2404.00469  [pdf, other]

    cs.CV

    SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs

    Authors: Yang Miao, Francis Engelmann, Olga Vysotska, Federico Tombari, Marc Pollefeys, Dániel Béla Baráth

    Abstract: We introduce a novel problem, i.e., the localization of an input image within a multi-modal reference map represented by a database of 3D scene graphs. These graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases.…

    Submitted 30 March, 2024; originally announced April 2024.

  13. arXiv:2404.00312  [pdf, other]

    cs.CV cs.AI

    Bayesian Exploration of Pre-trained Models for Low-shot Image Classification

    Authors: Yibo Miao, Yu Lei, Feng Zhou, Zhijie Deng

    Abstract: Low-shot image classification is a fundamental task in computer vision, and the emergence of large-scale vision-language models such as CLIP has greatly advanced the forefront of research in this field. However, most existing CLIP-based methods lack the flexibility to effectively incorporate other pre-trained models that encompass knowledge distinct from CLIP. To bridge the gap, this work proposes…

    Submitted 30 March, 2024; originally announced April 2024.

  14. arXiv:2403.12760  [pdf, other]

    cs.CV

    WaveFace: Authentic Face Restoration with Efficient Frequency Recovery

    Authors: Yunqi Miao, Jiankang Deng, Jungong Han

    Abstract: Although diffusion models are rising as a powerful solution for blind face restoration, they are criticized for two problems: 1) slow training and inference speed, and 2) failure in preserving identity and recovering fine-grained facial details. In this work, we propose WaveFace to solve the problems in the frequency domain, where low- and high-frequency components decomposed by wavelet transforma…

    Submitted 19 March, 2024; originally announced March 2024.

  15. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  16. arXiv:2403.04164  [pdf, ps, other]

    cs.CV cs.AI

    ProMISe: Promptable Medical Image Segmentation using SAM

    Authors: Jinfeng Wang, Sifan Song, Xinkun Wang, Yiyi Wang, Yiyi Miao, Jionglong Su, S. Kevin Zhou

    Abstract: With the proposal of the Segment Anything Model (SAM), fine-tuning SAM for medical image segmentation (MIS) has become popular. However, due to the large size of the SAM model and the significant domain gap between natural and medical images, fine-tuning-based strategies are costly with potential risk of instability, feature damage and catastrophic forgetting. Furthermore, some methods of transfer…

    Submitted 18 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  17. arXiv:2403.02558  [pdf]

    cs.CL cs.CV

    Updating the Minimum Information about CLinical Artificial Intelligence (MI-CLAIM) checklist for generative modeling research

    Authors: Brenda Y. Miao, Irene Y. Chen, Christopher YK Williams, Jaysón Davidson, Augusto Garcia-Agundez, Harry Sun, Travis Zack, Atul J. Butte, Madhumita Sushil

    Abstract: Recent advances in generative models, including large language models (LLMs), vision language models (VLMs), and diffusion models, have accelerated the field of natural language and image processing in medicine and marked a significant paradigm shift in how biomedical models can be developed and deployed. While these models are highly adaptable to new tasks, scaling and evaluating their usage pres…

    Submitted 4 March, 2024; originally announced March 2024.

  18. arXiv:2402.15813  [pdf, other]

    cs.CL cs.GT

    Measuring Bargaining Abilities of LLMs: A Benchmark and A Buyer-Enhancement Method

    Authors: Tian Xia, Zhiwei He, Tong Ren, Yibo Miao, Zhuosheng Zhang, Yang Yang, Rui Wang

    Abstract: Bargaining is an important and unique part of negotiation between humans. As LLM-driven agents learn to negotiate and act like real humans, how to evaluate agents' bargaining abilities remains an open problem. For the first time, we formally described the Bargaining task as an asymmetric incomplete information game, defining the gains of the Buyer and Seller in multiple bargaining processes. It al…

    Submitted 4 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 Findings. The dataset AmazonHistoryPrice and our code are available at https://github.com/TianXiaSJTU/AmazonPriceHistory

  19. arXiv:2402.09345  [pdf, other]

    cs.LG cs.AI

    InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling

    Authors: Yuchun Miao, Sen Zhang, Liang Ding, Rong Bao, Lefei Zhang, Dacheng Tao

    Abstract: Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models with human values, reward hacking, also termed reward overoptimization, remains a critical challenge. This issue primarily arises from reward misgeneralization, where reward models (RMs) compute reward using spurious features that are irrelevant to human preferences. In this work, we tackle this pr…

    Submitted 23 May, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 35 pages, 28 figures

  20. arXiv:2402.05821  [pdf, other]

    cs.LG cs.NE

    Guided Evolution with Binary Discriminators for ML Program Search

    Authors: John D. Co-Reyes, Yingjie Miao, George Tucker, Aleksandra Faust, Esteban Real

    Abstract: How to automatically design better machine learning programs is an open problem within AutoML. While evolution has been a popular tool to search for better ML programs, using learning itself to guide the search has been less successful and less understood on harder problems but has the promise to dramatically increase the speed and final performance of the optimization process. We propose guiding…

    Submitted 8 February, 2024; originally announced February 2024.

  21. arXiv:2402.03597  [pdf]

    cs.CL cs.IR cs.LG

    Identifying Reasons for Contraceptive Switching from Real-World Data Using Large Language Models

    Authors: Brenda Y. Miao, Christopher YK Williams, Ebenezer Chinedu-Eneh, Travis Zack, Emily Alsentzer, Atul J. Butte, Irene Y. Chen

    Abstract: Prescription contraceptives play a critical role in supporting women's reproductive health. With nearly 50 million women in the United States using contraceptives, understanding the factors that drive contraceptives selection and switching is of significant interest. However, many factors related to medication switching are often only captured in unstructured clinical notes and can be difficult to…

    Submitted 5 February, 2024; originally announced February 2024.

  22. arXiv:2401.05568  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Phase discovery with active learning: Application to structural phase transitions in equiatomic NiTi

    Authors: Jonathan Vandermause, Anders Johansson, Yucong Miao, Joost J. Vlassak, Boris Kozinsky

    Abstract: Nickel titanium (NiTi) is a prototypical shape-memory alloy used in a range of biomedical and engineering devices, but direct molecular dynamics simulations of the martensitic B19' -> B2 phase transition driving its shape-memory behavior are rare and have relied on classical force fields with limited accuracy. Here, we train four machine-learned force fields for equiatomic NiTi based on the LDA, PBE…

    Submitted 10 January, 2024; originally announced January 2024.

  23. arXiv:2401.00434  [pdf, other]

    cs.CL

    GeoGalactica: A Scientific Large Language Model in Geoscience

    Authors: Zhouhan Lin, Cheng Deng, Le Zhou, Tianhang Zhang, Yi Xu, Yutong Xu, Zhongmou He, Yuanyuan Shi, Beiya Dai, Yunchong Song, Boyi Zeng, Qiyuan Chen, Yuxun Miao, Bo Xue, Shu Wang, Luoyi Fu, Weinan Zhang, Junxian He, Yunqiang Zhu, Xinbing Wang, Chenghu Zhou

    Abstract: Large language models (LLMs) have achieved huge success for their general knowledge and ability to solve a wide spectrum of tasks in natural language processing (NLP). Due to their impressive abilities, LLMs have shed light on potential inter-discipline applications to foster scientific discoveries of a specific domain by using artificial intelligence (AI for science, AI4S). In the meantime, utili…

    Submitted 13 April, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    ACM Class: I.2.7; F.4.1

  24. arXiv:2312.12803  [pdf, other]

    cs.IT

    Repairing Schemes for Tamo-Barg Codes

    Authors: Han Cai, Ying Miao, Moshe Schwartz, Xiaohu Tang

    Abstract: In this paper, we explore a practical system setting where a rack-aware storage system consists of racks, each containing a few parity checks, referred to as a rack-aware system with locality. To minimize cross-rack bandwidth in this system, we organize the repair sets of locally repairable codes into racks and investigate the problem of repairing erasures in locally repairable codes beyond the co…

    Submitted 20 December, 2023; originally announced December 2023.

  25. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  26. arXiv:2312.00413  [pdf, other]

    cs.SE cs.AI cs.CL cs.PL

    Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?

    Authors: Weisong Sun, Chunrong Fang, Yun Miao, Yudu You, Mengzhe Yuan, Yuchen Chen, Quanjun Zhang, An Guo, Xiang Chen, Yang Liu, Zhenyu Chen

    Abstract: Programming language understanding and representation (a.k.a code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of the source code features while preserving its semantics. These representations can be used for facilitating subsequent code-related tasks. The abstract syntax…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: submitted to ACM Transactions on Software Engineering and Methodology. arXiv admin note: text overlap with arXiv:2103.10668 by other authors

    MSC Class: 68-04; 68T30 ACM Class: D.2.3; I.2.2; I.2.4

  27. arXiv:2311.15269  [pdf, other]

    cs.DC cs.AI

    Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search

    Authors: Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang

    Abstract: Increasingly complex and diverse deep neural network (DNN) models necessitate distributing the execution across multiple devices for training and inference tasks, and also require carefully planned schedules for performance. However, existing practices often rely on predefined schedules that may not fully exploit the benefits of emerging diverse model-aware operator placement strategies. Handcraft…

    Submitted 26 November, 2023; originally announced November 2023.

    Comments: The paper is accepted by HPCA 2024

  28. arXiv:2311.12592  [pdf, other]

    cs.HC cs.AI eess.SY

    Visual tracking brain computer interface

    Authors: Changxing Huang, Nanlin Shi, Yining Miao, Xiaogang Chen, Yijun Wang, Xiaorong Gao

    Abstract: Brain-computer interfaces (BCIs) offer a way to interact with computers without relying on physical movements. Non-invasive electroencephalography (EEG)-based visual BCIs, known for efficient speed and calibration ease, face limitations in continuous tasks due to discrete stimulus design and decoding methods. To achieve continuous control, we implemented a novel spatial encoding stimulus paradigm…

    Submitted 21 November, 2023; originally announced November 2023.

  29. arXiv:2311.11596  [pdf]

    cs.HC cs.IT eess.SP q-bio.NC

    High-performance cVEP-BCI under minimal calibration

    Authors: Yining Miao, Nanlin Shi, Changxing Huang, Yonghao Song, Xiaogang Chen, Yijun Wang, Xiaorong Gao

    Abstract: The ultimate goal of brain-computer interfaces (BCIs) based on visual modulation paradigms is to achieve high-speed performance without the burden of extensive calibration. Code-modulated visual evoked potential-based BCIs (cVEP-BCIs) modulated by broadband white noise (WN) offer various advantages, including increased communication speed, expanded encoding target capabilities, and enhanced coding…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 35 pages, 5 figures

  30. arXiv:2311.07608  [pdf, other]

    cs.LG cs.AI

    MuST: Multimodal Spatiotemporal Graph-Transformer for Hospital Readmission Prediction

    Authors: Yan Miao, Lequan Yu

    Abstract: Hospital readmission prediction is considered an essential approach to decreasing readmission rates, which is a key factor in assessing the quality and efficacy of a healthcare system. Previous studies have extensively utilized three primary modalities, namely electronic health records (EHR), medical images, and clinical notes, to predict hospital readmissions. However, the majority of these studi…

    Submitted 11 November, 2023; originally announced November 2023.

  31. arXiv:2311.06517  [pdf, other]

    cs.AI cs.DB cs.LG stat.AP

    BClean: A Bayesian Data Cleaning System

    Authors: Jianbin Qin, Sifan Huang, Yaoshu Wang, Jing Zhu, Yifan Zhang, Yukai Miao, Rui Mao, Makoto Onizuka, Chuan Xiao

    Abstract: There is a considerable body of work on data cleaning which employs various principles to rectify erroneous data and transform a dirty dataset into a cleaner one. One of the prevalent approaches is probabilistic methods, including Bayesian methods. However, existing probabilistic methods often assume a simplistic distribution (e.g., Gaussian distribution), which is frequently underfitted in practice,…

    Submitted 11 November, 2023; originally announced November 2023.

    Comments: Our source code is available at https://github.com/yyssl88/BClean

  32. arXiv:2309.14737  [pdf, other]

    cs.RO cs.CV

    Volumetric Semantically Consistent 3D Panoptic Mapping

    Authors: Yang Miao, Iro Armeni, Marc Pollefeys, Daniel Barath

    Abstract: We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating comprehensive, accurate, and efficient semantic 3D maps suitable for autonomous agents in unstructured environments. The proposed approach is based on a Voxel-TSDF representation used in recent algorithms. It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic…

    Submitted 5 March, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: 8 pages, 2 figures

  33. arXiv:2309.10895  [pdf, ps, other]

    cs.HC cs.MA

    Large Language Models as Agents in the Clinic

    Authors: Nikita Mehandru, Brenda Y. Miao, Eduardo Rodriguez Almaraz, Madhumita Sushil, Atul J. Butte, Ahmed Alaa

    Abstract: Recent developments in large language models (LLMs) have unlocked new opportunities for healthcare, from information synthesis to clinical decision support. These new LLMs are not just capable of modeling language, but can also act as intelligent "agents" that interact with stakeholders in open-ended conversations and even influence clinical decision-making. Rather than relying on benchmarks that…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: 4 pages

  34. arXiv:2309.05557  [pdf, other]

    cs.CL cs.AI cs.NI

    An Empirical Study of NetOps Capability of Pre-Trained Large Language Models

    Authors: Yukai Miao, Yu Bai, Li Chen, Dan Li, Haifeng Sun, Xizheng Wang, Ziqiu Luo, Yanyu Ren, Dapeng Sun, Xiuting Xu, Qi Zhang, Chao Xiang, Xinchi Li

    Abstract: Nowadays, the versatile capabilities of Pre-trained Large Language Models (LLMs) have attracted much attention from the industry. However, some vertical domains are more interested in the in-domain capabilities of LLMs. For the Networks domain, we present NetEval, an evaluation set for measuring the comprehensive capabilities of LLMs in Network Operations (NetOps). NetEval is designed for evaluati…

    Submitted 19 September, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

  35. arXiv:2309.05028  [pdf, other]

    cs.CV

    SC-NeRF: Self-Correcting Neural Radiance Field with Sparse Views

    Authors: Liang Song, Guangming Wang, Jiuming Liu, Zhenyang Fu, Yanzi Miao, Hesheng

    Abstract: In recent studies, the generalization of neural radiance fields for novel view synthesis task has been widely explored. However, existing methods are limited to objects and indoor scenes. In this work, we extend the generalization task to outdoor scenes, trained only on object-level datasets. This approach presents two challenges. Firstly, the significant distributional shift between training and…

    Submitted 10 September, 2023; originally announced September 2023.

  36. arXiv:2308.13232  [pdf, other]

    cs.HC cs.IT eess.SP q-bio.NC

    Estimating and approaching maximum information rate of noninvasive visual brain-computer interface

    Authors: Nanlin Shi, Yining Miao, Changxing Huang, Xiang Li, Yonghao Song, Xiaogang Chen, Yijun Wang, Xiaorong Gao

    Abstract: The mission of visual brain-computer interfaces (BCIs) is to enhance information transfer rate (ITR) to reach high speed towards real-life communication. Despite notable progress, noninvasive visual BCIs have encountered a plateau in ITRs, leaving it uncertain whether higher ITRs are achievable. In this study, we investigate the information rate limits of the primary visual channel to explore whet…

    Submitted 25 August, 2023; originally announced August 2023.

  37. CORAL: Expert-Curated medical Oncology Reports to Advance Language Model Inference

    Authors: Madhumita Sushil, Vanessa E. Kennedy, Divneet Mandair, Brenda Y. Miao, Travis Zack, Atul J. Butte

    Abstract: Both medical care and observational studies in oncology require a thorough understanding of a patient's disease progression and treatment history, often elaborately documented in clinical notes. Despite their vital role, no current oncology information representation and annotation schema fully encapsulates the diversity of information recorded within these notes. Although large language models (L…

    Submitted 11 January, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: Source code available at: https://github.com/MadhumitaSushil/OncLLMExtraction

  38. arXiv:2308.01857  [pdf, other]

    cs.AR

    iEDA: An Open-Source Intelligent Physical Implementation Toolkit and Library

    Authors: Xingquan Li, Simin Tao, Zengrong Huang, Shijian Chen, Zhisheng Zeng, Liwei Ni, Zhipeng Huang, Chunan Zhuang, Hongxi Wu, Weiguo Li, Xueyan Zhao, He Liu, Shuaiying Long, Wei He, Bojun Liu, Sifeng Gan, Zihao Yu, Tong Liu, Yuchi Miao, Zhiyuan Yan, Hao Wang, Jie Zhao, Yifan Li, Ruizhi Liu, Xiaoze Lin , et al. (31 additional authors not shown)

    Abstract: Open-source EDA shows promising potential in unleashing EDA innovation and lowering the cost of chip design. This paper presents an open-source EDA project, iEDA, aiming for building a basic infrastructure for EDA technology evolution and closing the industrial-academic gap in the EDA area. iEDA now covers the whole flow of physical design (including Floorplan, Placement, CTS, Routing, Timing Opti…

    Submitted 3 August, 2023; originally announced August 2023.

  39. arXiv:2307.00534  [pdf, other]

    cs.LG

    Shared Growth of Graph Neural Networks via Prompted Free-direction Knowledge Distillation

    Authors: Kaituo Feng, Yikun Miao, Changsheng Li, Ye Yuan, Guoren Wang

    Abstract: Knowledge distillation (KD) has shown to be effective to boost the performance of graph neural networks (GNNs), where the typical objective is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, it is often quite challenging to train a satisfactory deeper GNN due to the well-known over-parametrized and over-smoothing issues, leading to invalid knowledge transfer i…

    Submitted 16 November, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2206.06561

  40. arXiv:2306.12113  [pdf, other]

    cs.CV cs.AI

    Lightweight wood panel defect detection method incorporating attention mechanism and feature fusion network

    Authors: Yongxin Cao, Fanghua Liu, Lai Jiang, Cheng Bao, You Miao, Yang Chen

    Abstract: In recent years, deep learning has made significant progress in wood panel defect detection. However, there are still challenges such as low detection, slow detection speed, and difficulties in deploying embedded devices on wood panel surfaces. To overcome these issues, we propose a lightweight wood panel defect detection method called YOLOv5-LW, which incorporates attention mechanisms and a feat…

    Submitted 21 June, 2023; originally announced June 2023.

  41. arXiv:2306.11400  [pdf, other

    cs.CV cs.CL

    MuDPT: Multi-modal Deep-symphysis Prompt Tuning for Large Pre-trained Vision-Language Models

    Authors: Yongzhu Miao, Shasha Li, Jintao Tang, Ting Wang

    Abstract: Prompt tuning, like CoOp, has recently shown promising vision recognizing and transfer learning ability on various downstream tasks with the emergence of large pre-trained vision-language models like CLIP. However, we identify that existing uni-modal prompt tuning approaches may result in sub-optimal performance since this uni-modal design breaks the original alignment of textual and visual repres…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: The paper has been accepted by ICME 2023

  42. arXiv:2306.09792  [pdf, other

    cs.LG cs.CE physics.comp-ph

    GPINN: Physics-informed Neural Network with Graph Embedding

    Authors: Yuyang Miao, Haolin Li

    Abstract: This work proposes a Physics-informed Neural Network framework with Graph Embedding (GPINN) to perform PINN in graph, i.e. topological space instead of traditional Euclidean space, for improved problem-solving efficiency. The method integrates topological data into the neural network's computations, which significantly boosts the performance of the Physics-Informed Neural Network (PINN). The graph…

    Submitted 16 June, 2023; originally announced June 2023.

  43. arXiv:2306.08423  [pdf, other

    cs.DC

    DistSim: A performance model of large-scale hybrid distributed DNN training

    Authors: Guandong Lu, Runzhe Chen, Yakai Wang, Yangjie Zhou, Rui Zhang, Zheng Hu, Yanming Miao, Zhifang Cai, Li Li, Jingwen Leng, Minyi Guo

    Abstract: With the ever-increasing computational demand of DNN training workloads, distributed training has been widely adopted. A combination of data, model and pipeline parallelism strategy, called hybrid parallelism distributed training, is imported to tackle the problem of deploying large-scale models. However, how to evaluate the hybrid strategy and the utilization of each device remains a challenge si…

    Submitted 14 June, 2023; originally announced June 2023.

  44. arXiv:2306.04362  [pdf, other

    cs.CV cs.CL

    Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

    Authors: Haiyang Xu, Qinghao Ye, Xuan Wu, Ming Yan, Yuan Miao, Jiabo Ye, Guohai Xu, Anwen Hu, Yaya Shi, Guangwei Xu, Chenliang Li, Qi Qian, Maofei Que, Ji Zhang, Xiao Zeng, Fei Huang

    Abstract: To promote the development of Vision-Language Pre-training (VLP) and multimodal Large Language Model (LLM) in the Chinese community, we firstly release the largest public Chinese high-quality video-language dataset named Youku-mPLUG, which is collected from Youku, a well-known Chinese video-sharing website, with strict criteria of safety, diversity, and quality. Youku-mPLUG contains 10 million Chi…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: Working in progress

  45. arXiv:2306.04240  [pdf, other

    cs.CV math.NA

    T-ADAF: Adaptive Data Augmentation Framework for Image Classification Network based on Tensor T-product Operator

    Authors: Feiyang Han, Yun Miao, Zhaoyi Sun, Yimin Wei

    Abstract: Image classification is one of the most fundamental tasks in Computer Vision. In practical applications, the datasets are usually not as abundant as those in the laboratory and simulation, which is always called as Data Hungry. How to extract the information of data more completely and effectively is very important. Therefore, an Adaptive Data Augmentation Framework based on the tensor T-product O…

    Submitted 7 June, 2023; originally announced June 2023.

  46. arXiv:2305.19982  [pdf, other

    cs.LG cs.AI

    Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training

    Authors: Yijia Zhang, Yibo Han, Shijie Cao, Guohao Dai, Youshan Miao, Ting Cao, Fan Yang, Ningyi Xu

    Abstract: Running out of GPU memory has become a main bottleneck for large-scale DNN training. How to reduce the memory footprint during training has received intensive research attention. We find that previous gradient accumulation reduces activation memory but fails to be compatible with gradient memory reduction due to a contradiction between preserving gradients and releasing gradients. To address this…

    Submitted 31 May, 2023; originally announced May 2023.

  47. arXiv:2305.16617  [pdf, other

    cs.LG cs.AI cs.CL

    Efficient Detection of LLM-generated Texts with a Bayesian Surrogate Model

    Authors: Yibo Miao, Hongcheng Gao, Hao Zhang, Zhijie Deng

    Abstract: The detection of machine-generated text, especially from large language models (LLMs), is crucial in preventing serious social problems resulting from their misuse. Some methods train dedicated detectors on specific datasets but fall short in generalizing to unseen test data, while other zero-shot ones often yield suboptimal performance. Although the recent DetectGPT has shown promising detection…

    Submitted 4 June, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

  48. arXiv:2305.14062  [pdf, other

    eess.SP cs.LG

    Amplitude-Independent Machine Learning for PPG through Visibility Graphs and Transfer Learning

    Authors: Yuyang Miao, Harry J. Davies, Danilo P. Mandic

    Abstract: Photoplethysmography (PPG) refers to the measurement of variations in blood volume using light and is a feature of most wearable devices. The PPG signals provide insight into the body's circulatory system and can be employed to extract various bio-features, such as heart rate and vascular ageing. Although several algorithms have been proposed for this purpose, many exhibit limitations, including h…

    Submitted 16 January, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

  49. arXiv:2305.12865  [pdf, other

    cs.SE cs.AI

    Automatic Code Summarization via ChatGPT: How Far Are We?

    Authors: Weisong Sun, Chunrong Fang, Yudu You, Yun Miao, Yi Liu, Yuekang Li, Gelei Deng, Shenghan Huang, Yuchen Chen, Quanjun Zhang, Hanwei Qian, Yang Liu, Zhenyu Chen

    Abstract: To support software developers in understanding and maintaining programs, various automatic code summarization techniques have been proposed to generate a concise natural language comment for a given code snippet. Recently, the emergence of large language models (LLMs) has led to a great boost in the performance of natural language processing tasks. Among them, ChatGPT is the most popular one whic…

    Submitted 22 May, 2023; originally announced May 2023.

    MSC Class: 68T50; ACM Class: D.2.3

  50. arXiv:2303.14562  [pdf, other

    cs.RO

    Resolution Complete In-Place Object Retrieval given Known Object Models

    Authors: Daniel Nakhimovich, Yinglong Miao, Kostas E. Bekris

    Abstract: This work proposes a robot task planning framework for retrieving a target object in a confined workspace among multiple stacked objects that obstruct the target. The robot can use prehensile picking and in-workspace placing actions. The method assumes access to 3D models for the visible objects in the scene. The key contribution is in achieving desirable properties, i.e., to provide (a) safety, b…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: 7 pages, 4 figures, Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2023