
Showing 1–50 of 105 results for author: Hu, A

Searching in archive cs.
  1. arXiv:2406.13210  [pdf, other]

    cs.CV cs.AI

    Surgical Triplet Recognition via Diffusion Model

    Authors: Daochang Liu, Axel Hu, Mubarak Shah, Chang Xu

    Abstract: Surgical triplet recognition is an essential building block to enable next-generation context-aware operating rooms. The goal is to identify the combinations of instruments, verbs, and targets presented in surgical video frames. In this paper, we propose DiffTriplet, a new generative framework for surgical triplet recognition employing the diffusion model, which predicts surgical triplets via iter…

    Submitted 24 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2406.11409  [pdf, other]

    cs.CL cs.AI

    CodeGemma: Open Code Models Based on Gemma

    Authors: CodeGemma Team, Heri Zhao, Jeffrey Hui, Joshua Howland, Nam Nguyen, Siqi Zuo, Andrea Hu, Christopher A. Choquette-Choo, Jingyue Shen, Joe Kelley, Kshitij Bansal, Luke Vilnis, Mateo Wirth, Paul Michel, Peter Choy, Pratik Joshi, Ravin Kumar, Sarmad Hashmi, Shubham Agrawal, Zhitao Gong, Jane Fine, Tris Warkentin, Ale Jakse Hartman, Bin Ni, Kathy Korevec, et al. (2 additional authors not shown)

    Abstract: This paper introduces CodeGemma, a collection of specialized open code models built on top of Gemma, capable of a variety of code and natural language generation tasks. We release three model variants. CodeGemma 7B pretrained (PT) and instruction-tuned (IT) variants have remarkably resilient natural language understanding, excel in mathematical reasoning, and match code capabilities of other open…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: v1: 11 pages, 4 figures, 5 tables. v2: Update metadata

  3. arXiv:2406.08713  [pdf, other]

    cs.AI cs.CV

    Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis

    Authors: Xinrui Yang, Zhuohan Wang, Anthony Hu

    Abstract: Text-to-image models have shown remarkable progress in generating high-quality images from user-provided prompts. Despite this, the quality of these images varies due to the models' sensitivity to human language nuances. With advancements in large language models, there are new opportunities to enhance prompt design for image generation tasks. Existing research primarily focuses on optimizing prom…

    Submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2405.18892  [pdf, other]

    cs.IT eess.SP

    EVM Analysis of Distributed Massive MIMO with 1-Bit Radio-Over-Fiber Fronthaul

    Authors: Anzhong Hu, Lise Aabel, Giuseppe Durisi, Sven Jacobsson, Mikael Coldrey, Christian Fager, Christoph Studer

    Abstract: We analyze the uplink performance of a distributed massive multiple-input multiple-output (MIMO) architecture in which the remotely located access points (APs) are connected to a central processing unit via a fiber-optical fronthaul carrying a dithered and 1-bit quantized version of the received radio-frequency (RF) signal. The innovative feature of the proposed architecture is that no down-conver…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: To appear in IEEE Transactions on Communications

  5. arXiv:2405.00282  [pdf, ps, other]

    math.OC cs.AI cs.GT cs.LG cs.MA

    MF-OML: Online Mean-Field Reinforcement Learning with Occupation Measures for Large Population Games

    Authors: Anran Hu, Junzi Zhang

    Abstract: Reinforcement learning for multi-agent games has attracted lots of attention recently. However, given the challenge of solving Nash equilibria for large population games, existing works with guaranteed polynomial complexities either focus on variants of zero-sum and potential games, or aim at solving (coarse) correlated equilibria, or require access to simulators, or rely on certain assumptions th…

    Submitted 30 April, 2024; originally announced May 2024.

  6. arXiv:2404.18670  [pdf, other]

    cs.LG stat.AP

    Enhancing Uncertain Demand Prediction in Hospitals Using Simple and Advanced Machine Learning

    Authors: Annie Hu, Samuel Stockman, Xun Wu, Richard Wood, Bangdong Zhi, Oliver Y. Chén

    Abstract: Early and timely prediction of patient care demand not only affects effective resource allocation but also influences clinical decision-making as well as patient experience. Accurately predicting patient care demand, however, is a ubiquitous challenge for hospitals across the world due, in part, to the demand's time-varying temporal variability, and, in part, to the difficulty in modelling trends…

    Submitted 29 April, 2024; originally announced April 2024.

  7. arXiv:2404.16635  [pdf, other]

    cs.CV

    TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

    Authors: Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, Fei Huang

    Abstract: Charts are important for presenting and explaining complex data relationships. Recently, multimodal large language models (MLLMs) have shown remarkable capabilities in various chart understanding tasks. However, the sheer size of these models in terms of parameters and computational requirements limits their use in resource-constrained environments. In this paper, we present TinyChart, an efficien…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 13 pages, 11 figures

  8. arXiv:2404.14705  [pdf, other]

    cs.CV

    Think-Program-reCtify: 3D Situated Reasoning with Large Language Models

    Authors: Qingrong He, Kejun Lin, Shizhe Chen, Anwen Hu, Qin Jin

    Abstract: This work addresses the 3D situated reasoning task which aims to answer questions given egocentric observations in a 3D environment. The task remains challenging as it requires comprehensive 3D perception and complex reasoning skills. End-to-end models trained on supervised data for 3D situated reasoning suffer from data scarcity and generalization ability. Inspired by the recent success of levera…

    Submitted 22 April, 2024; originally announced April 2024.

  9. arXiv:2404.11844  [pdf, ps, other]

    cs.CY

    Finding A Taxi with Illegal Driver Substitution Activity via Behavior Modelings

    Authors: Junbiao Pang, Muhammad Ayub Sabir, Zhuyun Wang, Anjing Hu, Xue Yang, Haitao Yu, Qingming Huang

    Abstract: In our urban life, Illegal Driver Substitution (IDS) activity for a taxi is a grave unlawful activity in the taxi industry, possibly causing severe traffic accidents and painful social repercussions. Currently, the IDS activity is manually supervised by law enforcers, i.e., law enforcers empirically choose a taxi and inspect it. The pressing problem of this scheme is the dilemma between the limite…

    Submitted 17 April, 2024; originally announced April 2024.

  10. arXiv:2404.07545  [pdf, other]

    cs.CV

    Stereo-LiDAR Depth Estimation with Deformable Propagation and Learned Disparity-Depth Conversion

    Authors: Ang Li, Anning Hu, Wei Xi, Wenxian Yu, Danping Zou

    Abstract: Accurate and dense depth estimation with stereo cameras and LiDAR is an important task for automatic driving and robotic perception. While sparse hints from LiDAR points have improved cost aggregation in stereo matching, their effectiveness is limited by the low density and non-uniform distribution. To address this issue, we propose a novel stereo-LiDAR depth estimation network with Semi-Dense hin…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted in ICRA 2024. 8 pages, 6 figures

  11. arXiv:2403.12895  [pdf, other]

    cs.CV

    mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

    Authors: Anwen Hu, Haiyang Xu, Jiabo Ye, Ming Yan, Liang Zhang, Bo Zhang, Chen Li, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou

    Abstract: Structure information is critical for understanding the semantics of text-rich images, such as documents, tables, and charts. Existing Multimodal Large Language Models (MLLMs) for Visual Document Understanding are equipped with text recognition ability but lack general structure understanding abilities for text-rich document images. In this work, we emphasize the importance of structure informatio…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 21 pages, 15 figures

  12. arXiv:2403.12693  [pdf, other]

    cs.CV

    As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?

    Authors: Anjun Hu, Jindong Gu, Francesco Pinto, Konstantinos Kamnitsas, Philip Torr

    Abstract: Foundation models pre-trained on web-scale vision-language data, such as CLIP, are widely used as cornerstones of powerful machine learning systems. While pre-training offers clear advantages for downstream learning, it also endows downstream models with shared adversarial vulnerabilities that can be easily identified through the open-sourced foundation model. In this work, we expose such vulnerab…

    Submitted 19 March, 2024; originally announced March 2024.

  13. arXiv:2403.06284  [pdf]

    cs.HC cs.ET

    Developing an AI-Based Psychometric System for Assessing Learning Difficulties and Adaptive System to Overcome: A Qualitative and Conceptual Framework

    Authors: Aaron Hu

    Abstract: Learning difficulties pose significant challenges for students, impacting their academic performance and overall educational experience. These difficulties could sometimes put students into a downward spiral that lack of educational resources for personalized support consistently led to under-accommodation of students special needs, and the student lose opportunities in the longer term academic an…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: This is a working paper

  14. arXiv:2401.10703  [pdf, other]

    cs.LO

    DRAT Proofs of Unsatisfiability for SAT Modulo Monotonic Theories

    Authors: Nick Feng, Alan J. Hu, Sam Bayless, Syed M. Iqbal, Patrick Trentin, Mike Whalen, Lee Pike, John Backes

    Abstract: Generating proofs of unsatisfiability is a valuable capability of most SAT solvers, and is an active area of research for SMT solvers. This paper introduces the first method to efficiently generate proofs of unsatisfiability specifically for an important subset of SMT: SAT Modulo Monotonic Theories (SMMT), which includes many useful finite-domain theories (e.g., bit vectors and many graph-theoreti…

    Submitted 18 April, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

  15. arXiv:2401.10314  [pdf, other]

    cs.SE cs.AI cs.LG cs.RO

    LangProp: A code optimization framework using Large Language Models applied to driving

    Authors: Shu Ishida, Gianluca Corrado, George Fedoseev, Hudson Yeo, Lloyd Russell, Jamie Shotton, João F. Henriques, Anthony Hu

    Abstract: We propose LangProp, a framework for iteratively optimizing code generated by large language models (LLMs), in both supervised and reinforcement learning settings. While LLMs can generate sensible coding solutions zero-shot, they are often sub-optimal. Especially for code generation tasks, it is likely that the initial code will fail on certain edge cases. LangProp automatically evaluates the code…

    Submitted 3 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  16. arXiv:2401.08433  [pdf, other]

    cs.RO

    Autonomous Multiple-Trolley Collection System with Nonholonomic Robots: Design, Control, and Implementation

    Authors: Peijia Xie, Bingyi Xia, Anjun Hu, Ziqi Zhao, Lingxiao Meng, Zhirui Sun, Xuheng Gao, Jiankun Wang, Max Q. -H. Meng

    Abstract: The intricate and multi-stage task in dynamic public spaces like luggage trolley collection in airports presents both a promising opportunity and an ongoing challenge for automated service robots. Previous research has primarily focused on handling a single trolley or individual functional components, creating a gap in providing cost-effective and efficient solutions for practical scenarios. In th…

    Submitted 16 January, 2024; originally announced January 2024.

  17. arXiv:2401.03346  [pdf, ps, other]

    cs.CY cs.AI cs.CL cs.LG cs.SI

    An Investigation of Large Language Models for Real-World Hate Speech Detection

    Authors: Keyan Guo, Alexander Hu, Jaden Mu, Ziheng Shi, Ziming Zhao, Nishant Vishwamitra, Hongxin Hu

    Abstract: Hate speech has emerged as a major problem plaguing our social spaces today. While there have been significant efforts to address this problem, existing methods are still significantly limited in effectively detecting hate speech online. A major limitation of existing methods is that hate speech detection is a highly contextual problem, and these methods cannot fully capture the context of hate sp…

    Submitted 6 January, 2024; originally announced January 2024.

    Comments: Accepted for publication on 22nd International Conference of Machine Learning and Applications, ICMLA 2023

  18. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  19. arXiv:2311.18248  [pdf, other]

    cs.MM cs.CL

    mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

    Authors: Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang

    Abstract: Recently, the strong text creation ability of Large Language Models (LLMs) has given rise to many tools for assisting paper reading or even writing. However, the weak diagram analysis abilities of LLMs or Multimodal LLMs greatly limit their application scenarios, especially for scientific academic paper writing. In this work, towards a more versatile copilot for academic paper writing, we mainly fo…

    Submitted 9 January, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: 20 pages, 12 figures

  20. arXiv:2311.04257  [pdf, other]

    cs.CL cs.CV

    mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

    Authors: Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

    Abstract: Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks. However, previous methods primarily focus on enhancing multi-modal capabilities. In this work, we introduce a versatile multi-modal large language model, mPLUG-Owl2, which effectively leverages modality collaboration to improve performance in both text and multi-modal tasks…

    Submitted 8 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

  21. arXiv:2310.17626  [pdf, ps, other]

    cs.CV

    A Survey on Transferability of Adversarial Examples across Deep Neural Networks

    Authors: Jindong Gu, Xiaojun Jia, Pau de Jorge, Wenqain Yu, Xinwei Liu, Avery Ma, Yuan Xun, Anjun Hu, Ashkan Khakzar, Zhijiang Li, Xiaochun Cao, Philip Torr

    Abstract: The emergence of Deep Neural Networks (DNNs) has revolutionized various domains by enabling the resolution of complex tasks spanning image recognition, natural language processing, and scientific problem-solving. However, this progress has also brought to light a concerning vulnerability: adversarial examples. These crafted inputs, imperceptible to humans, can manipulate machine learning models in…

    Submitted 1 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted to Transactions on Machine Learning Research (TMLR)

  22. arXiv:2310.10656  [pdf, other]

    cs.CR cs.AI cs.LG

    VeriDIP: Verifying Ownership of Deep Neural Networks through Privacy Leakage Fingerprints

    Authors: Aoting Hu, Zhigang Lu, Renjie Xie, Minhui Xue

    Abstract: Deploying Machine Learning as a Service gives rise to model plagiarism, leading to copyright infringement. Ownership testing techniques are designed to identify model fingerprints for verifying plagiarism. However, previous works often rely on overfitting or robustness features as fingerprints, lacking theoretical guarantees and exhibiting under-performance on generalized models. In this paper, we…

    Submitted 6 September, 2023; originally announced October 2023.

  23. arXiv:2310.10219  [pdf, other]

    cs.CV cs.AI

    Using Global Land Cover Product as Prompt for Cropland Mapping via Visual Foundation Model

    Authors: Chao Tao, Aoran Hu, Rong Xiao, Haifeng Li, Yuze Wang

    Abstract: Data-driven deep learning methods have shown great potential in cropland mapping. However, due to multiple factors such as attributes of cropland (topography, climate, crop type) and imaging conditions (viewing angle, illumination, scale), croplands under different scenes demonstrate a great domain gap. This makes it difficult for models trained in the specific scenes to directly generalize to oth…

    Submitted 16 October, 2023; originally announced October 2023.

  24. arXiv:2310.05126  [pdf, other]

    cs.CV cs.AI

    UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

    Authors: Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Alex Lin, Fei Huang

    Abstract: Text is ubiquitous in our visual world, conveying crucial information, such as in documents, websites, and everyday photographs. In this work, we propose UReader, a first exploration of universal OCR-free visually-situated language understanding based on the Multimodal Large Language Model (MLLM). By leveraging the shallow text recognition ability of the MLLM, we only finetuned 1.2% parameters and…

    Submitted 8 October, 2023; originally announced October 2023.

  25. arXiv:2309.17080  [pdf, other]

    cs.CV cs.AI cs.RO

    GAIA-1: A Generative World Model for Autonomous Driving

    Authors: Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, Gianluca Corrado

    Abstract: Autonomous driving promises transformative improvements to transportation, but building systems capable of safely navigating the unstructured complexity of real-world scenarios remains challenging. A critical problem lies in effectively predicting the various potential outcomes that may emerge in response to the vehicle's actions as the world evolves. To address this challenge, we introduce GAIA…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: Technical Report

  26. arXiv:2308.10447  [pdf, other]

    cs.CV

    Explore and Tell: Embodied Visual Captioning in 3D Environments

    Authors: Anwen Hu, Shizhe Chen, Liang Zhang, Qin Jin

    Abstract: While current visual captioning models have achieved impressive performance, they often assume that the image is well-captured and provides a complete view of the scene. In real-world scenarios, however, a single image may not offer a good viewpoint, hindering fine-grained scene understanding. To overcome this limitation, we propose a novel task called Embodied Captioning, which equips visual capt…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: 12 pages; 10 figures; ICCV 2023

  27. arXiv:2307.02499  [pdf, other]

    cs.CL cs.AI

    mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

    Authors: Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Yuhao Dan, Chenlin Zhao, Guohai Xu, Chenliang Li, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang

    Abstract: Document understanding refers to automatically extract, analyze and comprehend information from various types of digital documents, such as a web page. Existing Multi-model Large Language Models (MLLMs), including mPLUG-Owl, have demonstrated promising zero-shot capabilities in shallow OCR-free text recognition, indicating their potential for OCR-free document understanding. Nevertheless, without…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: 10 pages, 8 figures

  28. arXiv:2306.13460  [pdf, other]

    cs.CL

    Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation

    Authors: Zihao Yue, Anwen Hu, Liang Zhang, Qin Jin

    Abstract: Image captioning aims to describe visual content in natural language. As 'a picture is worth a thousand words', there could be various correct descriptions for an image. However, with maximum likelihood estimation as the training objective, the captioning model is penalized whenever its prediction mismatches with the label. For instance, when the model predicts a word expressing richer semantics t…

    Submitted 28 October, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: Accepted to NeurIPS 2023

  29. arXiv:2306.09179  [pdf, other]

    cs.CV cs.AI cs.RO

    Neural World Models for Computer Vision

    Authors: Anthony Hu

    Abstract: Humans navigate in their environment by learning a mental model of the world through passive observation and active interaction. Their world model allows them to anticipate what might happen next and act accordingly with respect to an underlying objective. Such world models hold strong promises for planning in complex environments like in autonomous driving. A human driver, or a self-driving syste…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: PhD thesis

  30. arXiv:2306.04362  [pdf, other]

    cs.CV cs.CL

    Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

    Authors: Haiyang Xu, Qinghao Ye, Xuan Wu, Ming Yan, Yuan Miao, Jiabo Ye, Guohai Xu, Anwen Hu, Yaya Shi, Guangwei Xu, Chenliang Li, Qi Qian, Maofei Que, Ji Zhang, Xiao Zeng, Fei Huang

    Abstract: To promote the development of Vision-Language Pre-training (VLP) and multimodal Large Language Model (LLM) in the Chinese community, we firstly release the largest public Chinese high-quality video-language dataset named Youku-mPLUG, which is collected from Youku, a well-known Chinese video-sharing website, with strict criteria of safety, diversity, and quality. Youku-mPLUG contains 10 million Chi…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: Working in progress

  31. arXiv:2305.12817  [pdf, other]

    cs.LG

    Conservative Physics-Informed Neural Networks for Non-Conservative Hyperbolic Conservation Laws Near Critical States

    Authors: Reyna Quita, Yu-Shuo Chen, Hsin-Yi Lee, Alex C. Hu, John M. Hong

    Abstract: In this paper, a modified version of conservative Physics-informed Neural Networks (cPINN for short) is provided to construct the weak solutions of Riemann problem for the hyperbolic scalar conservation laws in non-conservative form. To demonstrate the results, we use the model of generalized Buckley-Leverett equation (GBL equation for short) with discontinuous porosity in porous media. By inventi…

    Submitted 22 May, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 23 pages, 26 figures

    MSC Class: 35L03; 35L45; 65M99

  32. arXiv:2305.12140  [pdf, other]

    cs.CV cs.MM

    Movie101: A New Movie Understanding Benchmark

    Authors: Zihao Yue, Qi Zhang, Anwen Hu, Liang Zhang, Ziheng Wang, Qin Jin

    Abstract: To help the visually impaired enjoy movies, automatic movie narrating systems are expected to narrate accurate, coherent, and role-aware plots when there are no speaking lines of actors. Existing works benchmark this challenge as a normal video captioning task via some simplifications, such as removing role names and evaluating narrations with ngram-based metrics, which makes it difficult for auto…

    Submitted 27 June, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  33. arXiv:2305.10403  [pdf, other]

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  34. arXiv:2305.06002  [pdf, other]

    cs.CV

    InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation

    Authors: Anwen Hu, Shizhe Chen, Liang Zhang, Qin Jin

    Abstract: Automatic image captioning evaluation is critical for benchmarking and promoting advances in image captioning research. Existing metrics only provide a single score to measure caption qualities, which are less explainable and informative. Instead, we humans can easily identify the problems of captions in details, e.g., which words are inaccurate and which salient objects are not described, and the…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted by ACL 2023 main conference

  35. arXiv:2305.00664  [pdf, other]

    cs.LG

    EvoluNet: Advancing Dynamic Non-IID Transfer Learning on Graphs

    Authors: Haohui Wang, Yuzhen Mao, Yujun Yan, Yaoqing Yang, Jianhui Sun, Kevin Choi, Balaji Veeramani, Alison Hu, Edward Bowen, Tyler Cody, Dawei Zhou

    Abstract: Non-IID transfer learning on graphs is crucial in many high-stakes domains. The majority of existing works assume stationary distribution for both source and target domains. However, real-world graphs are intrinsically dynamic, presenting challenges in terms of domain evolution and dynamic discrepancy between source and target domains. To bridge the gap, we shift the problem to the dynamic setting…

    Submitted 31 May, 2024; v1 submitted 1 May, 2023; originally announced May 2023.

    Comments: Accepted at ICML 2024

  36. arXiv:2304.14178  [pdf, other]

    cs.CL cs.CV cs.LG

    mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

    Authors: Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

    Abstract: Large language models (LLMs) have demonstrated impressive zero-shot abilities on a variety of open-ended tasks, while recent research has also explored the use of LLMs for multi-modal generation. In this study, we introduce mPLUG-Owl, a novel training paradigm that equips LLMs with multi-modal abilities through modularized learning of foundation LLM, a visual knowledge module, and a visual abstrac…

    Submitted 29 March, 2024; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: Working in Process

  37. arXiv:2304.09660  [pdf, other]

    cs.CL

    MPMQA: Multimodal Question Answering on Product Manuals

    Authors: Liang Zhang, Anwen Hu, Jing Zhang, Shuo Hu, Qin Jin

    Abstract: Visual contents, such as illustrations and images, play a big role in product manual understanding. Existing Product Manual Question Answering (PMQA) datasets tend to ignore visual contents and only retain textual parts. In this work, to emphasize the importance of multimodal contents, we propose a Multimodal Product Manual Question Answering (MPMQA) task. For each question, MPMQA requires the mod…

    Submitted 19 April, 2023; originally announced April 2023.

  38. arXiv:2304.08630  [pdf, ps, other]

    cs.GT

    MFGLib: A Library for Mean-Field Games

    Authors: Xin Guo, Anran Hu, Matteo Santamaria, Mahan Tajrobehkar, Junzi Zhang

    Abstract: Mean-field games (MFGs) are limiting models to approximate $N$-player games, with a number of applications. Despite the ever-growing numerical literature on computation of MFGs, there is no library that allows researchers and practitioners to easily create and solve their own MFG problems. The purpose of this document is to introduce MFGLib, an open-source Python library for solving general MFGs w…

    Submitted 17 April, 2023; originally announced April 2023.

  39. arXiv:2303.12455  [pdf, ps, other]

    cs.IT

    Reconfigurable Intelligent Surface-aided Secret Key Generation in Multi-Cell Systems

    Authors: Lei Hu, Chen Sun, Guyue Li, Aiqun Hu, Derrick Wing Kwan Ng

    Abstract: Physical-layer key generation (PKG) exploits the reciprocity and randomness of wireless channels to generate a symmetric key between two legitimate communication ends. However, in multi-cell systems, PKG suffers from severe pilot contamination due to the reuse of pilots in different cells. In this paper, we invoke multiple reconfigurable intelligent surfaces (RISs) for adaptively shaping the envir…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: 30 pages, 12 figures

  40. arXiv:2303.08947  [pdf, other]

    cs.RO

    Design, Modeling, and Redundancy Resolution of Soft Robot for Effective Harvesting

    Authors: Milad Azizkhani, Anthony L. Gunderman, Alex S. Qiu, Ai-Ping Hu, Xin Zhang, Yue Chen

    Abstract: Blackberry harvesting is a labor-intensive and costly process, consuming up to 50% of the total annual crop hours. This paper presents a solution for robotic harvesting through the design, manufacturing, integration, and control of a pneumatically actuated, kinematically redundant soft arm with a tendon-driven soft robotic gripper. The hardware design is optimized for durability and modularity fo…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: 11 pages, 10 figures

  41. arXiv:2303.07015  [pdf, ps, other]

    cs.IT

    RIS-Jamming: Breaking Key Consistency in Channel Reciprocity-based Key Generation

    Authors: Guyue Li, Paul Staat, Haoyu Li, Markus Heinrichs, Christian Zenger, Rainer Kronberger, Harald Elders-Boll, Christof Paar, Aiqun Hu

    Abstract: Channel Reciprocity-based Key Generation (CRKG) exploits reciprocal channel randomness to establish shared secret keys between wireless terminals. This new security technique is expected to complement existing cryptographic techniques for secret key distribution of future wireless networks. In this paper, we present a new attack, reconfigurable intelligent surface (RIS) jamming, and show that an a…

    Submitted 10 April, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

    Comments: 15 pages, 14 figures

  42. arXiv:2303.06591  [pdf, other

    cs.CV

    Accommodating Audio Modality in CLIP for Multimodal Processing

    Authors: Ludan Ruan, Anwen Hu, Yuqing Song, Liang Zhang, Sipeng Zheng, Qin Jin

    Abstract: Multimodal processing has attracted much attention lately, especially with the success of pre-training. However, the exploration has mainly focused on vision-language pre-training, as introducing more modalities can greatly complicate model design and optimization. In this paper, we extend the state-of-the-art Vision-Language model CLIP to accommodate the audio modality for Vision-Language-Audio mul…

    Submitted 12 March, 2023; originally announced March 2023.

    Comments: Accepted by AAAI2023

  43. arXiv:2302.11000  [pdf, other

    cs.LG cs.AI q-bio.QM

    CHA2: CHemistry Aware Convex Hull Autoencoder Towards Inverse Molecular Design

    Authors: Mohammad Sajjad Ghaemi, Hang Hu, Anguang Hu, Hsu Kiang Ooi

    Abstract: Optimizing molecular design and discovering novel chemical structures to meet certain objectives, such as quantitative estimates of the drug-likeness score (QEDs), is NP-hard due to the vast combinatorial design space of discrete molecular structures, which makes it near impossible to explore the entire search space comprehensively to exploit de novo structures with properties of interest. To addr…

    Submitted 21 February, 2023; originally announced February 2023.

  44. arXiv:2302.10952  [pdf

    cs.LG q-bio.BM

    Machine learning for the prediction of safe and biologically active organophosphorus molecules

    Authors: Hang Hu, Hsu Kiang Ooi, Mohammad Sajjad Ghaemi, Anguang Hu

    Abstract: Drug discovery is a complex process with a large molecular space to be considered. By constraining the search space, the fragment-based drug design is an approach that can effectively sample the chemical space of interest. Here we propose a framework of Recurrent Neural Networks (RNN) with an attention model to sample the chemical space of organophosphorus molecules using the fragment-based approa…

    Submitted 21 February, 2023; originally announced February 2023.

  45. arXiv:2302.03099  [pdf

    cs.RO

    Tendon-Driven Soft Robotic Gripper with Integrated Ripeness Sensing for Blackberry Harvesting

    Authors: Alex Qiu, Claire Young, Anthony Gunderman, Milad Azizkhani, Yue Chen, Ai-** Hu

    Abstract: Growing global demand for food, coupled with continuing labor shortages, motivates the need for automated agricultural harvesting. While some specialty crops (e.g., apples, peaches, blueberries) can be harvested via existing harvesting modalities, fruits such as blackberries and raspberries require delicate handling to mitigate fruit damage that could significantly impact marketability. This motiv…

    Submitted 6 February, 2023; originally announced February 2023.

    Comments: 7 pages, 9 figures, submitted to and accepted by ICRA 2023

  46. arXiv:2301.01453  [pdf, other

    cs.IT

    Information-Theoretic Secure Key Sharing for Wide-Area Mobile Applications

    Authors: Guyue Li, Hongyi Luo, Jiabao Yu, Aiqun Hu, Jiangzhou Wang

    Abstract: With the rapid growth of handheld devices in internet of things (IoT) networks, mobile applications have become ubiquitous in everyday life. As technology develops, so do the risks and threats associated with it, especially in the forthcoming quantum era. Existing IoT networks, however, lack a quantum-resistant secret key sharing scheme to meet confidential message transmission demand…

    Submitted 4 January, 2023; originally announced January 2023.

  47. arXiv:2212.14514  [pdf, other

    stat.ML cs.LG math.ST stat.ME

    The Voronoigram: Minimax Estimation of Bounded Variation Functions From Scattered Data

    Authors: Addison J. Hu, Alden Green, Ryan J. Tibshirani

    Abstract: We consider the problem of estimating a multivariate function $f_0$ of bounded variation (BV), from noisy observations $y_i = f_0(x_i) + z_i$ made at random design points $x_i \in \mathbb{R}^d$, $i=1,\ldots,n$. We study an estimator that forms the Voronoi diagram of the design points, and then solves an optimization problem that regularizes according to a certain discrete notion of total variation…

    Submitted 29 December, 2022; originally announced December 2022.

    Comments: 75 pages, 10 figures

  48. arXiv:2212.05588  [pdf, other

    cs.HC

    Towards a Learner-Centered Explainable AI: Lessons from the learning sciences

    Authors: Anna Kawakami, Luke Guerdan, Yang Cheng, Anita Sun, Alison Hu, Kate Glazko, Nikos Arechiga, Matthew Lee, Scott Carter, Haiyi Zhu, Kenneth Holstein

    Abstract: In this short paper, we argue for a refocusing of XAI around human learning goals. Drawing upon approaches and theories from the learning sciences, we propose a framework for the learner-centered design and evaluation of XAI systems. We illustrate our framework through an ongoing case study in the context of AI-augmented social work.

    Submitted 11 December, 2022; originally announced December 2022.

    Comments: 7 pages, 2 figures

    Journal ref: Human-Centered Explainable AI Workshop at ACM CHI Conference on Human Factors in Computing Systems 2022

  49. arXiv:2211.07820  [pdf, other

    cs.CV

    Clinically Plausible Pathology-Anatomy Disentanglement in Patient Brain MRI with Structured Variational Priors

    Authors: Anjun Hu, Jean-Pierre R. Falet, Brennan S. Nichyporuk, Changjian Shui, Douglas L. Arnold, Sotirios A. Tsaftaris, Tal Arbel

    Abstract: We propose a hierarchically structured variational inference model for accurately disentangling observable evidence of disease (e.g. brain lesions or atrophy) from subject-specific anatomy in brain MRIs. With flexible, partially autoregressive priors, our model (1) addresses the subtle and fine-grained dependencies that typically exist between anatomical and pathological generating factors of an M…

    Submitted 16 November, 2022; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 11 pages

  50. arXiv:2211.06313  [pdf

    cs.CY

    BioJam Camp: toward justice through bioengineering and biodesign co-learning with youth

    Authors: Callie Chappell, Henry A. -A., Elvia B. O., Emily B., Bailey B., Jacqueline C. -M., Caroline Daws, Cristian F., Emiliano G., Page Goddard, Xavier G., Anne Hu, Gabriela J., Kelley Langhans, Briana Martin-Villa, Penny M. -S., Jennifer M., Soyang N., Melissa Ortiz, Aryana P., Trisha S, Corinne Takara, Emily T., Paloma Vazquez, Rolando Perez , et al. (1 additional authors not shown)

    Abstract: BioJam is a political, artistic, and educational project in which Bay Area artists, scientists, and educators collaborate with youth and communities of color to address historical exclusion of their communities in STEM fields and reframe what science can be. As an intergenerational collective, we co-learn on topics of culture (social and biological), community (cultural and ecological), and creati…

    Submitted 1 November, 2022; originally announced November 2022.