Skip to main content

Showing 1–50 of 399 results for author: Wang, A

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.20334  [pdf, other

    cs.CV cs.GR

    VividDream: Generating 3D Scene with Ambient Dynamics

    Authors: Yao-Chih Lee, Yi-Ting Chen, Andrew Wang, Ting-Hsuan Liao, Brandon Y. Feng, Jia-Bin Huang

    Abstract: We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt. VividDream first expands an input image into a static 3D point cloud through iterative inpainting and geometry merging. An ensemble of animated videos is then generated using video diffusion models with quality refinement techniques and conditioned on renderings of… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Project page: https://vivid-dream-4d.github.io

  2. arXiv:2405.17537  [pdf, other

    cs.AI cs.CL cs.CV

    BIOSCAN-CLIP: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

    Authors: ZeMing Gong, Austin T. Wang, Joakim Bruslund Haurum, Scott C. Lowe, Graham W. Taylor, Angel X. Chang

    Abstract: Measuring biodiversity is crucial for understanding ecosystem health. While prior works have developed machine learning models for the taxonomic classification of photographic images and DNA separately, in this work, we introduce a multimodal approach combining both, using CLIP-style contrastive learning to align images, DNA barcodes, and textual data in a unified embedding space. This allows for… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 16 pages with 9 figures

  3. arXiv:2405.17248  [pdf, other

    stat.ML cs.LG

    Transformer In-Context Learning for Categorical Data

    Authors: Aaron T. Wang, Ricardo Henao, Lawrence Carin

    Abstract: Recent research has sought to understand Transformers through the lens of in-context learning with functional data. We extend that line of work with the goal of moving closer to language models, considering categorical outcomes, nonlinear underlying models, and nonlinear attention. The contextual data are of the form $\textsf{C}=(x_1,c_1,\dots,x_N,c_{N})$ where each $c_i\in\{0,\dots,C-1\}$ is draw… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  4. arXiv:2405.14782  [pdf, other

    cs.CL

    Lessons from the Trenches on Reproducible Evaluation of Language Models

    Authors: Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan , et al. (5 additional authors not shown)

    Abstract: Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of reproducibility and transparency. In this paper we draw on three years of experience in evaluating large language models to provide guidance and lessons… ▽ More

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  5. arXiv:2405.14458  [pdf, other

    cs.CV

    YOLOv10: Real-Time End-to-End Object Detection

    Authors: Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zijia Lin, Jungong Han, Guiguang Ding

    Abstract: Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time object detection owing to their effective balance between computational cost and detection performance. Researchers have explored the architectural designs, optimization objectives, data augmentation strategies, and others for YOLOs, achieving notable progress. However, the reliance on the non-maximum sup… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Code: https://github.com/THU-MIG/yolov10

  6. Towards Feature Engineering with Human and AI's Knowledge: Understanding Data Science Practitioners' Perceptions in Human&AI-Assisted Feature Engineering Design

    Authors: Qian Zhu, Dakuo Wang, Shuai Ma, April Yi Wang, Zixin Chen, Udayan Khurana, Xiaojuan Ma

    Abstract: As AI technology continues to advance, the importance of human-AI collaboration becomes increasingly evident, with numerous studies exploring its potential in various fields. One vital field is data science, including feature engineering (FE), where both human ingenuity and AI capabilities play pivotal roles. Despite the existence of AI-generated recommendations for FE, there remains a limited und… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Computational Notebooks, Human-AI Collaboration, Feature Recommendation

  7. arXiv:2405.14019  [pdf, other

    cs.CV

    BrainMorph: A Foundational Keypoint Model for Robust and Flexible Brain MRI Registration

    Authors: Alan Q. Wang, Rachit Saluja, Heejong Kim, Xinzi He, Adrian Dalca, Mert R. Sabuncu

    Abstract: We present a keypoint-based foundation model for general purpose brain MRI registration, based on the recently-proposed KeyMorph framework. Our model, called BrainMorph, serves as a tool that supports multi-modal, pairwise, and scalable groupwise registration. BrainMorph is trained on a massive dataset of over 100,000 3D volumes, skull-stripped and non-skull-stripped, from nearly 16,000 unique hea… ▽ More

    Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2304.09941

  8. arXiv:2405.09713  [pdf, other

    cs.CV cs.AI cs.CL

    SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

    Authors: Andong Wang, Bo Wu, Sunli Chen, Zhenfang Chen, Haotian Guan, Wei-Ning Lee, Li Erran Li, Chuang Gan

    Abstract: Learning commonsense reasoning from visual contexts and scenes in real-world is a crucial step toward advanced artificial intelligence. However, existing video reasoning benchmarks are still inadequate since they were mainly designed for factual or situated reasoning and rarely involve broader knowledge in the real world. Our work aims to delve deeper into reasoning evaluations, specifically withi… ▽ More

    Submitted 16 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: CVPR

  9. arXiv:2405.08672  [pdf, other

    eess.IV cs.CV

    EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera

    Authors: Beilei Cui, Mobarakol Islam, Long Bai, An Wang, Hongliang Ren

    Abstract: Depth estimation plays a crucial role in various tasks within endoscopic surgery, including navigation, surface reconstruction, and augmented reality visualization. Despite the significant achievements of foundation models in vision tasks, including depth estimation, their direct application to the medical domain often results in suboptimal performance. This highlights the need for efficient adapt… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: early accepted by MICCAI 2024

  10. arXiv:2405.07518  [pdf, other

    cs.AR cs.AI

    SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

    Authors: Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Edison Chen, Kaizhao Liang, Swayambhoo Jain , et al. (5 additional authors not shown)

    Abstract: Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in compute-to-memory ratio of modern AI accelerators have created a memory wall, necessitating new methods to deploy AI. Composition of Expert… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  11. arXiv:2405.07060  [pdf, other

    cs.RO

    Memory-Maze: Scenario Driven Benchmark and Visual Language Navigation Model for Guiding Blind People

    Authors: Masaki Kuribayashi, Kohei Uehara, Allan Wang, Daisuke Sato, Simon Chu, Shigeo Morishima

    Abstract: Visual Language Navigation (VLN) powered navigation robots have the potential to guide blind people by understanding and executing route instructions provided by sighted passersby. This capability allows robots to operate in environments that are often unknown a priori. Existing VLN models are insufficient for the scenario of navigation guidance for blind people, as they need to understand routes… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

  12. arXiv:2405.06201  [pdf, other

    cs.CV

    PhysMLE: Generalizable and Priors-Inclusive Multi-task Remote Physiological Measurement

    Authors: Jiyao Wang, Hao Lu, Ange Wang, Xiao Yang, Yingcong Chen, Dengbo He, Kaishun Wu

    Abstract: Remote photoplethysmography (rPPG) has been widely applied to measure heart rate from face videos. To increase the generalizability of the algorithms, domain generalization (DG) attracted increasing attention in rPPG. However, when rPPG is extended to simultaneously measure more vital signs (e.g., respiration and blood oxygen saturation), achieving generalizability brings new challenges. Although… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  13. arXiv:2405.04605  [pdf

    cs.CV cs.AI cs.LG

    AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets

    Authors: Fakrul Islam Tushar, Avivah Wang, Lavsen Dahal, Michael R. Harowicz, Kyle J. Lafata, Tina D. Tailor, Joseph Y. Lo

    Abstract: BACKGROUND: Lung cancer's high mortality rate can be mitigated by early detection, which is increasingly reliant on artificial intelligence (AI) for diagnostic imaging. However, the performance of AI models is contingent upon the datasets used for their training and validation. METHODS: This study developed and validated the DLCSD-mD and LUNA16-mD models utilizing the Duke Lung Cancer Screening Da… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 16 pages, 2 tables, 5 figures

  14. arXiv:2405.03855  [pdf, other

    cs.CY

    Strategies for Increasing Corporate Responsible AI Prioritization

    Authors: Angelina Wang, Teresa Datta, John P. Dickerson

    Abstract: Responsible artificial intelligence (RAI) is increasingly recognized as a critical concern. However, the level of corporate RAI prioritization has not kept pace. In this work, we conduct 16 semi-structured interviews with practitioners to investigate what has historically motivated companies to increase the prioritization of RAI. What emerges is a complex story of conflicting and varied factors, b… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  15. arXiv:2405.03113  [pdf, other

    cs.RO cs.AI

    Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning

    Authors: Caleb Chuck, Carl Qi, Michael J. Munje, Shuozhe Li, Max Rudolph, Chang Shi, Siddhant Agarwal, Harshit Sikchi, Abhinav Peri, Sarthak Dayal, Evan Kuo, Kavan Mehta, Anthony Wang, Peter Stone, Amy Zhang, Scott Niekum

    Abstract: Reinforcement Learning is a promising tool for learning complex policies even in fast-moving and object-interactive domains where human teleoperation or hard-coded policies might fail. To effectively reflect this challenging category of tasks, we introduce a dynamic, interactive RL testbed based on robot air hockey. By augmenting air hockey with a large family of tasks ranging from easy tasks like… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  16. arXiv:2405.01691  [pdf, other

    cs.CV cs.LG cs.RO

    Language-Enhanced Latent Representations for Out-of-Distribution Detection in Autonomous Driving

    Authors: Zhenjiang Mao, Dong-You Jhong, Ao Wang, Ivan Ruchkin

    Abstract: Out-of-distribution (OOD) detection is essential in autonomous driving, to determine when learning-based components encounter unexpected inputs. Traditional detectors typically use encoder models with fixed settings, thus lacking effective human interaction capabilities. With the rise of large foundation models, multimodal inputs offer the possibility of taking human language as a latent represent… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Presented at the Robot Trust for Symbiotic Societies (RTSS) Workshop, co-located with ICRA 2024

  17. arXiv:2404.16506  [pdf, other

    cs.CL

    Building a Japanese Document-Level Relation Extraction Dataset Assisted by Cross-Lingual Transfer

    Authors: Youmi Ma, An Wang, Naoaki Okazaki

    Abstract: Document-level Relation Extraction (DocRE) is the task of extracting all semantic relationships from a document. While studies have been conducted on English DocRE, limited attention has been given to DocRE in non-English languages. This work delves into effectively utilizing existing English resources to promote DocRE studies in non-English languages, with Japanese as the representative case. As… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted LREC-COLING 2024

  18. arXiv:2404.05626  [pdf, other

    cs.CV

    Learning a Category-level Object Pose Estimator without Pose Annotations

    Authors: Fengrui Tian, Yaoyao Liu, Adam Kortylewski, Yueqi Duan, Shaoyi Du, Alan Yuille, Angtian Wang

    Abstract: 3D object pose estimation is a challenging task. Previous works always require thousands of object images with annotated poses for learning the 3D pose correspondence, which is laborious and time-consuming for labeling. In this paper, we propose to learn a category-level 3D object pose estimator without pose annotations. Instead of using manually annotated images, we leverage diffusion models (e.g… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  19. arXiv:2404.05188  [pdf, other

    cs.CR cs.AI cs.CL

    Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging

    Authors: Tianshuo Cong, Delong Ran, Zesen Liu, Xinlei He, **yuan Liu, Yichen Gong, Qi Li, Anyu Wang, Xiaoyun Wang

    Abstract: Model merging is a promising lightweight model empowerment technique that does not rely on expensive computing devices (e.g., GPUs) or require the collection of specific training data. Instead, it involves editing different upstream model parameters to absorb their downstream task capabilities. However, uncertified model merging can infringe upon the Intellectual Property (IP) rights of the origin… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Technical Report

  20. arXiv:2404.05163  [pdf, other

    cs.CV

    Semantic Flow: Learning Semantic Field of Dynamic Scenes from Monocular Videos

    Authors: Fengrui Tian, Yueqi Duan, Angtian Wang, Jianfei Guo, Shaoyi Du

    Abstract: In this work, we pioneer Semantic Flow, a neural semantic representation of dynamic scenes from monocular videos. In contrast to previous NeRF methods that reconstruct dynamic scenes from the colors and volume densities of individual points, Semantic Flow learns semantics from continuous flows that contain rich 3D motion information. As there is 2D-to-3D ambiguity problem in the viewing direction… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by ICLR 2024, Codes are available at https://github.com/tianfr/Semantic-Flow/

  21. "Don't Step on My Toes": Resolving Editing Conflicts in Real-Time Collaboration in Computational Notebooks

    Authors: April Yi Wang, Zihan Wu, Christopher Brooks, Steve Oney

    Abstract: Real-time collaborative editing in computational notebooks can improve the efficiency of teamwork for data scientists. However, working together through synchronous editing of notebooks introduces new challenges. Data scientists may inadvertently interfere with each others' work by altering the shared codebase and runtime state if they do not set up a social protocol for working together and monit… ▽ More

    Submitted 6 April, 2024; originally announced April 2024.

  22. Revisiting Categorical Color Perception in Scatterplots: Sequential, Diverging, and Categorical Palettes

    Authors: Chin Tseng, Arran Zeyu Wang, Ghulam Jilani Quadri, Danielle Albers Szafir

    Abstract: Existing guidelines for categorical color selection are heuristic, often grounded in intuition rather than empirical studies of readers' abilities. While design conventions recommend palettes maximize hue differences, more recent exploratory findings indicate other factors, such as lightness, may play a role in effective categorical palette design. We conducted a crowdsourced experiment on mean va… ▽ More

    Submitted 16 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted for publication in EuroVis 2024 Short Paper

    Journal ref: In Proceedings of the 26th EG/VGTC Conference on Visualization (EuroVis 2024), May 27-31, 2024, Odense, Denmark

  23. arXiv:2404.02167  [pdf, ps, other

    cs.IT

    A remark on conditional entropy

    Authors: Adam Wang

    Abstract: The following note proves that conditional entropy of a sequence is almost time-reversal invariant, specifically they only differ by a small constant factor dependent only upon the forward and backward models that the entropies are being calculated with respect to. This gives rise to a numerical value that quantifies learnability, as well as a methodology to control for distributional shift betwee… ▽ More

    Submitted 26 March, 2024; originally announced April 2024.

    Comments: Preprint, 5 pages

  24. arXiv:2404.01462  [pdf, other

    cs.LG cs.CL cs.IR

    OpenChemIE: An Information Extraction Toolkit For Chemistry Literature

    Authors: Vincent Fan, Yujie Qian, Alex Wang, Amber Wang, Connor W. Coley, Regina Barzilay

    Abstract: Information extraction from chemistry literature is vital for constructing up-to-date reaction databases for data-driven chemistry. Complete extraction requires combining information across text, tables, and figures, whereas prior work has mainly investigated extracting reactions from single modalities. In this paper, we present OpenChemIE to address this complex challenge and enable the extractio… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: To be submitted to the Journal of Chemical Information and Modeling

  25. arXiv:2403.18540  [pdf, other

    stat.ML cs.LG stat.CO

    skscope: Fast Sparsity-Constrained Optimization in Python

    Authors: Zezhi Wang, ** Zhu, Peng Chen, Huiyang Peng, Xiaoke Zhang, Anran Wang, Yu Zheng, Junxian Zhu, Xueqin Wang

    Abstract: Applying iterative solvers on sparsity-constrained optimization (SCO) requires tedious mathematical deduction and careful programming/debugging that hinders these solvers' broad impact. In the paper, the library skscope is introduced to overcome such an obstacle. With skscope, users can solve the SCO by just programming the objective function. The convenience of skscope is demonstrated through two… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 4 pages

  26. arXiv:2403.12985  [pdf, other

    cs.IT eess.SP

    Multi-objective Optimization for Data Collection in UAV-assisted Agricultural IoT

    Authors: Lingling Liu, Aimin Wang, Geng Sun, Jiahui Li, Hongyang Pan, Tony Q. S. Quek

    Abstract: The ground fixed base stations (BSs) are often deployed inflexibly, and have high overheads, as well as are susceptible to the damage from natural disasters, making it impractical for them to continuously collect data from sensor devices. To improve the network coverage and performance of wireless communication, unmanned aerial vehicles (UAVs) have been introduced in diverse wireless networks, the… ▽ More

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: 13 pages, 7 figures, 4 tables

  27. arXiv:2403.12945  [pdf, other

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park , et al. (74 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important step** stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  28. arXiv:2403.09849  [pdf, other

    cs.CL cs.AI

    Self-Consistency Boosts Calibration for Math Reasoning

    Authors: Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Lifeng **, Haitao Mi, **song Su, Dong Yu

    Abstract: Calibration, which establishes the correlation between accuracy and model confidence, is important for LLM development. We design three off-the-shelf calibration methods based on self-consistency (Wang et al., 2022) for math reasoning tasks. Evaluation on two popular benchmarks (GSM8K and MathQA) using strong open-source LLMs (Mistral and LLaMA2), our methods better bridge model confidence and acc… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

  29. arXiv:2403.09327  [pdf, other

    cs.CV eess.IV

    Perspective-Equivariant Imaging: an Unsupervised Framework for Multispectral Pansharpening

    Authors: Andrew Wang, Mike Davies

    Abstract: Ill-posed image reconstruction problems appear in many scenarios such as remote sensing, where obtaining high quality images is crucial for environmental monitoring, disaster management and urban planning. Deep learning has seen great success in overcoming the limitations of traditional methods. However, these inverse problems rarely come with ground truth data, highlighting the importance of unsu… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Pre-print

  30. arXiv:2403.08268  [pdf, other

    cs.CV

    Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts

    Authors: Yue Ma, Yingqing He, Hongfa Wang, Andong Wang, Chenyang Qi, Chengfei Cai, Xiu Li, Zhifeng Li, Heung-Yeung Shum, Wei Liu, Qifeng Chen

    Abstract: Despite recent advances in image-to-video generation, better controllability and local animation are less explored. Most existing image-to-video methods are not locally aware and tend to move the entire scene. However, human artists may need to control the movement of different objects or regions. Additionally, current I2V methods require users not only to describe the target motion but also to pr… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Project Page: https://follow-your-click.github.io/ Github Page: https://github.com/mayuelala/FollowYourClick

  31. arXiv:2403.07478  [pdf, other

    cs.IR cs.LG

    Towards Graph Foundation Models for Personalization

    Authors: Andreas Damianou, Francesco Fabbri, Paul Gigioli, Marco De Nadai, Alice Wang, Enrico Palumbo, Mounia Lalmas

    Abstract: In the realm of personalization, integrating diverse information sources such as consumption signals and content-based representations is becoming increasingly critical to build state-of-the-art solutions. In this regard, two of the biggest trends in research around this subject are Graph Neural Networks (GNNs) and Foundation Models (FMs). While GNNs emerged as a popular solution in industry for p… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

  32. arXiv:2403.06459  [pdf, other

    eess.IV cs.CV

    From Pixel to Cancer: Cellular Automata in Computed Tomography

    Authors: Yuxiang Lai, Xiaoxi Chen, Angtian Wang, Alan Yuille, Zongwei Zhou

    Abstract: AI for cancer detection encounters the bottleneck of data scarcity, annotation difficulty, and low prevalence of early tumors. Tumor synthesis seeks to create artificial tumors in medical images, which can greatly diversify the data and annotations for AI training. However, current tumor synthesis approaches are not applicable across different organs due to their need for specific expertise and de… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

  33. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Machel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry, Lepikhin, Timothy Lillicrap, Jean-baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser, Ioannis Antonoglou, Rohan Anil, Sebastian Borgeaud, Andrew Dai, Katie Millican, Ethan Dyer, Mia Glaese, Thibault Sottiaux, Benjamin Lee, Fabio Viola, Malcolm Reynolds, Yuanzhong Xu, James Molloy , et al. (683 additional authors not shown)

    Abstract: In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. Gemini 1.5 Pro achieves near-perfect recall on long-context retrieval tasks across modalit… ▽ More

    Submitted 25 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  34. arXiv:2403.05185  [pdf, other

    cs.IR cs.LG

    Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks

    Authors: Marco De Nadai, Francesco Fabbri, Paul Gigioli, Alice Wang, Ang Li, Fabrizio Silvestri, Laura Kim, Shawn Lin, Vladan Radosavljevic, Sandeep Ghael, David Nyhan, Hugues Bouchard, Mounia Lalmas-Roelleke, Andreas Damianou

    Abstract: In the ever-evolving digital audio landscape, Spotify, well-known for its music and talk content, has recently introduced audiobooks to its vast user base. While promising, this move presents significant challenges for personalized recommendations. Unlike music and podcasts, audiobooks, initially available for a fee, cannot be easily skimmed before purchase, posing higher stakes for the relevance… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: To appear in The Web Conference 2024 proceedings

  35. arXiv:2403.04116  [pdf, other

    eess.IV cs.CV

    Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis

    Authors: Yuanhao Cai, Yixun Liang, Jiahao Wang, Angtian Wang, Yulun Zhang, Xiaokang Yang, Zongwei Zhou, Alan Yuille

    Abstract: X-ray is widely applied for transmission imaging due to its stronger penetration than natural light. When rendering novel view X-ray projections, existing methods mainly based on NeRF suffer from long training time and slow inference speed. In this paper, we propose a 3D Gaussian splatting-based framework, namely X-Gaussian, for X-ray novel view synthesis. Firstly, we redesign a radiative Gaussian… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: The first 3D Gaussian Splatting-based method for X-ray 3D reconstruction

  36. arXiv:2403.03681  [pdf, other

    cs.RO cs.CV

    3D Object Visibility Prediction in Autonomous Driving

    Authors: Chuanyu Luo, Nuo Cheng, Ren Zhong, Haipeng Jiang, Wenyu Chen, Aoli Wang, Pu Li

    Abstract: With the rapid advancement of hardware and software technologies, research in autonomous driving has seen significant growth. The prevailing framework for multi-sensor autonomous driving encompasses sensor installation, perception, path planning, decision-making, and motion control. At the perception phase, a common approach involves utilizing neural networks to infer 3D bounding box (Bbox) attrib… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

  37. arXiv:2403.03218  [pdf, other

    cs.LG cs.AI cs.CL cs.CY

    The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

    Authors: Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer , et al. (32 additional authors not shown)

    Abstract: The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in develo** biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are develo** evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing furthe… ▽ More

    Submitted 15 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: See the project page at https://wmdp.ai

  38. Collaborate to Adapt: Source-Free Graph Domain Adaptation via Bi-directional Adaptation

    Authors: Zhen Zhang, Meihan Liu, Anhui Wang, Hongyang Chen, Zhao Li, Jiajun Bu, Bingsheng He

    Abstract: Unsupervised Graph Domain Adaptation (UGDA) has emerged as a practical solution to transfer knowledge from a label-rich source graph to a completely unlabelled target graph. However, most methods require a labelled source graph to provide supervision signals, which might not be accessible in the real-world settings due to regulations and privacy concerns. In this paper, we explore the scenario of… ▽ More

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: Accepted by WWW-2024

  39. arXiv:2403.01244  [pdf, other

    cs.CL cs.AI

    Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal

    Authors: Jianheng Huang, Leyang Cui, Ante Wang, Chengyi Yang, Xinting Liao, Linfeng Song, Junfeng Yao, **song Su

    Abstract: Large language models (LLMs) suffer from catastrophic forgetting during continual learning. Conventional rehearsal-based methods rely on previous training data to retain the model's ability, which may not be feasible in real-world applications. When conducting continual learning based on a publicly-released LLM checkpoint, the availability of the original training data may be non-existent. To addr… ▽ More

    Submitted 25 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: ACL 2024 main, long paper

  40. arXiv:2402.17370  [pdf, other

    cs.CV

    An Efficient MLP-based Point-guided Segmentation Network for Ore Images with Ambiguous Boundary

    Authors: Guodong Sun, Yuting Peng, Le Cheng, Mengya Xu, An Wang, Bo Wu, Hongliang Ren, Yang Zhang

    Abstract: The precise segmentation of ore images is critical to the successful execution of the beneficiation process. Due to the homogeneous appearance of the ores, which leads to low contrast and unclear boundaries, accurate segmentation becomes challenging, and recognition becomes problematic. This paper proposes a lightweight framework based on Multi-Layer Perceptron (MLP), which focuses on solving the… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 10 pages, 8 figures

  41. arXiv:2402.16641  [pdf, other

    cs.CV

    Towards Open-ended Visual Quality Comparison

    Authors: Haoning Wu, Hanwei Zhu, Zicheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Annan Wang, Wenxiu Sun, Qiong Yan, Xiaohong Liu, Guangtao Zhai, Shiqi Wang, Weisi Lin

    Abstract: Comparative settings (e.g. pairwise choice, listwise ranking) have been adopted by a wide range of subjective studies for image quality assessment (IQA), as it inherently standardizes the evaluation criteria across different observers and offer more clear-cut responses. In this work, we extend the edge of emerging large multi-modality models (LMMs) to further advance visual quality comparison into… ▽ More

    Submitted 4 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Fix typos

  42. arXiv:2402.15631  [pdf, other

    cs.CL cs.AI

    Fine-Grained Self-Endorsement Improves Factuality and Reasoning

    Authors: Ante Wang, Linfeng Song, Baolin Peng, Ye Tian, Lifeng **, Haitao Mi, **song Su, Dong Yu

    Abstract: This work studies improving large language model (LLM) generations at inference time by mitigating fact-conflicting hallucinations. Particularly, we propose a self-endorsement framework that leverages the fine-grained fact-level comparisons across multiple sampled responses. Compared with prior ensemble methods (Wang et al., 2022;Chen et al., 2023)) that perform response-level selection, our appro… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

  43. Do You See What I See? A Qualitative Study Eliciting High-Level Visualization Comprehension

    Authors: Ghulam Jilani Quadri, Arran Zeyu Wang, Zhehao Wang, Jennifer Adorno, Paul Rosen, Danielle Albers Szafir

    Abstract: Designers often create visualizations to achieve specific high-level analytical or communication goals. These goals require people to naturally extract complex, contextualized, and interconnected patterns in data. While limited prior work has studied general high-level interpretation, prevailing perceptual studies of visualization effectiveness primarily focus on isolated, predefined, low-level ta… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: Accepted for publication in ACM CHI 2024

    Journal ref: In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11-16, 2024, Honolulu, HI, USA

  44. arXiv:2402.14367  [pdf, other

    cs.LG cs.SI

    Representation Learning for Frequent Subgraph Mining

    Authors: Rex Ying, Tianyu Fu, Andrew Wang, Jiaxuan You, Yu Wang, Jure Leskovec

    Abstract: Identifying frequent subgraphs, also called network motifs, is crucial in analyzing and predicting properties of real-world networks. However, finding large commonly-occurring motifs remains a challenging problem not only due to its NP-hard subroutine of subgraph counting, but also the exponential growth of the number of possible subgraphs patterns. Here we present Subgraph Pattern Miner (SPMiner)… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Oral Presentation in The Graph Representation Learning and Beyond (GRL+) Workshop from The 37th International Conference on Ma- chine Learning, 2020

  45. arXiv:2402.13929  [pdf, other

    cs.CV cs.AI cs.LG

    SDXL-Lightning: Progressive Adversarial Diffusion Distillation

    Authors: Shanchuan Lin, Anran Wang, Xiao Yang

    Abstract: We propose a diffusion distillation method that achieves new state-of-the-art in one-step/few-step 1024px text-to-image generation based on SDXL. Our method combines progressive and adversarial distillation to achieve a balance between quality and mode coverage. In this paper, we discuss the theoretical analysis, discriminator design, model formulation, and training techniques. We open-source our… ▽ More

    Submitted 2 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  46. arXiv:2402.12370  [pdf, other

    cs.CL cs.AI

    AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies

    Authors: Xiao Ye, Andrew Wang, Jacob Choi, Yining Lu, Shreya Sharma, Lingfeng Shen, Vijay Tiyyala, Nicholas Andrews, Daniel Khashabi

    Abstract: Humans regularly engage in analogical thinking, relating personal experiences to current situations ($X$ is analogous to $Y$ because of $Z$). Analogical thinking allows humans to solve problems in creative ways, grasp difficult concepts, and articulate ideas more effectively. Can language models (LMs) do the same? To answer this question, we propose ANALOBENCH, a benchmark to determine analogical… ▽ More

    Submitted 19 February, 2024; originally announced February 2024.

  47. arXiv:2402.11588  [pdf, other

    cs.CV cs.AI

    SDiT: Spiking Diffusion Model with Transformer

    Authors: Shu Yang, Hanzhi Ma, Chengting Yu, Aili Wang, Er-** Li

    Abstract: Spiking neural networks (SNNs) have low power consumption and bio-interpretable characteristics, and are considered to have tremendous potential for energy-efficient computing. However, the exploration of SNNs on image generation tasks remains very limited, and a unified and effective structure for SNN-based generative models has yet to be proposed. In this paper, we explore a novel diffusion mode… ▽ More

    Submitted 24 February, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  48. arXiv:2402.09750  [pdf, other

    cs.HC cs.AI

    Exploring the Potential of Large Language Models in Artistic Creation: Collaboration and Reflection on Creative Programming

    Authors: Anqi Wang, Zhizhuo Yin, Yulu Hu, Yuanyuan Mao, Pan Hui

    Abstract: Recently, the potential of large language models (LLMs) has been widely used in assisting programming. However, current research does not explore the artist potential of LLMs in creative coding within artist and AI collaboration. Our work probes the reflection type of artists in the creation process with such collaboration. We compare two common collaboration approaches: invoking the entire progra… ▽ More

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 15 pages, 4 figures

    ACM Class: J.5

  49. arXiv:2402.09387  [pdf, other

    physics.plasm-ph cs.LG

    Active Disruption Avoidance and Trajectory Design for Tokamak Ramp-downs with Neural Differential Equations and Reinforcement Learning

    Authors: Allen M. Wang, Oswin So, Charles Dawson, Darren T. Garnier, Cristina Rea, Chuchu Fan

    Abstract: The tokamak offers a promising path to fusion energy, but plasma disruptions pose a major economic risk, motivating considerable advances in disruption avoidance. This work develops a reinforcement learning approach to this problem by training a policy to safely ramp-down the plasma current while avoiding limits on a number of quantities correlated with disruptions. The policy training environment… ▽ More

    Submitted 14 February, 2024; originally announced February 2024.

  50. arXiv:2402.08159  [pdf, other

    eess.IV cs.CV

    Poisson flow consistency models for low-dose CT image denoising

    Authors: Dennis Hein, Adam Wang, Ge Wang

    Abstract: Diffusion and Poisson flow models have demonstrated remarkable success for a wide range of generative tasks. Nevertheless, their iterative nature results in computationally expensive sampling and the number of function evaluations (NFE) required can be orders of magnitude larger than for single-step methods. Consistency models are a recent class of deep generative models which enable single-step s… ▽ More

    Submitted 12 February, 2024; originally announced February 2024.