Skip to main content

Showing 1–50 of 535 results for author: Zhao, T

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.02394  [pdf, other

    cs.CV

    Similarity Distance-Based Label Assignment for Tiny Object Detection

    Authors: Shuohao Shi, Qiang Fang, Tong Zhao, Xin Xu

    Abstract: Tiny object detection is becoming one of the most challenging tasks in computer vision because of the limited object size and lack of information. The label assignment strategy is a key factor affecting the accuracy of object detection. Although there are some effective label assignment strategies for tiny objects, most of them focus on reducing the sensitivity to the bounding boxes to increase th… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures, 6 tables

  2. arXiv:2407.01007  [pdf, other

    cs.CV

    GMT: A Robust Global Association Model for Multi-Target Multi-Camera Tracking

    Authors: Huijie Fan, Tinghui Zhao, Qiang Wang, Baojie Fan, Yandong Tang, LianQing Liu

    Abstract: In the task of multi-target multi-camera (MTMC) tracking of pedestrians, the data association problem is a key issue and main challenge, especially with complications arising from camera movements, lighting variations, and obstructions. However, most MTMC models adopt two-step approaches, thus heavily depending on the results of the first-step tracking in practical applications. Moreover, the same… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2407.00038  [pdf, other

    cs.IR

    JungleGPT: Designing and Optimizing Compound AI Systems for E-Commerce

    Authors: Sherry Ruan, Tian Zhao

    Abstract: LLMs have significantly advanced the e-commerce industry by powering applications such as personalized recommendations and customer service. However, most current efforts focus solely on monolithic LLMs and fall short in addressing the complexity and scale of real-world e-commerce scenarios. In this work, we present JungleGPT, the first compound AI system tailored for real-world e-commerce applica… ▽ More

    Submitted 28 May, 2024; originally announced July 2024.

  4. arXiv:2406.18763  [pdf, other

    cs.LG cs.AI

    Conformalized Link Prediction on Graph Neural Networks

    Authors: Tianyi Zhao, Jian Kang, Lu Cheng

    Abstract: Graph Neural Networks (GNNs) excel in diverse tasks, yet their applications in high-stakes domains are often hampered by unreliable predictions. Although numerous uncertainty quantification methods have been proposed to address this limitation, they often lack \textit{rigorous} uncertainty estimates. This work makes the first attempt to introduce a distribution-free and model-agnostic uncertainty… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  5. arXiv:2406.16620  [pdf, other

    cs.CV cs.CL

    OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

    Authors: Lu Zhang, Tiancheng Zhao, Heting Ying, Yibo Ma, Kyusong Lee

    Abstract: Recent advancements in Large Language Models (LLMs) have expanded their capabilities to multimodal contexts, including comprehensive video understanding. However, processing extensive videos such as 24-hour CCTV footage or full-length films presents significant challenges due to the vast data and processing demands. Traditional methods, like extracting key frames or converting frames to text, ofte… ▽ More

    Submitted 24 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  6. arXiv:2406.16321  [pdf, other

    cs.LG cs.AI

    Multimodal Graph Benchmark

    Authors: **g Zhu, Yuhang Zhou, Shengyi Qian, Zhongmou He, Tong Zhao, Neil Shah, Danai Koutra

    Abstract: Associating unstructured data with structured information is crucial for real-world tasks that require relevance search. However, existing graph learning benchmarks often overlook the rich semantic information associate with each node. To bridge such gap, we introduce the Multimodal Graph Benchmark (MM-GRAPH), the first comprehensive multi-modal graph benchmark that incorporates both textual and v… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: https://mm-graph-benchmark.github.io/

  7. arXiv:2406.15568  [pdf, other

    cs.LG

    Robust Reinforcement Learning from Corrupted Human Feedback

    Authors: Alexander Bukharin, Ilgee Hong, Haoming Jiang, Qingru Zhang, Zixuan Zhang, Tuo Zhao

    Abstract: Reinforcement learning from human feedback (RLHF) provides a principled framework for aligning AI systems with human preference data. For various reasons, e.g., personal bias, context ambiguity, lack of training, etc, human annotators may give incorrect or inconsistent preference labels. To tackle this challenge, we propose a robust RLHF approach -- $R^3M$, which models the potentially corrupted p… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 22 pages, 7 figures

  8. arXiv:2406.13558   

    cs.AI

    Enhancing Travel Choice Modeling with Large Language Models: A Prompt-Learning Approach

    Authors: Xuehao Zhai, Hanlin Tian, Lintong Li, Tianyu Zhao

    Abstract: Travel choice analysis is crucial for understanding individual travel behavior to develop appropriate transport policies and recommendation systems in Intelligent Transportation Systems (ITS). Despite extensive research, this domain faces two critical challenges: a) modeling with limited survey data, and b) simultaneously achieving high model explainability and accuracy. In this paper, we introduc… ▽ More

    Submitted 22 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: We currently do not have a replacement version available. We request withdrawal due to a significant methodological error affecting the paper's validity, specifically a miscalculation in data preprocessing. We are working on corrections, but this will take time. We believe an interim withdrawal is necessary to prevent the dissemination of incorrect information.

  9. arXiv:2406.12439  [pdf, other

    cs.LG

    A data-centric approach for assessing progress of Graph Neural Networks

    Authors: Tianqi Zhao, Ngan Thi Dong, Alan Hanjalic, Megha Khosla

    Abstract: Graph Neural Networks (GNNs) have achieved state-of-the-art results in node classification tasks. However, most improvements are in multi-class classification, with less focus on the cases where each node could have multiple labels. The first challenge in studying multi-label node classification is the scarcity of publicly available datasets. To address this, we collected and released three real-w… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Journal ref: Published in Data-centric Machine Learning Research Worshop @ ICML 2024

  10. arXiv:2406.11354  [pdf, other

    cs.CL cs.AI cs.CV

    Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression

    Authors: Zilun Zhang, Yutao Sun, Tiancheng Zhao, Leigang Sha, Ruochen Xu, Kyusong Lee, Jianwei Yin

    Abstract: Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data. Moreover, for Multimodal Large Language Models (MLLMs) which are composed of the LLM base and visual projector (e.g. LLaVA), a significant decline in performance on language benchmarks… ▽ More

    Submitted 19 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  11. arXiv:2406.11191  [pdf, other

    cs.CL

    A Survey on Human Preference Learning for Large Language Models

    Authors: Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang

    Abstract: The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning, enhancing LLMs with excellent applicability and effectiveness in a wide range of contexts. Despite the numerous related studies conducted, a perspective on how human preferences are introduced into LLMs remains limited, which ma… ▽ More

    Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: IEEE copyright statement added (also applied to the former version)

  12. arXiv:2406.10797  [pdf, other

    cs.CV

    STAR: Scale-wise Text-to-image generation via Auto-Regressive representations

    Authors: Xiaoxiao Ma, Mohan Zhou, Tao Liang, Yalong Bai, Tiejun Zhao, Huaian Chen, Yi **

    Abstract: We present STAR, a text-to-image model that employs scale-wise auto-regressive paradigm. Unlike VAR, which is limited to class-conditioned synthesis within a fixed set of predetermined categories, our STAR enables text-driven open-set generation through three key designs: To boost diversity and generalizability with unseen combinations of objects and concepts, we introduce a pre-trained text encod… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures

  13. arXiv:2406.10777  [pdf, other

    cs.CL

    RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning

    Authors: Haoyu Wang, Tianci Liu, Tuo Zhao, **g Gao

    Abstract: Pre-trained language models, trained on large-scale corpora, demonstrate strong generalizability across various NLP tasks. Fine-tuning these models for specific tasks typically involves updating all parameters, which is resource-intensive. Parameter-efficient fine-tuning (PEFT) methods, such as the popular LoRA family, introduce low-rank matrices to learn only a few parameters efficiently. However… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  14. arXiv:2406.10593  [pdf, other

    cs.AI cs.DB cs.IR

    QDA-SQL: Questions Enhanced Dialogue Augmentation for Multi-Turn Text-to-SQL

    Authors: Yinggang Sun, Ziming Guo, Haining Yu, Chuanyi Liu, Xiang Li, Bingxuan Wang, Xiangzhan Yu, Tiancheng Zhao

    Abstract: Fine-tuning large language models (LLMs) for specific domain tasks has achieved great success in Text-to-SQL tasks. However, these fine-tuned models often face challenges with multi-turn Text-to-SQL tasks caused by ambiguous or unanswerable questions. It is desired to enhance LLMs to handle multiple types of questions in multi-turn Text-to-SQL tasks. To address this, we propose a novel data augmen… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 13 pages, 7 figures

  15. Multi-source Unsupervised Domain Adaptation on Graphs with Transferability Modeling

    Authors: Tianxiang Zhao, Dongsheng Luo, Xiang Zhang, Suhang Wang

    Abstract: In this paper, we tackle a new problem of \textit{multi-source unsupervised domain adaptation (MSUDA) for graphs}, where models trained on annotated source domains need to be transferred to the unsupervised target graph for node classification. Due to the discrepancy in distribution across domains, the key challenge is how to select good source instances and how to adapt the model. Diverse graph s… ▽ More

    Submitted 22 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Journal ref: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24), August 25--29, 2024, Barcelona, Spain

  16. arXiv:2406.08552  [pdf, other

    cs.CV

    DiTFastAttn: Attention Compression for Diffusion Transformer Models

    Authors: Zhihang Yuan, Pu Lu, Hanling Zhang, Xuefei Ning, Linfeng Zhang, Tianchen Zhao, Shengen Yan, Guohao Dai, Yu Wang

    Abstract: Diffusion Transformers (DiT) excel at image and video generation but face computational challenges due to self-attention's quadratic complexity. We propose DiTFastAttn, a novel post-training compression method to alleviate DiT's computational bottleneck. We identify three key redundancies in the attention computation during DiT inference: 1. spatial redundancy, where many attention heads focus on… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  17. arXiv:2406.07232  [pdf, other

    cs.CL cs.AI

    DUAL-REFLECT: Enhancing Large Language Models for Reflective Translation through Dual Learning Feedback Mechanisms

    Authors: Andong Chen, Lianzhang Lou, Kehai Chen, Xuefeng Bai, Yang Xiang, Muyun Yang, Tiejun Zhao, Min Zhang

    Abstract: Recently, large language models (LLMs) enhanced by self-reflection have achieved promising performance on machine translation. The key idea is guiding LLMs to generate translation with human-like feedback. However, existing self-reflection methods lack effective feedback information, limiting the translation performance. To address this, we introduce a DUAL-REFLECT framework, leveraging the dual l… ▽ More

    Submitted 21 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 main conference

  18. arXiv:2406.06600  [pdf, other

    cs.LG cs.AI cs.CL

    HORAE: A Domain-Agnostic Modeling Language for Automating Multimodal Service Regulation

    Authors: Yutao Sun, Mingshuai Chen, Kangjia Zhao, He Li, **tao Chen, Linyu Yang, Zhongyi Wang, Tiancheng Zhao, Jianwei Yin

    Abstract: Artificial intelligence is rapidly encroaching on the field of service regulation. This work presents the design principles behind HORAE, a unified specification language to model multimodal regulation rules across a diverse set of domains. We show how HORAE facilitates an intelligent service regulation pipeline by further exploiting a fine-tuned large language model named HORAE that automates the… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  19. arXiv:2406.05891  [pdf, other

    eess.IV cs.CV cs.LG

    GCtx-UNet: Efficient Network for Medical Image Segmentation

    Authors: Khaled Alrfou, Tian Zhao

    Abstract: Medical image segmentation is crucial for disease diagnosis and monitoring. Though effective, the current segmentation networks such as UNet struggle with capturing long-range features. More accurate models such as TransUNet, Swin-UNet, and CS-UNet have higher computation complexity. To address this problem, we propose GCtx-UNet, a lightweight segmentation architecture that can capture global and… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 13 pages, 7 figures, 7 tables

  20. arXiv:2406.03684  [pdf, other

    cs.CV cs.CR

    Principles of Designing Robust Remote Face Anti-Spoofing Systems

    Authors: Xiang Xu, Tianchen Zhao, Zheng Zhang, Zhihua Li, Jon Wu, Alessandro Achille, Mani Srivastava

    Abstract: Protecting digital identities of human face from various attack vectors is paramount, and face anti-spoofing plays a crucial role in this endeavor. Current approaches primarily focus on detecting spoofing attempts within individual frames to detect presentation attacks. However, the emergence of hyper-realistic generative models capable of real-time operation has heightened the risk of digitally g… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Under review

  21. arXiv:2406.02764  [pdf, other

    cs.LG cs.AI

    Adaptive Preference Scaling for Reinforcement Learning with Human Feedback

    Authors: Ilgee Hong, Zichong Li, Alexander Bukharin, Yixiao Li, Haoming Jiang, Tianbao Yang, Tuo Zhao

    Abstract: Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values by learning rewards from human preference data. Due to various reasons, however, such data typically takes the form of rankings over pairs of trajectory segments, which fails to capture the varying strengths of preferences across different pairs. In this paper, we propose a novel adaptiv… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  22. arXiv:2406.02540  [pdf, other

    cs.CV

    ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

    Authors: Tianchen Zhao, Tongcheng Fang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

    Abstract: Diffusion transformers (DiTs) have exhibited remarkable performance in visual generation tasks, such as generating realistic images or videos based on textual instructions. However, larger model sizes and multi-frame processing for video generation lead to increased computational and memory costs, posing challenges for practical deployment on edge devices. Post-Training Quantization (PTQ) is an ef… ▽ More

    Submitted 30 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Project Page: https://a-suozhang.xyz/viditq.github.io/

  23. arXiv:2406.01229  [pdf, other

    cs.LG

    AGALE: A Graph-Aware Continual Learning Evaluation Framework

    Authors: Tianqi Zhao, Alan Hanjalic, Megha Khosla

    Abstract: In recent years, continual learning (CL) techniques have made significant progress in learning from streaming data while preserving knowledge across sequential tasks, particularly in the realm of euclidean data. To foster fair evaluation and recognize challenges in CL settings, several evaluation frameworks have been proposed, focusing mainly on the single- and multi-label classification task on e… ▽ More

    Submitted 7 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  24. arXiv:2405.20624  [pdf, ps, other

    cs.CL cs.AI

    Leveraging Large Language Models for Entity Matching

    Authors: Qianyu Huang, Tongfang Zhao

    Abstract: Entity matching (EM) is a critical task in data integration, aiming to identify records across different datasets that refer to the same real-world entities. Traditional methods often rely on manually engineered features and rule-based systems, which struggle with diverse and unstructured data. The emergence of Large Language Models (LLMs) such as GPT-4 offers transformative potential for EM, leve… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  25. arXiv:2405.19109  [pdf, other

    cs.CL

    PathReasoner: Modeling Reasoning Path with Equivalent Extension for Logical Question Answering

    Authors: Fangzhi Xu, Qika Lin, Tianzhe Zhao, Jiawei Han, Jun Liu

    Abstract: Logical reasoning task has attracted great interest since it was proposed. Faced with such a task, current competitive models, even large language models (e.g., ChatGPT and PaLM 2), still perform badly. Previous promising LMs struggle in logical consistency modeling and logical structure perception. To this end, we model the logical reasoning task by transforming each logical sample into reasoning… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by ACL 2024

  26. arXiv:2405.17873  [pdf, other

    cs.CV cs.AI

    MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

    Authors: Tianchen Zhao, Xuefei Ning, Tongcheng Fang, Enshu Liu, Guyue Huang, Zinan Lin, Shengen Yan, Guohao Dai, Yu Wang

    Abstract: Diffusion models have achieved significant visual generation quality. However, their significant computational and memory costs pose challenge for their application on resource-constrained mobile devices or even desktop GPUs. Recent few-step diffusion models reduces the inference time by reducing the denoising steps. However, their memory consumptions are still excessive. The Post Training Quantiz… ▽ More

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Project Page: https://a-suozhang.xyz/mixdq.github.io/

  27. arXiv:2405.14506  [pdf, other

    cs.CV cs.AI

    SIAVC: Semi-Supervised Framework for Industrial Accident Video Classification

    Authors: Zuoyong Li, Qinghua Lin, Haoyi Fan, Tiesong Zhao, David Zhang

    Abstract: Semi-supervised learning suffers from the imbalance of labeled and unlabeled training data in the video surveillance scenario. In this paper, we propose a new semi-supervised learning method called SIAVC for industrial accident video classification. Specifically, we design a video augmentation module called the Super Augmentation Block (SAB). SAB adds Gaussian noise and randomly masks video frames… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  28. arXiv:2405.12971  [pdf, other

    cs.CV

    BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once

    Authors: Theodore Zhao, Yu Gu, Jianwei Yang, Naoto Usuyama, Ho Hin Lee, Tristan Naumann, Jianfeng Gao, Angela Crabtree, Jacob Abel, Christine Moung-Wen, Brian Piening, Carlo Bifulco, Mu Wei, Hoifung Poon, Sheng Wang

    Abstract: Biomedical image analysis is fundamental for biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains. Holistic image analysis comprises interdependent subtasks such as segmentation, detection, and recognition of relevant objects. Here, we propose BiomedParse, a biomedical foundation model for imaging parsing that can jointly conduct segmentation, detection, an… ▽ More

    Submitted 4 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: Project page: https://aka.ms/biomedparse-project

  29. arXiv:2405.10970  [pdf, other

    cs.LG cs.AI cs.CR

    Untargeted Adversarial Attack on Knowledge Graph Embeddings

    Authors: Tianzhe Zhao, Jiaoyan Chen, Yanchi Ru, Qika Lin, Yuxia Geng, Jun Liu

    Abstract: Knowledge graph embedding (KGE) methods have achieved great success in handling various knowledge graph (KG) downstream tasks. However, KGE methods may learn biased representations on low-quality KGs that are prevalent in the real world. Some recent studies propose adversarial attacks to investigate the vulnerabilities of KGE methods, but their attackers are target-oriented with the KGE method and… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted by SIGIR 2024

  30. arXiv:2405.08578  [pdf

    cs.CV

    Local-peak scale-invariant feature transform for fast and random image stitching

    Authors: Hao Li, Lipo Wang, Tianyun Zhao, Wei Zhao

    Abstract: Image stitching aims to construct a wide field of view with high spatial resolution, which cannot be achieved in a single exposure. Typically, conventional image stitching techniques, other than deep learning, require complex computation and thus computational pricy, especially for stitching large raw images. In this study, inspired by the multiscale feature of fluid turbulence, we developed a fas… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  31. arXiv:2405.07518  [pdf, other

    cs.AR cs.AI

    SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts

    Authors: Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, Xiangyu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Edison Chen, Kaizhao Liang, Swayambhoo Jain , et al. (5 additional authors not shown)

    Abstract: Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate increase in compute-to-memory ratio of modern AI accelerators have created a memory wall, necessitating new methods to deploy AI. Composition of Expert… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  32. arXiv:2405.02292  [pdf, other

    cs.RO cs.LG

    ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation

    Authors: ALOHA 2 Team, Jorge Aldaco, Travis Armstrong, Robert Baruch, Jeff Bingham, Sanky Chan, Kenneth Draper, Debidatta Dwibedi, Chelsea Finn, Pete Florence, Spencer Goodrich, Wayne Gramlich, Torr Hage, Alexander Herzog, Jonathan Hoech, Thinh Nguyen, Ian Storz, Baruch Tabanpour, Leila Takayama, Jonathan Tompson, Ayzaan Wahid, Ted Wahrburg, Sichun Xu, Sergey Yaroshenko, Kevin Zakka , et al. (1 additional authors not shown)

    Abstract: Diverse demonstration datasets have powered significant advances in robot learning, but the dexterity and scale of such data can be limited by the hardware cost, the hardware robustness, and the ease of teleoperation. We introduce ALOHA 2, an enhanced version of ALOHA that has greater performance, ergonomics, and robustness compared to the original design. To accelerate research in large-scale bim… ▽ More

    Submitted 7 February, 2024; originally announced May 2024.

    Comments: Project website: aloha-2.github.io

  33. arXiv:2405.00332  [pdf, other

    cs.CL cs.AI cs.LG

    A Careful Examination of Large Language Model Performance on Grade School Arithmetic

    Authors: Hugh Zhang, Jeff Da, Dean Lee, Vaughn Robinson, Catherine Wu, Will Song, Tiffany Zhao, Pranav Raja, Dylan Slack, Qin Lyu, Sean Hendryx, Russell Kaplan, Michele Lunati, Summer Yue

    Abstract: Large language models (LLMs) have achieved impressive success on many benchmarks for mathematical reasoning. However, there is growing concern that some of this performance actually reflects dataset contamination, where data closely resembling benchmark questions leaks into the training data, instead of true reasoning ability. To investigate this claim rigorously, we commission Grade School Math 1… ▽ More

    Submitted 3 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  34. arXiv:2404.17360  [pdf, other

    cs.CV

    UniRGB-IR: A Unified Framework for Visible-Infrared Downstream Tasks via Adapter Tuning

    Authors: Maoxun Yuan, Bo Cui, Tianyi Zhao, Xingxing Wei

    Abstract: Semantic analysis on visible (RGB) and infrared (IR) images has gained attention for its ability to be more accurate and robust under low-illumination and complex weather conditions. Due to the lack of pre-trained foundation models on the large-scale infrared image datasets, existing methods prefer to design task-specific frameworks and directly fine-tune them with pre-trained foundation models on… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  35. arXiv:2404.15466  [pdf, other

    stat.ML cs.LG

    Private Optimal Inventory Policy Learning for Feature-based Newsvendor with Unknown Demand

    Authors: Tuoyi Zhao, Wen-xin Zhou, Lan Wang

    Abstract: The data-driven newsvendor problem with features has recently emerged as a significant area of research, driven by the proliferation of data across various sectors such as retail, supply chains, e-commerce, and healthcare. Given the sensitive nature of customer or organizational data often used in feature-based analysis, it is crucial to ensure individual privacy to uphold trust and confidence. De… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

  36. arXiv:2404.14801  [pdf, other

    cs.CV

    DesignProbe: A Graphic Design Benchmark for Multimodal Large Language Models

    Authors: Jieru Lin, Danqing Huang, Tiejun Zhao, Dechen Zhan, Chin-Yew Lin

    Abstract: A well-executed graphic design typically achieves harmony in two levels, from the fine-grained design elements (color, font and layout) to the overall design. This complexity makes the comprehension of graphic design challenging, for it needs the capability to both recognize the design elements and understand the design. With the rapid development of Multimodal Large Language Models (MLLMs), we es… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: work in progress

  37. arXiv:2404.14070  [pdf

    cs.HC cs.CY

    No General Code of Ethics for All: Ethical Considerations in Human-bot Psycho-counseling

    Authors: Lizhi Ma, Tong Zhao, Huachuan Qiu, Zhenzhong Lan

    Abstract: The pervasive use of AI applications is increasingly influencing our everyday decisions. However, the ethical challenges associated with AI transcend conventional ethics and single-discipline approaches. In this paper, we propose aspirational ethical principles specifically tailored for human-bot psycho-counseling during an era when AI-powered mental health services are continually emerging. We ex… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 54 pages,11 tables, APA style, the tables are presented following Reference

  38. arXiv:2404.12107  [pdf, other

    cs.DB cs.DS cs.SI

    Effective Individual Fairest Community Search over Heterogeneous Information Networks

    Authors: Taige Zhao, Jianxin Li, Ningning Cui, Wei Luo

    Abstract: Community search over heterogeneous information networks has been applied to wide domains, such as activity organization and team formation. From these scenarios, the members of a group with the same treatment often have different levels of activity and workloads, which causes unfairness in the treatment between active members and inactive members (called individual unfairness). However, existing… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  39. arXiv:2404.08660  [pdf, other

    cs.IR cs.LG

    How Does Message Passing Improve Collaborative Filtering?

    Authors: Mingxuan Ju, William Shiao, Zhichun Guo, Yanfang Ye, Yozen Liu, Neil Shah, Tong Zhao

    Abstract: Collaborative filtering (CF) has exhibited prominent results for recommender systems and been broadly utilized for real-world applications. A branch of research enhances CF methods by message passing used in graph neural networks, due to its strong capabilities of extracting knowledge from graph-structured data, like user-item bipartite graphs that naturally exist in CF. They assume that message p… ▽ More

    Submitted 27 March, 2024; originally announced April 2024.

  40. arXiv:2404.06973  [pdf

    cs.SI cs.CY

    Embedding Economic Incentives in Social Networks Shape the Diffusion of Digital Technological Innovation

    Authors: Zhe Li, Tianfang Zhao, Hongjun Zhu

    Abstract: The digital innovation accompanied by explicit economic incentives have fundamentally changed the process of innovation diffusion. As a representative of digital innovation, NFTs provide a decentralized and secure way to authenticate and trade digital assets, offering the potential for new revenue streams in the digital space. However, current researches about NFTs mainly focus on their transactio… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

  41. arXiv:2404.06605  [pdf, other

    cs.CV

    RoadBEV: Road Surface Reconstruction in Bird's Eye View

    Authors: Tong Zhao, Lei Yang, Yichen Xie, Mingyu Ding, Masayoshi Tomizuka, Yintao Wei

    Abstract: Road surface conditions, especially geometry profiles, enormously affect driving performance of autonomous vehicles. Vision-based online road reconstruction promisingly captures road information in advance. Existing solutions like monocular depth estimation and stereo matching suffer from modest performance. The recent technique of Bird's-Eye-View (BEV) perception provides immense potential to mor… ▽ More

    Submitted 20 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Dataset page: https://thu-rsxd.com/rsrd Code: https://github.com/ztsrxh/RoadBEV

  42. arXiv:2404.04575  [pdf, other

    cs.LG cs.AI math.OC

    To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO

    Authors: Zi-Hao Qiu, Siqi Guo, Mao Xu, Tuo Zhao, Lijun Zhang, Tianbao Yang

    Abstract: The temperature parameter plays a profound role during training and/or inference with large foundation models (LFMs) such as large language models (LLMs) and CLIP models. Particularly, it adjusts the logits in the softmax function in LLMs, which is crucial for next token generation, and it scales the similarities in the contrastive loss for training CLIP models. A significant question remains: Is… ▽ More

    Submitted 16 June, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

    Comments: 41 pages, 10 figures, accepted by ICML2024

  43. arXiv:2404.02511  [pdf, other

    math.OC cs.LG

    Stochastic Constrained Decentralized Optimization for Machine Learning with Fewer Data Oracles: a Gradient Sliding Approach

    Authors: Hoang Huy Nguyen, Yan Li, Tuo Zhao

    Abstract: In modern decentralized applications, ensuring communication efficiency and privacy for the users are the key challenges. In order to train machine-learning models, the algorithm has to communicate to the data center and sample data for its gradient computation, thus exposing the data and increasing the communication cost. This gives rise to the need for a decentralized optimization algorithm that… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

  44. arXiv:2404.01925  [pdf, other

    cs.CV cs.AI

    Improving Bird's Eye View Semantic Segmentation by Task Decomposition

    Authors: Tianhao Zhao, Yongcan Chen, Yu Wu, Tianyang Liu, Bo Du, Peilun Xiao, Shi Qiu, Hongda Yang, Guozhen Li, Yi Yang, Yutian Lin

    Abstract: Semantic segmentation in bird's eye view (BEV) plays a crucial role in autonomous driving. Previous methods usually follow an end-to-end pipeline, directly predicting the BEV segmentation map from monocular RGB inputs. However, the challenge arises when the RGB inputs and BEV targets from distinct perspectives, making the direct point-to-point predicting hard to optimize. In this paper, we decompo… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  45. arXiv:2404.01657  [pdf, other

    cs.CL cs.AI cs.CV cs.LG eess.AS

    Release of Pre-Trained Models for the Japanese Language

    Authors: Kei Sawada, Tianyu Zhao, Makoto Shing, Kentaro Mitsui, Akio Kaga, Yukiya Hono, Toshiaki Wakatsuki, Koh Mitsuda

    Abstract: AI democratization aims to create a world in which the average person can utilize AI techniques. To achieve this goal, numerous research institutes have attempted to make their results accessible to the public. In particular, large pre-trained models trained on large-scale data have shown unprecedented potential, and their release has had a significant impact. However, most of the released models… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 9 pages, 1 figure, 5 tables, accepted for LREC-COLING 2024. Models are publicly available at https://huggingface.co/rinna

  46. arXiv:2403.18295  [pdf, other

    cs.CL

    Dual Instruction Tuning with Large Language Models for Mathematical Reasoning

    Authors: Yongwei Zhou, Tiejun Zhao

    Abstract: Recent advancements highlight the success of instruction tuning with large language models (LLMs) utilizing Chain-of-Thought (CoT) data for mathematical reasoning tasks. Despite the fine-tuned LLMs, challenges persist, such as incorrect, missing, and redundant steps in CoT generation leading to inaccuracies in answer predictions. To alleviate this problem, we propose a dual instruction tuning stra… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

  47. arXiv:2403.18280  [pdf, other

    cs.IR

    Improving Out-of-Vocabulary Handling in Recommendation Systems

    Authors: William Shiao, Mingxuan Ju, Zhichun Guo, Xin Chen, Evangelos Papalexakis, Tong Zhao, Neil Shah, Yozen Liu

    Abstract: Recommendation systems (RS) are an increasingly relevant area for both academic and industry researchers, given their widespread impact on the daily online experiences of billions of users. One common issue in real RS is the cold-start problem, where users and items may not contain enough information to produce high-quality recommendations. This work focuses on a complementary problem: recommendin… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 11 pages, 6 figures

  48. arXiv:2403.16379  [pdf, other

    cs.CV

    FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models

    Authors: Lin Zhao, Tianchen Zhao, Zinan Lin, Xuefei Ning, Guohao Dai, Huazhong Yang, Yu Wang

    Abstract: In recent years, there has been significant progress in the development of text-to-image generative models. Evaluating the quality of the generative models is one essential step in the development process. Unfortunately, the evaluation process could consume a significant amount of computational resources, making the required periodic evaluation of model performance (e.g., monitoring training progr… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: The paper is accepted by CVPR 2024

  49. arXiv:2403.12945  [pdf, other

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park , et al. (74 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important step** stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  50. arXiv:2403.12910  [pdf, other

    cs.RO cs.AI cs.LG

    Yell At Your Robot: Improving On-the-Fly from Language Corrections

    Authors: Lucy Xiaoyang Shi, Zheyuan Hu, Tony Z. Zhao, Archit Sharma, Karl Pertsch, Jianlan Luo, Sergey Levine, Chelsea Finn

    Abstract: Hierarchical policies that combine language and low-level control have been shown to perform impressively long-horizon robotic tasks, by leveraging either zero-shot high-level planners like pretrained language and vision-language models (LLMs/VLMs) or models trained on annotated robotic demonstrations. However, for complex and dexterous skills, attaining high success rates on long-horizon tasks st… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://yay-robot.github.io/