Showing 1–7 of 7 results for author: Di, Q

  1. arXiv:2404.10776  [pdf, other]

    cs.LG

    Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback

    Authors: Qiwei Di, Jiafan He, Quanquan Gu

    Abstract: Learning from human feedback plays an important role in aligning generative models, such as large language models (LLM). However, the effectiveness of this approach can be influenced by adversaries, who may intentionally provide misleading preferences to manipulate the output in an undesirable or harmful direction. To tackle this challenge, we study a specific model within this problem domain--con…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 24 pages, 5 figures

  2. arXiv:2402.08998  [pdf, other]

    cs.LG stat.ML

    Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path

    Authors: Qiwei Di, Jiafan He, Dongruo Zhou, Quanquan Gu

    Abstract: We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel, where an agent repeatedly interacts with a stochastic environment and seeks to reach a certain goal state while minimizing the cumulative cost. Existing works often assume a strictly positive lower bound of the cost function or an upper bound of the expected length for the optimal policy. In this paper, we p…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 28 pages, 1 figure, In ICML 2023

  3. arXiv:2310.01380  [pdf, other]

    cs.LG math.OC stat.ML

    Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

    Authors: Qiwei Di, Heyang Zhao, Jiafan He, Quanquan Gu

    Abstract: Offline reinforcement learning (RL), where the agent aims to learn the optimal policy based on the data collected by a behavior policy, has attracted increasing attention in recent years. While offline RL with linear function approximation has been extensively studied with optimal results achieved under certain assumptions, many works shift their interest to offline RL with non-linear function app…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 43 pages, 1 table

  4. arXiv:2310.00968  [pdf, other]

    cs.LG math.OC stat.ML

    Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

    Authors: Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu

    Abstract: Dueling bandits is a prominent framework for decision-making involving preferential feedback, a valuable feature that fits various applications involving human interaction, such as ranking, information retrieval, and recommendation systems. While substantial efforts have been made to minimize the cumulative regret in dueling bandits, a notable gap in the current research is the absence of regret b…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 28 pages, 1 figure

  5. arXiv:2306.13420  [pdf, other]

    cs.CV

    Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation

    Authors: Qianji Di, Wenxi Ma, Zhongang Qi, Tianxiang Hou, Ying Shan, Hanzi Wang

    Abstract: Scene Graph Generation (SGG) aims to structurally and comprehensively represent objects and their connections in images; it can significantly benefit scene understanding and other related downstream tasks. Existing SGG models often struggle to solve the long-tailed problem caused by biased datasets. However, even if these models can fit specific datasets better, it may be hard for them to resolve…

    Submitted 23 June, 2023; originally announced June 2023.

  6. arXiv:1811.12578  [pdf, other]

    physics.comp-ph cs.DC hep-ex

    Using ATLAS@Home to exploit extra CPU from busy grid sites

    Authors: Wenjing Wu, David Cameron, Qing Di

    Abstract: Grid computing typically provides most of the data processing resources for large High Energy Physics experiments. However, typical grid sites are not fully utilized by regular workloads. In order to increase the CPU utilization of these grid sites, the ATLAS@Home volunteer computing framework can be used as a backfilling mechanism. Results show an extra 15% to 42% of CPU cycles can be exploited by…

    Submitted 29 November, 2018; originally announced November 2018.

  7. arXiv:1805.11534  [pdf, other]

    stat.ML cs.LG

    airpred: A Flexible R Package Implementing Methods for Predicting Air Pollution

    Authors: M. Benjamin Sabath, Qian Di, Danielle Braun, Joel Schwarz, Francesca Dominici, Christine Choirat

    Abstract: Fine particulate matter (PM$_{2.5}$) is one of the criteria air pollutants regulated by the Environmental Protection Agency in the United States. There is strong evidence that ambient exposure to (PM$_{2.5}$) increases risk of mortality and hospitalization. Large scale epidemiological studies on the health effects of PM$_{2.5}$ provide the necessary evidence base for lowering the safety standards…

    Submitted 30 October, 2018; v1 submitted 29 May, 2018; originally announced May 2018.