Showing 1–50 of 124 results for author: White, M

Searching in archive cs.
  1. arXiv:2406.16241  [pdf, other]

    cs.LG stat.ME

    Position: Benchmarking is Limited in Reinforcement Learning Research

    Authors: Scott M. Jordan, Adam White, Bruno Castro da Silva, Martha White, Philip S. Thomas

    Abstract: Novel reinforcement learning algorithms, or improvements on existing ones, are commonly justified by evaluating their performance on benchmark environments and are compared to an ever-changing set of standard algorithms. However, despite numerous calls for improvements, experimental practices continue to produce misleading or unsupported claims. One reason for the ongoing substandard practices is…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 19 pages, 13 figures, The Forty-first International Conference on Machine Learning (ICML 2024)

  2. arXiv:2406.12284  [pdf, other]

    cs.LG cs.AI

    Demystifying the Recency Heuristic in Temporal-Difference Learning

    Authors: Brett Daley, Marlos C. Machado, Martha White

    Abstract: The recency heuristic in reinforcement learning is the assumption that stimuli that occurred closer in time to an acquired reward should be more heavily reinforced. The recency heuristic is one of the key assumptions made by TD($λ$), which reinforces recent experiences according to an exponentially decaying weighting. In fact, all other widely used return estimators for TD learning, such as $n$-st…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: RLC 2024. 18 pages, 8 figures, 1 table

  3. arXiv:2406.01562  [pdf, other]

    cs.LG cs.AI

    A New View on Planning in Online Reinforcement Learning

    Authors: Kevin Roice, Parham Mohammad Panahi, Scott M. Jordan, Adam White, Martha White

    Abstract: This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives, such as Double DQN, even though the former uses significantly more memory and computation. The fundament…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Published in the Planning and Reinforcement Learning Workshop at ICAPS 2024. arXiv admin note: text overlap with arXiv:2206.02902

  4. arXiv:2405.04324  [pdf, other]

    cs.AI cs.CL cs.SE

    Granite Code Models: A Family of Open Foundation Models for Code Intelligence

    Authors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, et al. (21 additional authors not shown)

    Abstract: Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabili…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang

  5. arXiv:2404.02113  [pdf, other]

    cs.LG

    K-percent Evaluation for Lifelong RL

    Authors: Golnaz Mesbahi, Parham Mohammad Panahi, Olya Mastikhina, Martha White, Adam White

    Abstract: In continual or lifelong reinforcement learning, access to the environment should be limited. If we aspire to design algorithms that can run for long periods, continually adapting to new, unexpected situations, then we must be willing to deploy our agents without tuning their hyperparameters over the agent's entire lifetime. The standard practice in deep RL, and even continual RL, is to assume unf…

    Submitted 25 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  6. arXiv:2403.13784  [pdf, ps, other]

    cs.LG cs.AI cs.CY cs.SE

    The Model Openness Framework: Promoting Completeness and Openness for Reproducibility, Transparency, and Usability in Artificial Intelligence

    Authors: Matt White, Ibrahim Haddad, Cailean Osborne, Xiao-Yang Liu Yanglet, Ahmed Abdelmonsef, Sachin Varghese

    Abstract: Generative AI (GAI) offers unprecedented opportunities for research and innovation, but its commercialization has raised concerns about transparency, reproducibility, and safety. Many open GAI models lack the necessary components for full understanding and reproducibility, and some use restrictive licenses whilst claiming to be "open-source". To address these concerns, we propose the Model Openn…

    Submitted 3 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: 22 pages

  7. arXiv:2402.13425  [pdf, other]

    cs.LG cs.AI stat.ML

    Investigating the Histogram Loss in Regression

    Authors: Ehsan Imani, Kai Luedemann, Sam Scholnick-Hughes, Esraa Elelimy, Martha White

    Abstract: It is becoming increasingly common in regression to train neural networks that model the entire distribution even if only the mean is required for prediction. This additional modeling often comes with performance gain and the reasons behind the improvement are not fully known. This paper investigates a recent approach to regression, the Histogram Loss, which involves learning the conditional distr…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 50 pages

  8. arXiv:2402.10890  [pdf, other]

    cs.CL cs.AI cs.LG

    When is Tree Search Useful for LLM Planning? It Depends on the Discriminator

    Authors: Ziru Chen, Michael White, Raymond Mooney, Ali Payani, Yu Su, Huan Sun

    Abstract: In this paper, we examine how large language models (LLMs) solve multi-step problems under a language agent framework with three components: a generator, a discriminator, and a planning method. We investigate the practical utility of two advanced planning methods, iterative correction and tree search. We present a comprehensive analysis of how discrimination accuracy affects the overall performanc…

    Submitted 6 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: ACL 2024 main

  9. arXiv:2402.10339  [pdf, other]

    cs.LG

    What to Do When Your Discrete Optimization Is the Size of a Neural Network?

    Authors: Hugo Silva, Martha White

    Abstract: Oftentimes, machine learning applications using neural networks involve solving discrete optimization problems, such as in pruning, parameter-isolation-based continual learning and training of binary networks. Still, these discrete problems are combinatorial in nature and are also not amenable to gradient-based optimization. Additionally, classical approaches used in discrete settings do not scale…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Submitted to JMLR

  10. arXiv:2402.03903  [pdf, other]

    cs.LG

    Averaging $n$-step Returns Reduces Variance in Reinforcement Learning

    Authors: Brett Daley, Martha White, Marlos C. Machado

    Abstract: Multistep returns, such as $n$-step returns and $λ$-returns, are commonly used to improve the sample efficiency of reinforcement learning (RL) methods. The variance of the multistep returns becomes the limiting factor in their length; looking too far into the future increases variance and reverses the benefits of multistep learning. In our work, we demonstrate the ability of compound returns -- we…

    Submitted 5 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: ICML 2024. 27 pages, 7 figures, 3 tables

  11. arXiv:2312.17493  [pdf, other]

    cs.LG cs.CR

    Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning

    Authors: Xiao-Yang Liu, Rongyi Zhu, Daochen Zha, Jiechao Gao, Shan Zhong, Matt White, Meikang Qiu

    Abstract: The surge in interest and application of large language models (LLMs) has sparked a drive to fine-tune these models to suit specific applications, such as finance and medical science. However, concerns regarding data privacy have emerged, especially when multiple stakeholders aim to collaboratively enhance LLMs using sensitive data. In this scenario, federated learning becomes a natural choice, al…

    Submitted 2 June, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

    Comments: 21 pages, 1 figure, 19 tables

  12. arXiv:2312.02355  [pdf, other]

    cs.LG cs.AI

    When is Offline Policy Selection Sample Efficient for Reinforcement Learning?

    Authors: Vincent Liu, Prabhat Nagarajan, Andrew Patterson, Martha White

    Abstract: Offline reinforcement learning algorithms often require careful hyperparameter tuning. Consequently, before deployment, we need to select amongst a set of candidate policies. As yet, however, there is little understanding about the fundamental limits of this offline policy selection (OPS) problem. In this work we aim to provide clarity on when sample efficient OPS is possible, primarily by connect…

    Submitted 4 December, 2023; originally announced December 2023.

  13. arXiv:2312.01624  [pdf, other]

    cs.LG cs.AI

    GVFs in the Real World: Making Predictions Online for Water Treatment

    Authors: Muhammad Kamran Janjua, Haseeb Shah, Martha White, Erfan Miahi, Marlos C. Machado, Adam White

    Abstract: In this paper we investigate the use of reinforcement-learning based prediction approaches for a real drinking-water treatment plant. Developing such a prediction system is a critical step on the path to optimizing and automating water treatment. Before that, there are many questions to answer about the predictability of the data, suitable neural network architectures, how to overcome partial obse…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: Published in Machine Learning (2023)

    Journal ref: Machine Learning (2023): 1-31

  14. arXiv:2311.16162  [pdf, other]

    cs.DL cs.AI cs.CL

    Leveraging Artificial Intelligence Technology for Mapping Research to Sustainable Development Goals: A Case Study

    Authors: Hui Yin, Amir Aryani, Gavin Lambert, Marcus White, Luis Salvador-Carulla, Shazia Sadiq, Elvira Sojli, Jennifer Boddy, Greg Murray, Wing Wah Tham

    Abstract: The number of publications related to the Sustainable Development Goals (SDGs) continues to grow. These publications cover a diverse spectrum of research, from humanities and social sciences to engineering and health. Given the imperative of funding bodies to monitor outcomes and impacts, linking publications to relevant SDGs is critical but remains time-consuming and difficult given the breadth a…

    Submitted 9 November, 2023; originally announced November 2023.

    ACM Class: I.2.7

  15. arXiv:2311.05076  [pdf, other]

    cs.CE

    Evaluating diversion and treatment policies for opioid use disorder

    Authors: Veronica M. White, Laura A. Albert

    Abstract: The United States opioid crisis contributed to 80,411 fatalities in 2021. It has strained hospitals, treatment facilities, and law enforcement agencies due to the enormous resources and procedures needed to respond to the crisis. As a result, many individuals who use opioids never receive or finish the treatment they need and instead have many interactions with hospitals or the criminal justice sy…

    Submitted 1 December, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

  16. arXiv:2309.09446  [pdf, other]

    cs.CV

    Scalable Label-efficient Footpath Network Generation Using Remote Sensing Data and Self-supervised Learning

    Authors: Xinye Wanyan, Sachith Seneviratne, Kerry Nice, Jason Thompson, Marcus White, Nano Langenheim, Mark Stevenson

    Abstract: Footpath mapping, modeling, and analysis can provide important geospatial insights to many fields of study, including transport, health, environment and urban planning. The availability of robust Geographic Information System (GIS) layers can benefit the management of infrastructure inventories, especially at local government level with urban planners responsible for the deployment and maintenance…

    Submitted 17 September, 2023; originally announced September 2023.

  17. arXiv:2308.11773  [pdf]

    cs.CL cs.CY cs.SD eess.AS q-bio.QM

    Identifying depression-related topics in smartphone-collected free-response speech recordings using an automatic speech recognition system and a deep learning topic model

    Authors: Yuezhou Zhang, Amos A Folarin, Judith Dineley, Pauline Conde, Valeria de Angel, Shaoxiong Sun, Yatharth Ranjan, Zulqarnain Rashid, Callum Stewart, Petroula Laiou, Heet Sankesara, Linglong Qian, Faith Matcham, Katie M White, Carolin Oetzmann, Femke Lamers, Sara Siddi, Sara Simblett, Björn W. Schuller, Srinivasan Vairavan, Til Wykes, Josep Maria Haro, Brenda WJH Penninx, Vaibhav A Narayan, Matthew Hotopf, et al. (3 additional authors not shown)

    Abstract: Language use has been shown to correlate with depression, but large-scale validation is needed. Traditional methods like clinic studies are expensive. So, natural language processing has been employed on social media to predict depression, but limitations remain: lack of validated labels, biased user samples, and no context. Our study identified 29 topics in 3919 smartphone-collected speech recordi…

    Submitted 5 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

  18. arXiv:2307.04887  [pdf, other]

    cs.LG cs.AI

    Measuring and Mitigating Interference in Reinforcement Learning

    Authors: Vincent Liu, Han Wang, Ruo Yu Tao, Khurram Javed, Adam White, Martha White

    Abstract: Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. Before overcoming interference we must understand it better. In this work, we provide a definition and novel measure of interference for value-based reinforcement learning methods such as Fitted Q-Iteration and DQN. We systematically evaluate our measure of interference, showing…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Published at Conference on Lifelong Learning Agents (CoLLAs) 2023

  19. arXiv:2305.13073  [pdf, other]

    cs.CL cs.AI cs.DB cs.LG

    Text-to-SQL Error Correction with Language Models of Code

    Authors: Ziru Chen, Shijie Chen, Michael White, Raymond Mooney, Ali Payani, Jayanth Srinivasa, Yu Su, Huan Sun

    Abstract: Despite recent progress in text-to-SQL parsing, current semantic parsers are still not accurate enough for practical use. In this paper, we investigate how to build automatic text-to-SQL error correction models. Noticing that token-level edits are out of context and sometimes ambiguous, we propose building clause-level edit models instead. Besides, while most language models of code are not specif…

    Submitted 28 May, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: ACL 2023 Short Paper

  20. arXiv:2305.09838  [pdf, other]

    cs.LG cs.AI

    Coagent Networks: Generalized and Scaled

    Authors: James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas

    Abstract: Coagent networks for reinforcement learning (RL) [Thomas and Barto, 2011] provide a powerful and flexible framework for deriving principled learning rules for arbitrary stochastic neural networks. The coagent framework offers an alternative to backpropagation-based deep learning (BDL) that overcomes some of backpropagation's main limitations. For example, coagent networks can compute different par…

    Submitted 16 May, 2023; originally announced May 2023.

  21. arXiv:2304.01315  [pdf, other]

    cs.LG cs.AI

    Empirical Design in Reinforcement Learning

    Authors: Andrew Patterson, Samuel Neumann, Martha White, Adam White

    Abstract: Empirical design in reinforcement learning is no small task. Running good experiments requires attention to detail and at times significant computational resources. While compute resources available per dollar have continued to grow rapidly, so has the scale of typical experiments in reinforcement learning. It is now common to benchmark agents with millions of parameters against dozens of tasks,…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: In submission to JMLR

  22. arXiv:2303.16759  [pdf]

    cs.CL cs.IR cs.LG cs.SI

    Exploring celebrity influence on public attitude towards the COVID-19 pandemic: social media shared sentiment analysis

    Authors: Brianna M White, Chad A Melton, Parya Zareie, Robert L Davis, Robert A Bednarczyk, Arash Shaban-Nejad

    Abstract: The COVID-19 pandemic has introduced new opportunities for health communication, including an increase in the public use of online outlets for health-related emotions. People have turned to social media networks to share sentiments related to the impacts of the COVID-19 pandemic. In this paper we examine the role of social messaging shared by Persons in the Public Eye (i.e. athletes, politicians,…

    Submitted 23 February, 2023; originally announced March 2023.

    Comments: 7 Pages, 4 Figures

    ACM Class: I.2.7

    Journal ref: BMJ Health & Care Informatics 2023;30:e100665

  23. arXiv:2302.14372  [pdf, other]

    cs.LG cs.AI

    The In-Sample Softmax for Offline Reinforcement Learning

    Authors: Chenjun Xiao, Han Wang, Yangchen Pan, Adam White, Martha White

    Abstract: Reinforcement learning (RL) agents can leverage batches of previously collected data to extract a reasonable control policy. An emerging issue in this offline RL setting, however, is that the bootstrapping update underlying many of our methods suffers from insufficient action-coverage: standard max operator may select a maximal action that has not been seen in the dataset. Bootstrapping from these…

    Submitted 19 April, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

  24. arXiv:2302.11725  [pdf, other]

    cs.LG

    Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments

    Authors: Vincent Liu, Yash Chandak, Philip Thomas, Martha White

    Abstract: In this work, we consider the off-policy policy evaluation problem for contextual bandits and finite horizon reinforcement learning in the nonstationary setting. Reusing old data is critical for policy evaluation, but existing estimators that reuse old data introduce large bias such that we cannot obtain a valid confidence interval. Inspired by a related field called survey sampling, we introdu…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: AISTATS 2023

  25. arXiv:2302.05326  [pdf, other]

    cs.LG cs.AI

    Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks

    Authors: Khurram Javed, Haseeb Shah, Rich Sutton, Martha White

    Abstract: Constructing states from sequences of observations is an important component of reinforcement learning agents. One solution for state construction is to use recurrent neural networks. Back-propagation through time (BPTT) and real-time recurrent learning (RTRL) are two popular gradient-based methods for recurrent learning. BPTT requires complete trajectories of observations before it can compute t…

    Submitted 21 November, 2023; v1 submitted 20 January, 2023; originally announced February 2023.

    Comments: Scalable recurrent learning, online learning, real-time recurrent learning, cascade correlation networks, agent-state construction, columnar networks, constructive networks

  26. arXiv:2301.11476  [pdf, other]

    cs.LG cs.AI

    Generalized Munchausen Reinforcement Learning using Tsallis KL Divergence

    Authors: Lingwei Zhu, Zheng Chen, Matthew Schlegel, Martha White

    Abstract: Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler (KL) divergence to the previous policy, to prevent the policy from changing too quickly. This idea was initially proposed in a seminal paper on Conservative Policy Iteration, with approximations given by algorithms like TRPO and Munchausen Value Iteration (MVI). We continue this line of work by investigat…

    Submitted 18 March, 2024; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: Accepted by NeurIPS 2023

  27. arXiv:2301.11321  [pdf, other]

    cs.LG

    Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning

    Authors: Brett Daley, Martha White, Christopher Amato, Marlos C. Machado

    Abstract: Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging. Classically, off-policy bias is corrected in a per-decision manner: past temporal-difference errors are re-weighted by the instantaneous Importance Sampling (IS) ratio after each action via eligibility traces. Many off-po…

    Submitted 31 May, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: ICML 2023. 8 pages, 2 figures. arXiv admin note: text overlap with arXiv:2112.12281

  28. arXiv:2211.15407  [pdf]

    cs.CL cs.SI

    Fine-tuned Sentiment Analysis of COVID-19 Vaccine-Related Social Media Data: Comparative Study

    Authors: Chad A Melton, Brianna M White, Robert L Davis, Robert A Bednarczyk, Arash Shaban-Nejad

    Abstract: This study investigated and compared public sentiment related to COVID-19 vaccines expressed on two popular social media platforms, Reddit and Twitter, harvested from January 1, 2020, to March 1, 2022. To accomplish this task, we created a fine-tuned DistilRoBERTa model to predict sentiments of approximately 9.5 million Tweets and 70 thousand Reddit comments. To fine-tune our model, our team manua…

    Submitted 17 October, 2022; originally announced November 2022.

    Comments: 11 Pages, 5 Figures, and 1 Table

    MSC Class: 92-11 ACM Class: I.2.7

    Journal ref: Journal of Medical Internet Research (JMIR) 2022;24(10):e40408

  29. arXiv:2210.00667  [pdf]

    cs.CL

    Probing of Quantitative Values in Abstractive Summarization Models

    Authors: Nathan M. White

    Abstract: Abstractive text summarization has recently become a popular approach, but data hallucination remains a serious problem, including with quantitative data. We propose a set of probing tests to evaluate the efficacy of abstract summarization models' modeling of quantitative values found in the input text. Our results show that in most cases, the encoders of recent SOTA-performing models struggle to…

    Submitted 2 October, 2022; originally announced October 2022.

    Comments: 9 pages

  30. arXiv:2206.11249  [pdf, other]

    cs.CL cs.AI cs.LG

    GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

    Authors: Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, et al. (52 additional authors not shown)

    Abstract: Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables the comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, an…

    Submitted 24 June, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

  31. arXiv:2206.02902  [pdf, other]

    cs.LG cs.AI

    Goal-Space Planning with Subgoal Models

    Authors: Chunlok Lo, Kevin Roice, Parham Mohammad Panahi, Scott Jordan, Adam White, Gabor Mihucz, Farzane Aminmansour, Martha White

    Abstract: This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives, such as Double DQN, even though the former uses significantly more memory and computation. The fundament…

    Submitted 27 February, 2024; v1 submitted 6 June, 2022; originally announced June 2022.

  32. arXiv:2205.08716  [pdf, other]

    cs.LG

    No More Pesky Hyperparameters: Offline Hyperparameter Tuning for RL

    Authors: Han Wang, Archit Sakhadeo, Adam White, James Bell, Vincent Liu, Xutong Zhao, Puer Liu, Tadashi Kozuno, Alona Fyshe, Martha White

    Abstract: The performance of reinforcement learning (RL) agents is sensitive to the choice of hyperparameters. In real-world settings like robotics or industrial control systems, however, testing different hyperparameter configurations directly on the environment can be financially prohibitive, dangerous, or time consuming. We propose a new approach to tune hyperparameters from offline logs of data, to full…

    Submitted 18 May, 2022; originally announced May 2022.

  33. arXiv:2205.08464  [pdf, other]

    cs.LG

    Robust Losses for Learning Value Functions

    Authors: Andrew Patterson, Victor Liao, Martha White

    Abstract: Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. However, squared errors are known to be sensitive to outliers, both skewing the solution of the objective and resulting in high-magnitude and high-variance gradients. To control these high-magnitude updates, typical strategies in RL involve clipping gradients, clipping rewards…

    Submitted 17 April, 2023; v1 submitted 17 May, 2022; originally announced May 2022.

    Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence (2022)

  34. arXiv:2204.14007  [pdf, other]

    cs.DC cs.CV cs.LG

    Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs

    Authors: Berkin Akin, Suyog Gupta, Yun Long, Anton Spiridonov, Zhuo Wang, Marie White, Hao Xu, Ping Zhou, Yanqi Zhou

    Abstract: On-device ML accelerators are becoming a standard in modern mobile system-on-chips (SoC). Neural architecture search (NAS) comes to the rescue for efficiently utilizing the high compute throughput offered by these accelerators. However, existing NAS frameworks have several practical limitations in scaling to multiple tasks and different target platforms. In this work, we provide a two-pronged appr…

    Submitted 8 April, 2022; originally announced April 2022.

  35. arXiv:2204.05112  [pdf, other]

    cs.CV cs.LG physics.geo-ph

    FastMapSVM: Classifying Complex Objects Using the FastMap Algorithm and Support-Vector Machines

    Authors: Malcolm C. A. White, Kushal Sharma, Ang Li, T. K. Satish Kumar, Nori Nakata

    Abstract: Neural Networks and related Deep Learning methods are currently at the leading edge of technologies used for classifying objects. However, they generally demand large amounts of time and data for model training; and their learned models can sometimes be difficult to interpret. In this paper, we advance FastMapSVM -- an interpretable Machine Learning framework for classifying complex objects -- as…

    Submitted 15 June, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: 27 pages, 12 figures

  36. arXiv:2203.15955  [pdf, other]

    cs.LG

    Investigating the Properties of Neural Network Representations in Reinforcement Learning

    Authors: Han Wang, Erfan Miahi, Martha White, Marlos C. Machado, Zaheer Abbas, Raksha Kumaraswamy, Vincent Liu, Adam White

    Abstract: In this paper we investigate the properties of representations learned by deep reinforcement learning systems. Much of the early work on representations for reinforcement learning focused on designing fixed-basis architectures to achieve properties thought to be desirable, such as orthogonality and sparsity. In contrast, the idea behind deep reinforcement learning methods is that the agent designe…

    Submitted 5 May, 2023; v1 submitted 29 March, 2022; originally announced March 2022.

  37. arXiv:2203.13718  [pdf, other]

    cs.CV cond-mat.mtrl-sci physics.comp-ph

    Digital Fingerprinting of Microstructures

    Authors: Michael D. White, Alexander Tarakanov, Christopher P. Race, Philip J. Withers, Kody J. H. Law

    Abstract: Finding efficient means of fingerprinting microstructural information is a critical step towards harnessing data-centric machine learning approaches. A statistical framework is systematically developed for compressed characterisation of a population of images, which includes some classical computer vision methods as special cases. The focus is on materials microstructure. The ultimate purpose is t…

    Submitted 22 January, 2024; v1 submitted 25 March, 2022; originally announced March 2022.

  38. arXiv:2203.11992  [pdf, other]

    cs.LG stat.ML

    Resonance in Weight Space: Covariate Shift Can Drive Divergence of SGD with Momentum

    Authors: Kirby Banman, Liam Peet-Pare, Nidhi Hegde, Alona Fyshe, Martha White

    Abstract: Most convergence guarantees for stochastic gradient descent with momentum (SGDm) rely on iid sampling. Yet, SGDm is often used outside this regime, in settings with temporally correlated input samples such as continual learning and reinforcement learning. Existing work has shown that SGDm with a decaying step-size can converge under Markovian temporal correlation. In this work, we show that SGDm u…

    Submitted 22 March, 2022; originally announced March 2022.

    Comments: In International Conference on Learning Representations. 2021

  39. arXiv:2202.11133  [pdf, other]

    cs.LG

    Continual Auxiliary Task Learning

    Authors: Matthew McLeod, Chunlok Lo, Matthew Schlegel, Andrew Jacobsen, Raksha Kumaraswamy, Martha White, Adam White

    Abstract: Learning auxiliary tasks, such as multiple predictions about the world, can provide many benefits to reinforcement learning systems. A variety of off-policy learning algorithms have been developed to learn such predictions, but as yet there is little work on how to adapt the behavior to gather useful data for those off-policy predictions. In this work, we investigate a reinforcement learning syste…

    Submitted 22 February, 2022; originally announced February 2022.

    Comments: Neural Information Processing Systems 2021

  40. arXiv:2202.02396  [pdf, other]

    cs.LG cs.AI

    A Temporal-Difference Approach to Policy Gradient Estimation

    Authors: Samuele Tosatto, Andrew Patterson, Martha White, A. Rupam Mahmood

    Abstract: The policy gradient theorem (Sutton et al., 2000) prescribes the usage of a cumulative discounted state distribution under the target policy to approximate the gradient. Most algorithms based on this theorem, in practice, break this assumption, introducing a distribution shift that can cause the convergence to poor solutions. In this paper, we propose a new approach of reconstructing the policy gr…

    Submitted 7 July, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

  41. arXiv:2112.11622  [pdf, other]

    cs.LG cs.AI

    An Alternate Policy Gradient Estimator for Softmax Policies

    Authors: Shivam Garg, Samuele Tosatto, Yangchen Pan, Martha White, A. Rupam Mahmood

    Abstract: Policy gradient (PG) estimators are ineffective in dealing with softmax policies that are sub-optimally saturated, which refers to the situation when the policy concentrates its probability mass on sub-optimal actions. Sub-optimal policy saturation may arise from bad policy initialization or sudden changes in the environment that occur after the policy has already converged. Current softmax PG est…

    Submitted 24 February, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

    Comments: Accepted to AISTATS 2022. 60 pages, 50 figures. This updated version has an additional experiment and minor corrections

  42. arXiv:2112.07806  [pdf, other]

    cs.LG cs.AI

    Representation Alignment in Neural Networks

    Authors: Ehsan Imani, Wei Hu, Martha White

    Abstract: It is now a standard for neural network representations to be trained on large, publicly available datasets, and used for new problems. The reasons for why neural network representations have been so successful for transfer, however, are still not fully understood. In this paper we show that, after training, neural network representations align their top singular vectors to the targets. We investi…

    Submitted 17 September, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: 26 pages, 21 figures

  43. arXiv:2111.08172  [pdf, other]

    cs.LG

    Off-Policy Actor-Critic with Emphatic Weightings

    Authors: Eric Graves, Ehsan Imani, Raksha Kumaraswamy, Martha White

    Abstract: A variety of theoretically-sound policy gradient algorithms exist for the on-policy setting due to the policy gradient theorem, which provides a simplified form for the gradient. The off-policy setting, however, has been less clear due to the existence of multiple objectives and the lack of an explicit off-policy policy gradient theorem. In this work, we unify these objectives into one off-policy…

    Submitted 13 April, 2023; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: 63 pages

    Journal ref: Journal of Machine Learning Research 24 (2023) 1-63

  44. arXiv:2111.08066  [pdf, other]

    cs.LG

    Exploiting Action Impact Regularity and Exogenous State Variables for Offline Reinforcement Learning

    Authors: Vincent Liu, James R. Wright, Martha White

    Abstract: Offline reinforcement learning -- learning a policy from a batch of data -- is known to be hard for general MDPs. These results motivate the need to look at specific classes of MDPs where offline reinforcement learning might be feasible. In this work, we explore a restricted class of MDPs to obtain guarantees for offline reinforcement learning. The key property, which we call Action Impact Regular…

    Submitted 3 May, 2023; v1 submitted 15 November, 2021; originally announced November 2021.

    Journal ref: Journal of Artificial Intelligence Research, 77 (2023) 71-101

  45. arXiv:2110.08345  [pdf, other]

    cs.CL cs.AI

    Towards Transparent Interactive Semantic Parsing via Step-by-Step Correction

    Authors: Lingbo Mo, Ashley Lewis, Huan Sun, Michael White

    Abstract: Existing studies on semantic parsing focus primarily on mapping a natural-language utterance to a corresponding logical form in one turn. However, because natural language can contain a great deal of ambiguity and variability, this is a difficult challenge. In this work, we investigate an interactive semantic parsing framework that explains the predicted logical form step by step in natural langua…

    Submitted 27 March, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

    Comments: Accepted by Findings of ACL 2022

  46. arXiv:2108.13637  [pdf, other]

    cs.LG cs.AI q-bio.NC stat.ML

    When are Deep Networks really better than Decision Forests at small sample sizes, and how?

    Authors: Haoyin Xu, Kaleab A. Kinfu, Will LeVine, Sambit Panda, Jayanta Dey, Michael Ainsworth, Yu-Chung Peng, Madi Kusmanov, Florian Engert, Christopher M. White, Joshua T. Vogelstein, Carey E. Priebe

    Abstract: Deep networks and decision forests (such as random forests and gradient boosted trees) are the leading machine learning methods for structured and tabular data, respectively. Many papers have empirically compared large numbers of classifiers on one or two different domains (e.g., on 100 different tabular data settings). However, a careful conceptual and empirical comparison of these two strategies…

    Submitted 2 November, 2021; v1 submitted 31 August, 2021; originally announced August 2021.

  47. Bespoke Fractal Sampling Patterns for Discrete Fourier Space via the Kaleidoscope Transform

    Authors: Jacob M. White, Stuart Crozier, Shekhar S. Chandra

    Abstract: Sampling strategies are important for sparse imaging methodologies, especially those employing the discrete Fourier transform (DFT). Chaotic sensing is one such methodology that employs deterministic, fractal sampling in conjunction with finite, iterative reconstruction schemes to form an image from limited samples. Using a sampling pattern constructed entirely from periodic lines in DFT space, ch…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: 6 pages, 7 figures

  48. arXiv:2107.08285  [pdf, other]

    cs.LG

    Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

    Authors: Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A. Rupam Mahmood, Martha White

    Abstract: Approximate Policy Iteration (API) algorithms alternate between (approximate) policy evaluation and (approximate) greedification. Many different approaches have been explored for approximate policy evaluation, but less is understood about approximate greedification and what choices guarantee policy improvement. In this work, we investigate approximate greedification when reducing the KL divergence…

    Submitted 18 April, 2022; v1 submitted 17 July, 2021; originally announced July 2021.

    Comments: Updated the paper with more theory in Section 5 and moved some experiments to the Appendix

  49. arXiv:2106.12621  [pdf, other]

    cs.LG cs.IR stat.ME

    Leveraging semantically similar queries for ranking via combining representations

    Authors: Hayden S. Helm, Marah Abdin, Benjamin D. Pedigo, Shweti Mahajan, Vince Lyzinski, Youngser Park, Amitabh Basu, Piali Choudhury, Christopher M. White, Weiwei Yang, Carey E. Priebe

    Abstract: In modern ranking problems, different and disparate representations of the items to be ranked are often available. It is sensible, then, to try to combine these representations to improve ranking. Indeed, learning to rank via combining representations is both principled and practical for learning a ranking function for a particular query. In extremely data-scarce settings, however, the amount of l…

    Submitted 23 June, 2021; originally announced June 2021.

  50. arXiv:2105.14214  [pdf, other]

    cs.CL cs.LG

    Predictive Representation Learning for Language Modeling

    Authors: Qingfeng Lan, Luke Kumar, Martha White, Alona Fyshe

    Abstract: To effectively perform the task of next-word prediction, long short-term memory networks (LSTMs) must keep track of many types of information. Some information is directly related to the next word's identity, but some is more secondary (e.g. discourse-level features or features of downstream words). Correlates of secondary information appear in LSTM representations even though they are not part of…

    Submitted 29 May, 2021; originally announced May 2021.