
Showing 1–50 of 71 results for author: Uehara, M

  1. arXiv:2406.12120  [pdf, other]

    cs.LG cs.AI stat.ML

    Adding Conditional Control to Diffusion Models with Reinforcement Learning

    Authors: Yulai Zhao, Masatoshi Uehara, Gabriele Scalia, Tommaso Biancalani, Sergey Levine, Ehsan Hajiramezanali

    Abstract: Diffusion models are powerful generative models that allow for precise control over the characteristics of the generated samples. While these diffusion models trained on large datasets have achieved success, there is often a need to introduce additional controls in downstream fine-tuning processes, treating these powerful models as pre-trained diffusion models. This work presents a novel method ba…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under review

  2. arXiv:2405.19673  [pdf, other]

    cs.LG cs.AI stat.ML

    Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models

    Authors: Masatoshi Uehara, Yulai Zhao, Ehsan Hajiramezanali, Gabriele Scalia, Gökcen Eraslan, Avantika Lal, Sergey Levine, Tommaso Biancalani

    Abstract: AI-driven design problems, such as DNA/protein sequence design, are commonly tackled from two angles: generative modeling, which efficiently captures the feasible design space (e.g., natural images or biological sequences), and model-based optimization, which utilizes reward models for extrapolation. To combine the strengths of both approaches, we adopt a hybrid method that fine-tunes cutting-edge…

    Submitted 31 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Under review

  3. arXiv:2403.04236  [pdf, ps, other]

    cs.LG econ.EM math.ST stat.ML

    Regularized DeepIV with Model Selection

    Authors: Zihao Li, Hui Lan, Vasilis Syrgkanis, Mengdi Wang, Masatoshi Uehara

    Abstract: In this paper, we study nonparametric estimation of instrumental variable (IV) regressions. While recent advancements in machine learning have introduced flexible methods for IV estimation, they often encounter one or more of the following limitations: (1) restricting the IV regression to be uniquely identified; (2) requiring a minimax computation oracle, which is highly unstable in practice; (3) ab…

    Submitted 7 March, 2024; originally announced March 2024.

  4. arXiv:2402.16359  [pdf, other]

    cs.LG cs.AI q-bio.QM stat.ML

    Feedback Efficient Online Fine-Tuning of Diffusion Models

    Authors: Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Sergey Levine, Tommaso Biancalani

    Abstract: Diffusion models excel at modeling complex data distributions, including those of images, proteins, and small molecules. However, in many cases, our goal is to model parts of the distribution that maximize certain properties: for example, we may want to generate images with high aesthetic quality, or molecules with high bioactivity. It is natural to frame this as a reinforcement learning (RL) prob…

    Submitted 27 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Under review (codes will be released soon)

  5. arXiv:2402.15194  [pdf, other]

    cs.LG cs.AI stat.ML

    Fine-Tuning of Continuous-Time Diffusion Models as Entropy-Regularized Control

    Authors: Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali, Gabriele Scalia, Nathaniel Lee Diamant, Alex M Tseng, Tommaso Biancalani, Sergey Levine

    Abstract: Diffusion models excel at capturing complex data distributions, such as those of natural images and proteins. While diffusion models are trained to represent the distribution in the training dataset, we often are more concerned with other properties, such as the aesthetic quality of the generated images or the functional properties of generated proteins. Diffusion models can be finetuned in a goal…

    Submitted 28 February, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: Under review (codes will be released soon)

  6. arXiv:2401.05442  [pdf, other]

    cs.LG cs.AI

    Functional Graphical Models: Structure Enables Offline Data-Driven Optimization

    Authors: Jakub Grudzien Kuba, Masatoshi Uehara, Pieter Abbeel, Sergey Levine

    Abstract: While machine learning models are typically trained to solve prediction problems, we might often want to use them for optimization problems. For example, given a dataset of proteins and their corresponding fluorescence levels, we might want to optimize for a new protein with the highest possible fluorescence. This kind of data-driven optimization (DDO) presents a range of challenges beyond those i…

    Submitted 11 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  7. arXiv:2307.13793  [pdf, ps, other]

    stat.ME cs.LG econ.EM math.ST stat.ML

    Source Condition Double Robust Inference on Functionals of Inverse Problems

    Authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

    Abstract: We consider estimation of parameters defined as linear functionals of solutions to linear inverse problems. Any such parameter admits a doubly robust representation that depends on the solution to a dual linear inverse problem, where the dual solution can be thought of as a generalization of the inverse propensity function. We provide the first source condition double robust inference method that ens…

    Submitted 25 July, 2023; originally announced July 2023.

  8. arXiv:2306.15098  [pdf, other]

    stat.ML cs.IR cs.LG

    Off-Policy Evaluation of Ranking Policies under Diverse User Behavior

    Authors: Haruka Kiyohara, Masatoshi Uehara, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, Yuta Saito

    Abstract: Ranking interfaces are everywhere in online platforms. There is thus an ever growing interest in their Off-Policy Evaluation (OPE), aiming towards an accurate performance evaluation of ranking policies using logged data. A de-facto approach for OPE is Inverse Propensity Scoring (IPS), which provides an unbiased and consistent value estimate. However, it becomes extremely inaccurate in the ranking…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: KDD2023 Research track

  9. arXiv:2305.18505  [pdf, ps, other]

    cs.LG cs.AI math.ST stat.ML

    Provable Reward-Agnostic Preference-Based Reinforcement Learning

    Authors: Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

    Abstract: Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories, rather than explicit reward signals. While PbRL has demonstrated practical success in fine-tuning language models, existing theoretical work focuses on regret minimization and fails to capture most of the practical frameworks. In t…

    Submitted 17 April, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: ICLR 2024 Spotlight

  10. arXiv:2305.14816  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Provable Offline Preference-Based Reinforcement Learning

    Authors: Wenhao Zhan, Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

    Abstract: In this paper, we investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback where feedback is available in the form of preference between trajectory pairs rather than explicit rewards. Our proposed algorithm consists of two main steps: (1) estimate the implicit reward using Maximum Likelihood Estimation (MLE) with general function approximation from offl…

    Submitted 29 September, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: The first two authors contribute equally

  11. arXiv:2302.09456  [pdf, other]

    cs.LG

    Distributional Offline Policy Evaluation with Predictive Error Guarantees

    Authors: Runzhe Wu, Masatoshi Uehara, Wen Sun

    Abstract: We study the problem of estimating the distribution of the return of a policy using an offline dataset that is not generated from the policy, i.e., distributional offline policy evaluation (OPE). We propose an algorithm called Fitted Likelihood Estimation (FLE), which conducts a sequence of Maximum Likelihood Estimation (MLE) and has the flexibility of integrating any state-of-the-art probabilisti…

    Submitted 29 December, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: Accepted at ICML 2023

  12. arXiv:2302.05404  [pdf, ps, other]

    stat.ML cs.LG econ.EM math.ST stat.ME

    Minimax Instrumental Variable Regression and $L_2$ Convergence Guarantees without Identification or Closedness

    Authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

    Abstract: In this paper, we study nonparametric estimation of instrumental variable (IV) regressions. Recently, many flexible machine learning methods have been developed for instrumental variable estimation. However, these methods have at least one of the following limitations: (1) restricting the IV regression to be uniquely identified; (2) only obtaining estimation error rates in terms of pseudometrics (…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: Under review

  13. arXiv:2302.02392  [pdf, ps, other]

    cs.LG stat.ML

    Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage

    Authors: Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun

    Abstract: In offline reinforcement learning (RL) we have no opportunity to explore so we must make assumptions that the data is sufficient to guide picking a good policy, taking the form of assuming some coverage, realizability, Bellman completeness, and/or hard margin (gap). In this work we propose value-based algorithms for offline RL with PAC guarantees under just partial coverage, specifically, coverage…

    Submitted 13 November, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: The original title of this paper was "Refined Value-Based Offline RL under Realizability and Partial Coverage," but it was later changed. This paper has been accepted for NeurIPS 2023

  14. arXiv:2212.06355  [pdf, ps, other]

    stat.ML cs.LG math.ST stat.ME

    A Review of Off-Policy Evaluation in Reinforcement Learning

    Authors: Masatoshi Uehara, Chengchun Shi, Nathan Kallus

    Abstract: Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has been recently applied to solve a number of challenging problems. In this paper, we primarily focus on off-policy evaluation (OPE), one of the most fundamental topics in RL. In recent years, a number of OPE methods have been developed in the statistics and computer science literature. We provide a…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: Still under revision

  15. arXiv:2208.08291  [pdf, ps, other]

    stat.ME econ.EM math.ST stat.ML

    Inference on Strongly Identified Functionals of Weakly Identified Functions

    Authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

    Abstract: In a variety of applications, including nonparametric instrumental variable (NPIV) analysis, proximal causal inference under unmeasured confounding, and missing-not-at-random data with shadow variables, we are interested in inference on a continuous linear functional (e.g., average causal effects) of a nuisance function (e.g., NPIV regression) defined by conditional moment restrictions. These nuisan…

    Submitted 30 June, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: This supersedes the previous version titled "Debiased Inference on Identified Linear Functionals of Underidentified Nuisances via Penalized Minimax Estimation"

  16. arXiv:2207.13081  [pdf, other]

    cs.LG stat.ML

    Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

    Authors: Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun

    Abstract: We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs.…

    Submitted 14 November, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

    Comments: This paper was accepted in NeurIPS 2023

  17. arXiv:2207.05738  [pdf, other]

    cs.LG

    PAC Reinforcement Learning for Predictive State Representations

    Authors: Wenhao Zhan, Masatoshi Uehara, Wen Sun, Jason D. Lee

    Abstract: In this paper we study online Reinforcement Learning (RL) in partially observable dynamical systems. We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models such as Partially Observable Markov Decision Processes (POMDP). PSR represents the states using a set of predictions of future observations and is defined entirely using…

    Submitted 13 August, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

  18. arXiv:2206.12081  [pdf, other]

    cs.LG stat.ME stat.ML

    Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings

    Authors: Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

    Abstract: We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs) where the state space and observation space are large or even continuous. Particularly, we consider Hilbert space embeddings of POMDP where the feature of latent states and the feature of observations admit a conditional Hilbert space embedding of the observation emis…

    Submitted 24 June, 2022; originally announced June 2022.

  19. arXiv:2206.12020  [pdf, ps, other]

    cs.LG math.ST stat.ME stat.ML

    Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems

    Authors: Masatoshi Uehara, Ayush Sekhari, Jason D. Lee, Nathan Kallus, Wen Sun

    Abstract: We study Reinforcement Learning for partially observable dynamical systems using function approximation. We propose a new Partially Observable Bilinear Actor-Critic framework that is general enough to include models such as observable tabular Partially Observable Markov Decision Processes (POMDPs), observable Linear-Quadratic-Gaussian (LQG), Predictive State Representations (PSRs), as we…

    Submitted 23 June, 2022; originally announced June 2022.

  20. arXiv:2204.02718  [pdf, other]

    cs.CL cs.CY

    Annotation-Scheme Reconstruction for "Fake News" and Japanese Fake News Dataset

    Authors: Taichi Murayama, Shohei Hisada, Makoto Uehara, Shoko Wakamiya, Eiji Aramaki

    Abstract: Fake news provokes many societal problems; therefore, there has been extensive research on fake news detection tasks to counter it. Many fake news datasets were constructed as resources to facilitate this task. Contemporary research focuses almost exclusively on the factuality aspect of the news. However, this aspect alone is insufficient to explain "fake news," which is a complex phenomenon that…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: 13th International Conference on Language Resources and Evaluation (LREC), 2022

  21. arXiv:2202.00063  [pdf, other]

    cs.LG cs.AI

    Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

    Authors: Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun

    Abstract: We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states. BRIEE interleaves latent states discovery, exploration, and exploitation together, and can provably l…

    Submitted 11 October, 2022; v1 submitted 31 January, 2022; originally announced February 2022.

  22. arXiv:2111.06784  [pdf, other]

    cs.LG stat.ML

    A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes

    Authors: Chengchun Shi, Masatoshi Uehara, Jiawei Huang, Nan Jiang

    Abstract: We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes (POMDPs), where the evaluation policy depends only on observable variables and the behavior policy depends on unobservable latent variables. Existing works either assume no unmeasured confounders, or focus on settings where both the observation and the state spaces are tabular. In this work, we first propose…

    Submitted 15 June, 2022; v1 submitted 12 November, 2021; originally announced November 2021.

  23. arXiv:2110.04652  [pdf, other]

    cs.LG cs.AI stat.ML

    Representation Learning for Online and Offline RL in Low-rank MDPs

    Authors: Masatoshi Uehara, Xuezhou Zhang, Wen Sun

    Abstract: This work studies the question of Representation Learning in RL: how can we learn a compact low-dimensional representation such that on top of the representation we can perform RL procedures such as exploration and exploitation, in a sample efficient manner. We focus on the low-rank Markov Decision Processes (MDPs) where the transition dynamics correspond to a low-rank transition matrix. Unlike pr…

    Submitted 5 January, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

  24. arXiv:2107.06226  [pdf, other]

    cs.LG cs.AI stat.ML

    Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage

    Authors: Masatoshi Uehara, Wen Sun

    Abstract: We study model-based offline Reinforcement Learning with general function approximation without a full coverage assumption on the offline data distribution. We present an algorithm named Constrained Pessimistic Policy Optimization (CPPO), which leverages a general function class and uses a constraint over the model class to encode pessimism. Under the assumption that the ground truth model belongs t…

    Submitted 9 January, 2023; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: We changed the title from the first version. This is a longer version of the article accepted in ICLR 2022. The following things are added (1) a new algorithm CPPO-LR where the constraint is given in a log-likelihood form, (2) how to instantiate CPPO on (nonparametric) linear MDPs, (3) posterior sampling in a model-free way

  25. arXiv:2106.03207  [pdf, other]

    cs.LG stat.ML

    Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage

    Authors: Jonathan D. Chang, Masatoshi Uehara, Dhruv Sreenivas, Rahul Kidambi, Wen Sun

    Abstract: This paper studies offline Imitation Learning (IL) where an agent learns to imitate an expert demonstrator without additional online environment interactions. Instead, the learner is presented with a static offline dataset of state-action-next state transition triples from a potentially less proficient behavior policy. We introduce Model-based IL from Offline data (MILO): an algorithmic framework…

    Submitted 31 January, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: 42 pages, 5 figures, 7 tables

  26. arXiv:2103.14029  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach

    Authors: Nathan Kallus, Xiaojie Mao, Masatoshi Uehara

    Abstract: We study the estimation of causal parameters when not all confounders are observed and instead negative controls are available. Recent work has shown how these can enable identification and efficient estimation via two so-called bridge functions. In this paper, we tackle the primary challenge to causal inference using negative controls: the identification and estimation of these bridge functions.…

    Submitted 9 October, 2022; v1 submitted 25 March, 2021; originally announced March 2021.

  27. arXiv:2102.02981  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency

    Authors: Masatoshi Uehara, Masaaki Imaizumi, Nan Jiang, Nathan Kallus, Wen Sun, Tengyang Xie

    Abstract: We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning using function approximation for marginal importance weights and $q$-functions when these are estimated using recent minimax methods. Under various combinations of realizability and completeness assumptions, we show that the minimax approach enables us to achieve a fast rate of convergence for weights…

    Submitted 24 July, 2022; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: Under Review

  28. arXiv:2102.00479  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Fast Rates for the Regret of Offline Reinforcement Learning

    Authors: Yichun Hu, Nathan Kallus, Masatoshi Uehara

    Abstract: We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP). While existing analyses of common approaches, such as fitted $Q$-iteration (FQI), suggest a $O(1/\sqrt{n})$ convergence for regret, empirical behavior exhibits much faster convergence. In this paper, we present a finer regret a…

    Submitted 12 July, 2023; v1 submitted 31 January, 2021; originally announced February 2021.

  29. arXiv:2010.11002  [pdf, other]

    cs.LG stat.ME stat.ML

    Optimal Off-Policy Evaluation from Multiple Logging Policies

    Authors: Nathan Kallus, Yuta Saito, Masatoshi Uehara

    Abstract: We study off-policy evaluation (OPE) from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling. Previous work noted that in this setting the ordering of the variances of different importance sampling estimators is instance-dependent, which brings up a dilemma as to which importance sampling weights to use. In this paper, we resolve this dilemma by finding t…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: Under Review

  30. arXiv:2006.03900  [pdf, other]

    cs.LG math.OC stat.ML

    Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies

    Authors: Nathan Kallus, Masatoshi Uehara

    Abstract: Offline reinforcement learning, wherein one uses off-policy data logged by a fixed behavior policy to evaluate and learn new policies, is crucial in applications where experimentation is limited such as medicine. We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous. Targeting deterministic policies, for which action is a de…

    Submitted 6 June, 2020; originally announced June 2020.

  31. arXiv:2006.03886  [pdf, other]

    cs.LG math.OC stat.ML

    Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning

    Authors: Nathan Kallus, Masatoshi Uehara

    Abstract: We study the efficient off-policy evaluation of natural stochastic policies, which are defined in terms of deviations from the behavior policy. This is a departure from the literature on off-policy evaluation where most work considers the evaluation of explicitly specified policies. Crucially, offline reinforcement learning with natural stochastic policies can help alleviate issues of weak overlap,…

    Submitted 3 November, 2020; v1 submitted 6 June, 2020; originally announced June 2020.

    Comments: Under review

  32. arXiv:2002.11642  [pdf, ps, other]

    stat.ML cs.LG econ.EM

    Off-Policy Evaluation and Learning for External Validity under a Covariate Shift

    Authors: Masahiro Kato, Masatoshi Uehara, Shota Yasui

    Abstract: We consider evaluating and training a new policy for the evaluation data by using the historical data obtained from a different policy. The goal of off-policy evaluation (OPE) is to estimate the expected reward of a new policy over the evaluation data, and that of off-policy learning (OPL) is to find a new policy that maximizes the expected reward over the evaluation data. Although the standard OP…

    Submitted 15 October, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

  33. arXiv:2002.04014  [pdf, other]

    stat.ML cs.LG math.OC

    Statistically Efficient Off-Policy Policy Gradients

    Authors: Nathan Kallus, Masatoshi Uehara

    Abstract: Policy gradient methods in reinforcement learning update policy parameters by taking steps in the direction of an estimated gradient of policy value. In this paper, we consider the statistically efficient estimation of policy gradients from off-policy data, where the estimation is particularly non-trivial. We derive the asymptotic lower bound on the feasible mean-squared error in both Markov and n…

    Submitted 20 February, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

  34. arXiv:1912.12945  [pdf, other]

    stat.ML cs.LG stat.ME

    Localized Debiased Machine Learning: Efficient Inference on Quantile Treatment Effects and Beyond

    Authors: Nathan Kallus, Xiaojie Mao, Masatoshi Uehara

    Abstract: We consider estimating a low-dimensional parameter in an estimating equation involving high-dimensional nuisances that depend on the parameter. A central example is the efficient estimating equation for the (local) quantile treatment effect ((L)QTE) in causal inference, which involves as a nuisance the covariate-conditional cumulative distribution function evaluated at the quantile to be estimated…

    Submitted 17 August, 2022; v1 submitted 30 December, 2019; originally announced December 2019.

  35. arXiv:1910.12809  [pdf, other]

    cs.LG stat.ML

    Minimax Weight and Q-Function Learning for Off-Policy Evaluation

    Authors: Masatoshi Uehara, Jiawei Huang, Nan Jiang

    Abstract: We provide theoretical investigations into off-policy evaluation in reinforcement learning using function approximators for (marginalized) importance weights and value functions. Our contributions include: (1) A new estimator, MWL, that directly estimates importance ratios over the state-action distributions, removing the reliance on knowledge of the behavior policy as in prior work (Liu et al., 2…

    Submitted 6 October, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

  36. arXiv:1909.05850  [pdf, other]

    stat.ML cs.LG math.OC

    Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning

    Authors: Nathan Kallus, Masatoshi Uehara

    Abstract: Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and infinite-horizon settings due to diminishing overlap between behavior and target policies. In this paper, we study the role of Markovian and time-invariant structure in efficient OPE. We first derive the efficiency bounds for OPE when one assumes each of these structures. This precisely characterizes the cu…

    Submitted 15 January, 2023; v1 submitted 12 September, 2019; originally announced September 2019.

    Comments: In V3, we significantly changed the derivation of the efficiency bound to follow standard (iid) semiparametric theory. We also derive the efficient influence function. In V4, we add an experiment in a continuous-state environment employing function approximation. In v6, we fixed several typos. Please refer to this version as the final version

  37. arXiv:1908.08526  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Double Reinforcement Learning for Efficient Off-Policy Evaluation in Markov Decision Processes

    Authors: Nathan Kallus, Masatoshi Uehara

    Abstract: Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. We consider for the first time the semiparametric efficiency limits of OPE in Markov decision processes (MDPs), where actions, rewards, and states are memoryless. We show existing OPE estimators may fail to be ef…

    Submitted 5 June, 2020; v1 submitted 22 August, 2019; originally announced August 2019.

  38. arXiv:1906.03735  [pdf, ps, other]

    cs.LG stat.ML

    Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning

    Authors: Nathan Kallus, Masatoshi Uehara

    Abstract: Off-policy evaluation (OPE) in both contextual bandits and reinforcement learning allows one to evaluate novel decision policies without needing to conduct exploration, which is often costly or otherwise infeasible. The problem's importance has attracted many proposed solutions, including importance sampling (IS), self-normalized IS (SNIS), and doubly robust (DR) estimates. DR and its variants ens…

    Submitted 9 June, 2019; originally announced June 2019.

  39. arXiv:1905.05976  [pdf, ps, other]

    math.ST cs.LG stat.ML

    Information criteria for non-normalized models

    Authors: Takeru Matsuda, Masatoshi Uehara, Aapo Hyvarinen

    Abstract: Many statistical models are given in the form of non-normalized densities with an intractable normalization constant. Since maximum likelihood estimation is computationally intensive for these models, several estimation methods have been developed which do not require explicit computation of the normalization constant, such as noise contrastive estimation (NCE) and score matching. However, model s…

    Submitted 27 July, 2021; v1 submitted 15 May, 2019; originally announced May 2019.

    Journal ref: Journal of Machine Learning Research, 22(158):1--33, 2021

  40. arXiv:1903.03630  [pdf, ps, other]

    stat.ML cs.LG stat.ME

    Imputation estimators for unnormalized models with missing data

    Authors: Masatoshi Uehara, Takeru Matsuda, Jae Kwang Kim

    Abstract: Several statistical models are given in the form of unnormalized densities, and calculation of the normalization constant is intractable. We propose estimation methods for such unnormalized models with missing data. The key concept is to combine imputation techniques with estimators for unnormalized models including noise contrastive estimation and score matching. In addition, we derive asymptotic…

    Submitted 8 June, 2020; v1 submitted 8 March, 2019; originally announced March 2019.

    Comments: To appear (AISTATS 2020)

  41. arXiv:1901.07710  [pdf, other]

    stat.ML cs.LG

    Unified estimation framework for unnormalized models with statistical efficiency

    Authors: Masatoshi Uehara, Takafumi Kanamori, Takashi Takenouchi, Takeru Matsuda

    Abstract: The parameter estimation of unnormalized models is a challenging problem. The maximum likelihood estimation (MLE) is computationally infeasible for these models since normalizing constants are not explicitly calculated. Although some consistent estimators have been proposed earlier, the problem of statistical efficiency remains. In this study, we propose a unified, statistically efficient estimati…

    Submitted 5 June, 2020; v1 submitted 22 January, 2019; originally announced January 2019.

    Comments: To appear at AISTATS 2020

  42. arXiv:1810.12519  [pdf, ps, other]

    stat.ME

    Semiparametric response model with nonignorable nonresponse

    Authors: Masatoshi Uehara, Jae Kwang Kim

    Abstract: How to deal with nonignorable response is often a challenging problem encountered in statistical analysis with missing data. Parametric model assumption for the response mechanism is often made and there is no way to validate the model assumption with missing data. We consider a semiparametric response model that relaxes the parametric model assumption in the response mechanism. Two types of effic…

    Submitted 30 October, 2018; originally announced October 2018.

  43. arXiv:1808.07983  [pdf, other]

    stat.ML cs.LG

    Analysis of Noise Contrastive Estimation from the Perspective of Asymptotic Variance

    Authors: Masatoshi Uehara, Takeru Matsuda, Fumiyasu Komaki

    Abstract: There are many models, often called unnormalized models, whose normalizing constants are not calculated in closed form. Maximum likelihood estimation is not directly applicable to unnormalized models. Score matching, contrastive divergence method, pseudo-likelihood, Monte Carlo maximum likelihood, and noise contrastive estimation (NCE) are popular methods for estimating parameters of such models.…

    Submitted 23 August, 2018; originally announced August 2018.

  44. arXiv:1610.02920  [pdf, other]

    stat.ML

    Generative Adversarial Nets from a Density Ratio Estimation Perspective

    Authors: Masatoshi Uehara, Issei Sato, Masahiro Suzuki, Kotaro Nakayama, Yutaka Matsuo

    Abstract: Generative adversarial networks (GANs) are successful deep generative models. GANs are based on a two-player minimax game. However, the objective function derived in the original motivation is changed to obtain stronger gradients when learning the generator. We propose a novel algorithm that repeats the density ratio estimation and f-divergence minimization. Our algorithm offers a new perspective…

    Submitted 9 November, 2016; v1 submitted 10 October, 2016; originally announced October 2016.

    Comments: Add contents especially theoretical things for ICLR 2017

  45. The Atacama Cosmology Telescope: The polarization-sensitive ACTPol instrument

    Authors: R. J. Thornton, P. A. R. Ade, S. Aiola, F. E. Angile, M. Amiri, J. A. Beall, D. T. Becker, H-M. Cho, S. K. Choi, P. Corlies, K. P. Coughlin, R. Datta, M. J. Devlin, S. R. Dicker, R. Dunner, J. W. Fowler, A. E. Fox, P. A. Gallardo, J. Gao, E. Grace, M. Halpern, M. Hasselfield, S. W. Henderson, G. C. Hilton, A. D. Hincks , et al. (31 additional authors not shown)

    Abstract: The Atacama Cosmology Telescope (ACT) is designed to make high angular resolution measurements of anisotropies in the Cosmic Microwave Background (CMB) at millimeter wavelengths. We describe ACTPol, an upgraded receiver for ACT, which uses feedhorn-coupled, polarization-sensitive detector arrays, a 3 degree field of view, 100 mK cryogenics with continuous cooling, and meta material anti-reflection…

    Submitted 20 May, 2016; originally announced May 2016.

  46. The Atacama Cosmology Telescope: CMB Polarization at $200<\ell<9000$

    Authors: Sigurd Naess, Matthew Hasselfield, Jeff McMahon, Michael D. Niemack, Graeme E. Addison, Peter A. R. Ade, Rupert Allison, Mandana Amiri, Nick Battaglia, James A. Beall, Francesco de Bernardis, J Richard Bond, Joe Britton, Erminia Calabrese, Hsiao-mei Cho, Kevin Coughlin, Devin Crichton, Sudeep Das, Rahul Datta, Mark J. Devlin, Simon R. Dicker, Joanna Dunkley, Rolando Dünner, Joseph W. Fowler, Anna E. Fox , et al. (53 additional authors not shown)

    Abstract: We report on measurements of the cosmic microwave background (CMB) and celestial polarization at 146 GHz made with the Atacama Cosmology Telescope Polarimeter (ACTPol) in its first three months of observing. Four regions of sky covering a total of 270 square degrees were mapped with an angular resolution of $1.3'$. The map noise levels in the four regions are between 11 and 17 $μ$K-arcmin. We pres…

    Submitted 21 September, 2014; v1 submitted 21 May, 2014; originally announced May 2014.

    Comments: 16 pages, 15 figures, 5 tables

  47. The Atacama Cosmology Telescope: Cosmological parameters from three seasons of data

    Authors: Jonathan L. Sievers, Renée A. Hlozek, Michael R. Nolta, Viviana Acquaviva, Graeme E. Addison, Peter A. R. Ade, Paula Aguirre, Mandana Amiri, John William Appel, L. Felipe Barrientos, Elia S. Battistelli, Nick Battaglia, J. Richard Bond, Ben Brown, Bryce Burger, Erminia Calabrese, Jay Chervenak, Devin Crichton, Sudeep Das, Mark J. Devlin, Simon R. Dicker, W. Bertrand Doriese, Joanna Dunkley, Rolando Dünner, Thomas Essinger-Hileman , et al. (68 additional authors not shown)

    Abstract: We present constraints on cosmological and astrophysical parameters from high-resolution microwave background maps at 148 GHz and 218 GHz made by the Atacama Cosmology Telescope (ACT) in three seasons of observations from 2008 to 2010. A model of primary cosmological and secondary foreground parameters is fit to the map power spectra and lensing deflection power spectrum, including contributions f…

    Submitted 11 October, 2013; v1 submitted 4 January, 2013; originally announced January 2013.

    Comments: 26 pages, 22 figures. This paper is a companion to Das et al. (2013) and Dunkley et al. (2013). Matches published JCAP version

  48. arXiv:1012.1996  [pdf]

    cond-mat.supr-con

    Intrinsic pinning property of FeSe0.5Te0.5

    Authors: M. Migita, Y. Takikawa, M. Takeda, M. Uehara, T. Kuramoto, Y. Takano, Y. Mizuguchi, Y. Kimishima

    Abstract: The intrinsic pinning properties of FeSe0.5Te0.5, which is the superconductor with Tc of about 14 K, were studied by the analysis of magnetization curves by the extended critical state model. In the magnetization measurements by SQUID magnetometer, the external magnetic fields were applied parallel and perpendicular to c-axis of the sample. The critical current density Jc's under the perpendicular…

    Submitted 9 December, 2010; originally announced December 2010.

    Comments: 19 pages, 7 figures

  49. arXiv:0811.3483  [pdf]

    cond-mat.supr-con

    New anti-perovskite-type Superconductor ZnNyNi3

    Authors: Masatomo Uehara, Akira Uehara, Katsuya Kozawa, Yoshihide Kimishima

    Abstract: We have synthesized a new superconductor ZnNyNi3 with Tc ~3 K. The crystal structure has the same anti-perovskite-type such as MgCNi3 and CdCNi3. As far as we know, this is the third superconducting material in Ni-based anti-perovskite series. For this material, superconducting parameters, lower-critical field Hc1(0), upper-critical field Hc2(0), coherence length x(0), penetration depth l(0), an…

    Submitted 21 November, 2008; originally announced November 2008.

    Comments: 13 pages, 3 figures, 1 table

  50. arXiv:0810.0350  [pdf]

    cond-mat.mtrl-sci cond-mat.str-el

    Carrier doping to pseudo-low-dimensional compound La2RuO5

    Authors: Masatomo Uehara, Kenich Ashikawa, Yoshimasa Aka, Yoshihide Kimishima

    Abstract: Hole carrier doping has been tried to pseudo-low-dimensional material La2RuO5 by substituting La3+ with Cd2+. Single phased samples of La2-xCdxRuO5 with x up to 0.5 have been successfully obtained and also high pressure O2 annealing has been performed to the x=0.5 sample. Although the formal ionic state of Ru is expected to increase from 4+ (at x=0) to 4.5+ (at x=0.5), the magnetic and electrica…

    Submitted 2 October, 2008; originally announced October 2008.

    Comments: 10 pages, 7 figures