Showing 1–39 of 39 results for author: Honda, J

Searching in archive cs.
  1. arXiv:2405.16168  [pdf, other]

    cs.LG stat.ML

    Multi-Player Approaches for Dueling Bandits

    Authors: Or Raveh, Junya Honda, Masashi Sugiyama

    Abstract: Various approaches have emerged for multi-armed bandits in distributed systems. The multiplayer dueling bandit problem, common in scenarios with only preference-based information like human feedback, introduces challenges related to controlling collaborative exploration of non-informative arm pairs, but has received little attention. To fill this gap, we demonstrate that the direct use of a Follow…

    Submitted 25 May, 2024; originally announced May 2024.

  2. arXiv:2405.04910  [pdf, other]

    cs.LG stat.ML

    Learning with Posterior Sampling for Revenue Management under Time-varying Demand

    Authors: Kazuma Shimizu, Junya Honda, Shinji Ito, Shinji Nakadai

    Abstract: This paper discusses the revenue management (RM) problem to maximize revenue by pricing items or services. One challenge in this problem is that the demand distribution is unknown and varies over time in real applications such as airline and retail industries. In particular, the time-varying demand has not been well studied under scenarios of unknown demand due to the difficulty of jointly managin…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: An extended version of the paper accepted by the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)

  3. arXiv:2403.05134  [pdf, other]

    stat.ML cs.LG

    Follow-the-Perturbed-Leader with Fréchet-type Tail Distributions: Optimality in Adversarial Bandits and Best-of-Both-Worlds

    Authors: Jongyeong Lee, Junya Honda, Shinji Ito, Min-hwan Oh

    Abstract: This paper studies the optimality of the Follow-the-Perturbed-Leader (FTPL) policy in both adversarial and stochastic $K$-armed bandits. Despite the widespread use of the Follow-the-Regularized-Leader (FTRL) framework with various choices of regularization, the FTPL framework, which relies on random perturbations, has not received much attention, despite its inherent simplicity. In adversarial ban…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 54 pages

  4. arXiv:2403.00715  [pdf, ps, other]

    cs.LG stat.ML

    Adaptive Learning Rate for Follow-the-Regularized-Leader: Competitive Analysis and Best-of-Both-Worlds

    Authors: Shinji Ito, Taira Tsuchiya, Junya Honda

    Abstract: Follow-The-Regularized-Leader (FTRL) is known as an effective and versatile approach in online learning, where appropriate choice of the learning rate is crucial for smaller regret. To this end, we formulate the problem of adjusting FTRL's learning rate as a sequential decision-making problem and introduce the framework of competitive analysis. We establish a lower bound for the competitive ratio…

    Submitted 10 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  5. arXiv:2402.08321  [pdf, ps, other]

    cs.LG stat.ML

    Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring

    Authors: Taira Tsuchiya, Shinji Ito, Junya Honda

    Abstract: Partial monitoring is a generic framework of online decision-making problems with limited observations. To make decisions from such limited observations, it is necessary to find an appropriate distribution for exploration. Recently, a powerful approach for this purpose, exploration by optimization (ExO), was proposed, which achieves the optimal bounds in adversarial environments with follow-the-re…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: 30 pages

  6. arXiv:2310.00539  [pdf, other]

    stat.ML cs.LG

    Thompson Exploration with Best Challenger Rule in Best Arm Identification

    Authors: Jongyeong Lee, Junya Honda, Masashi Sugiyama

    Abstract: This paper studies the fixed-confidence best arm identification (BAI) problem in the bandit framework in the canonical single-parameter exponential models. For this problem, many policies have been proposed, but most of them require solving an optimization problem at every round and/or are forced to explore an arm at least a certain number of times except those restricted to the Gaussian model. To…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: TBA ACML2023, 49 pages

  7. arXiv:2305.17301  [pdf, ps, other]

    cs.LG stat.ML

    Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds

    Authors: Taira Tsuchiya, Shinji Ito, Junya Honda

    Abstract: Adaptivity to the difficulties of a problem is a key property in sequential decision-making problems to broaden the applicability of algorithms. Follow-the-regularized-leader (FTRL) has recently emerged as one of the most promising approaches for obtaining various types of adaptivity in bandit problems. Aiming to further generalize this adaptivity, we develop a generic adaptive learning rate, call…

    Submitted 13 February, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: Published version in Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 32 pages

  8. arXiv:2303.06058  [pdf, ps, other]

    cs.LG stat.ML

    A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms

    Authors: Dorian Baudry, Kazuya Suzuki, Junya Honda

    Abstract: In this paper we propose a general methodology to derive regret bounds for randomized multi-armed bandit algorithms. It consists in checking a set of sufficient conditions on the sampling probability of each arm and on the family of distributions to prove a logarithmic regret. As a direct application we revisit two famous bandit algorithms, Minimum Empirical Divergence (MED) and Thompson Sampling…

    Submitted 21 December, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

  9. arXiv:2302.01544  [pdf, other]

    cs.LG math.ST stat.ML

    Optimality of Thompson Sampling with Noninformative Priors for Pareto Bandits

    Authors: Jongyeong Lee, Junya Honda, Chao-Kai Chiang, Masashi Sugiyama

    Abstract: In the stochastic multi-armed bandit problem, a randomized probability matching policy called Thompson sampling (TS) has shown excellent performance in various reward models. In addition to the empirical performance, TS has been shown to achieve asymptotic problem-dependent lower bounds in several models. However, its optimality has been mainly addressed under light-tailed or one-parameter models…

    Submitted 2 February, 2023; originally announced February 2023.

    Comments: 49 pages, a preprint

  10. arXiv:2207.14550  [pdf, ps, other]

    cs.LG stat.ML

    Best-of-Both-Worlds Algorithms for Partial Monitoring

    Authors: Taira Tsuchiya, Shinji Ito, Junya Honda

    Abstract: This study considers the partial monitoring problem with $k$-actions and $d$-outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded both in the stochastic and adversarial regimes. In particular, we show that for non-degenerate locally observable games, the regret is $O(m^2 k^4 \log(T) \log(k_Π T) / Δ_{\min})$ in the stochastic regime and…

    Submitted 9 October, 2022; v1 submitted 29 July, 2022; originally announced July 2022.

    Comments: 31 pages

  11. arXiv:2206.06810  [pdf, ps, other]

    cs.LG stat.ML

    Adversarially Robust Multi-Armed Bandit Algorithm with Variance-Dependent Regret Bounds

    Authors: Shinji Ito, Taira Tsuchiya, Junya Honda

    Abstract: This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BOBW) algorithm that works nearly optimally in both stochastic and adversarial settings. In stochastic settings, some existing BOBW algorithms achieve tight gap-dependent regret bounds of $O(\sum_{i: Δ_i>0} \frac{\log T}{Δ_i})$ for suboptimality gap $Δ_i$ of arm $i$ and time horizon $T$. As Audibert e…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: Accepted for presentation at the 35th Annual Conference on Learning Theory (COLT 2022). Only the extended abstract will appear in the conference proceedings

  12. arXiv:2206.04646  [pdf, other]

    stat.ML cs.LG

    Minimax Optimal Algorithms for Fixed-Budget Best Arm Identification

    Authors: Junpei Komiyama, Taira Tsuchiya, Junya Honda

    Abstract: We consider the fixed-budget best arm identification problem where the goal is to find the arm of the largest mean with a fixed number of samples. It is known that the probability of misidentifying the best arm is exponentially small to the number of rounds. However, limited characterizations have been discussed on the rate (exponent) of this value. In this paper, we characterize the minimax optim…

    Submitted 26 October, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022 version https://openreview.net/forum?id=TIQfmR7IF6H

  13. arXiv:2206.03019  [pdf, other]

    cs.LG stat.ML

    The Survival Bandit Problem

    Authors: Charles Riou, Junya Honda, Masashi Sugiyama

    Abstract: We introduce and study a new variant of the multi-armed bandit problem (MAB), called the survival bandit problem (S-MAB). While in both problems, the objective is to maximize the so-called cumulative reward, in this new variant, the procedure is interrupted if the cumulative reward falls below a preset threshold. This simple yet unexplored extension of the MAB follows from many practical applicati…

    Submitted 6 January, 2024; v1 submitted 7 June, 2022; originally announced June 2022.

  14. arXiv:2206.00873  [pdf, ps, other]

    cs.LG

    Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs

    Authors: Shinji Ito, Taira Tsuchiya, Junya Honda

    Abstract: This study considers online learning with general directed feedback graphs. For this problem, we present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments as well as poly-logarithmic regret bounds for stochastic environments. As Alon et al. [2015] have shown, tight regret bounds depend on the structure of the feedback graph: strongly observable gra…

    Submitted 26 December, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

    Comments: Accepted at NeurIPS 2022

  15. arXiv:2107.11419  [pdf, other]

    stat.ML cs.LG

    Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits

    Authors: Junpei Komiyama, Edouard Fouché, Junya Honda

    Abstract: We consider nonstationary multi-armed bandit problems where the model parameters of the arms change over time. We introduce the adaptive resetting bandit (ADR-bandit), a bandit algorithm class that leverages adaptive windowing techniques from literature on data streams. We first provide new guarantees on the quality of estimators resulting from adaptive windowing techniques, which are of independe…

    Submitted 25 October, 2023; v1 submitted 23 July, 2021; originally announced July 2021.

    Comments: Revision: Regret bound for ADR-Bandit + TS

  16. arXiv:2107.08135  [pdf, other]

    stat.ML cs.LG

    Mediated Uncoupled Learning: Learning Functions without Direct Input-output Correspondences

    Authors: Ikko Yamane, Junya Honda, Florian Yger, Masashi Sugiyama

    Abstract: Ordinary supervised learning is useful when we have paired training data of input $X$ and output $Y$. However, such paired data can be difficult to collect in practice. In this paper, we consider the task of predicting $Y$ from $X$ when we have no paired data of them, but we have two separate, independent datasets of $X$ and $Y$ each observed with some mediating variable $U$, that is, we have two…

    Submitted 17 July, 2022; v1 submitted 16 July, 2021; originally announced July 2021.

    Comments: ICML 2021 version with correction to Figure 1 and the appendices

  17. arXiv:2012.15584  [pdf, other]

    cs.LG cs.DM cs.DS cs.SI stat.ML

    Combinatorial Pure Exploration with Full-bandit Feedback and Beyond: Solving Combinatorial Optimization under Uncertainty with Limited Observation

    Authors: Yuko Kuroki, Junya Honda, Masashi Sugiyama

    Abstract: Combinatorial optimization is one of the fundamental research fields that has been extensively studied in theoretical computer science and operations research. When developing an algorithm for combinatorial optimization, it is commonly assumed that parameters such as edge weights are exactly known as inputs. However, this assumption may not be fulfilled since input parameters are often uncertain o…

    Submitted 29 August, 2023; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: Preprint of an Invited Review Article, In Fields Institute

  18. arXiv:2006.13642  [pdf, other]

    cs.LG cs.DS cs.SI stat.ML

    Online Dense Subgraph Discovery via Blurred-Graph Feedback

    Authors: Yuko Kuroki, Atsushi Miyauchi, Junya Honda, Masashi Sugiyama

    Abstract: Dense subgraph discovery aims to find a dense component in edge-weighted graphs. This is a fundamental graph-mining task with a variety of applications and thus has received much attention recently. Although most existing methods assume that each individual edge weight is easily obtained, such an assumption is not necessarily valid in practice. In this paper, we introduce a novel learning problem…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: ICML2020

  19. arXiv:2006.09668  [pdf, other]

    stat.ML cs.LG

    Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring

    Authors: Taira Tsuchiya, Junya Honda, Masashi Sugiyama

    Abstract: We investigate finite stochastic partial monitoring, which is a general model for sequential learning with limited feedback. While Thompson sampling is one of the most promising algorithms on a variety of online decision-making problems, its properties for stochastic partial monitoring have not been theoretically investigated, and the existing algorithm relies on a heuristic approximation of the p…

    Submitted 10 June, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: Published version in NeurIPS 2020 (https://proceedings.neurips.cc/paper/2020/hash/649d45bf179296e31731adfd4df25588-Abstract.html), 39 pages, 4 figures

  20. arXiv:2003.04691  [pdf, other]

    stat.ML cs.LG

    Time-varying Gaussian Process Bandit Optimization with Non-constant Evaluation Time

    Authors: Hideaki Imamura, Nontawat Charoenphakdee, Futoshi Futami, Issei Sato, Junya Honda, Masashi Sugiyama

    Abstract: The Gaussian process bandit is a problem in which we want to find a maximizer of a black-box function with the minimum number of function evaluations. If the black-box function varies with time, then time-varying Bayesian optimization is a promising framework. However, a drawback with current methods is in the assumption that the evaluation time for every observation is constant, which can be unre…

    Submitted 10 March, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

  21. arXiv:2002.05308  [pdf, ps, other]

    stat.ML cs.LG econ.EM

    Efficient Adaptive Experimental Design for Average Treatment Effect Estimation

    Authors: Masahiro Kato, Takuya Ishihara, Junya Honda, Yusuke Narita

    Abstract: The goal of many scientific experiments including A/B testing is to estimate the average treatment effect (ATE), which is defined as the difference between the expected outcomes of two or more treatments. In this paper, we consider a situation where an experimenter can assign a treatment to research subjects sequentially. In adaptive experimental design, the experimenter is allowed to change the p…

    Submitted 26 October, 2021; v1 submitted 12 February, 2020; originally announced February 2020.

  22. arXiv:1905.13659  [pdf, other]

    cs.LG stat.ML

    Uncoupled Regression from Pairwise Comparison Data

    Authors: Liyuan Xu, Junya Honda, Gang Niu, Masashi Sugiyama

    Abstract: Uncoupled regression is the problem to learn a model from unlabeled data and the set of target values while the correspondence between them is unknown. Such a situation arises in predicting anonymized targets that involve sensitive information, e.g., one's annual income. Since existing methods for uncoupled regression often require strong assumptions on the true target function, and thus, their ra…

    Submitted 3 June, 2019; v1 submitted 31 May, 2019; originally announced May 2019.

  23. arXiv:1903.07839  [pdf, ps, other]

    cs.LG stat.ML

    A Note on KL-UCB+ Policy for the Stochastic Bandit

    Authors: Junya Honda

    Abstract: A classic setting of the stochastic K-armed bandit problem is considered in this note. In this problem it has been known that KL-UCB policy achieves the asymptotically optimal regret bound and KL-UCB+ policy empirically performs better than the KL-UCB policy although the regret bound for the original form of the KL-UCB+ policy has been unknown. This note demonstrates that a simple proof of the asy…

    Submitted 20 March, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

    Comments: 6 pages, corrected typos

  24. arXiv:1902.10582  [pdf, other]

    cs.LG cs.DS stat.ML

    Polynomial-time Algorithms for Multiple-arm Identification with Full-bandit Feedback

    Authors: Yuko Kuroki, Liyuan Xu, Atsushi Miyauchi, Junya Honda, Masashi Sugiyama

    Abstract: We study the problem of stochastic combinatorial pure exploration (CPE), where an agent sequentially pulls a set of single arms (a.k.a. a super arm) and tries to find the best super arm. Among a variety of problem settings of the CPE, we focus on the full-bandit setting, where we cannot observe the reward of each single arm, but only the sum of the rewards. Although we can regard the CPE with full…

    Submitted 1 June, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

    Comments: 21 pages

    Journal ref: Neural Computation 32, 1733-1773, 2020

  25. arXiv:1901.11200  [pdf, ps, other]

    cs.LG stat.ML

    A Bad Arm Existence Checking Problem

    Authors: Koji Tabata, Atsuyoshi Nakamura, Junya Honda, Tamiki Komatsuzaki

    Abstract: We study a bad arm existence checking problem in which a player's task is to judge whether a positive arm exists or not among given K arms by drawing as small a number of arms as possible. Here, an arm is positive if its expected loss suffered by drawing the arm is at least a given threshold. This problem is a formalization of diagnosis of disease or machine failure. An interesting structure of this…

    Submitted 30 January, 2019; originally announced January 2019.

  26. arXiv:1901.10655  [pdf, other]

    stat.ML cs.LG

    On the Calibration of Multiclass Classification with Rejection

    Authors: Chenri Ni, Nontawat Charoenphakdee, Junya Honda, Masashi Sugiyama

    Abstract: We investigate the problem of multiclass classification with rejection, where a classifier can choose not to make a prediction to avoid critical misclassification. First, we consider an approach based on simultaneous training of a classifier and a rejector, which achieves the state-of-the-art performance in the binary case. We analyze this approach for the multiclass case and derive a general cond…

    Submitted 29 October, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: NeurIPS2019 camera-ready, 31 pages

  27. arXiv:1809.05274  [pdf, ps, other]

    stat.ML cs.LG

    Dueling Bandits with Qualitative Feedback

    Authors: Liyuan Xu, Junya Honda, Masashi Sugiyama

    Abstract: We formulate and study a novel multi-armed bandit problem called the qualitative dueling bandit (QDB) problem, where an agent observes not numeric but qualitative feedback by pulling each arm. We employ the same regret as the dueling bandit (DB) problem where the duel is carried out by comparing the qualitative feedback. Although we can naively use classic DB algorithms for solving the QDB problem…

    Submitted 17 September, 2018; v1 submitted 14 September, 2018; originally announced September 2018.

  28. arXiv:1809.03839  [pdf, other]

    cs.LG stat.ML

    Unsupervised Domain Adaptation Based on Source-guided Discrepancy

    Authors: Seiichi Kuroki, Nontawat Charoenphakdee, Han Bao, Junya Honda, Issei Sato, Masashi Sugiyama

    Abstract: Unsupervised domain adaptation is the problem setting where data generating distributions in the source and target domains are different, and labels in the target domain are unavailable. One important question in unsupervised domain adaptation is how to measure the difference between the source and target domains. A previously proposed discrepancy that does not use the source domain labels require…

    Submitted 19 November, 2018; v1 submitted 11 September, 2018; originally announced September 2018.

    Comments: To appear in AAAI-19

  29. arXiv:1707.04401  [pdf, ps, other]

    cs.IT

    Comprehensive Analysis on Exact Asymptotics of Random Coding Error Probability

    Authors: Junya Honda

    Abstract: This paper considers error probabilities of random codes for memoryless channels in the fixed-rate regime. Random coding is a fundamental scheme to achieve the channel capacity and many studies have been conducted for the asymptotics of the decoding error probability. Gallager derived the exact asymptotics (that is, a bound with asymptotically vanishing relative error) of the error probability bel…

    Submitted 14 July, 2017; originally announced July 2017.

  30. arXiv:1706.06775  [pdf, ps, other]

    cs.IT

    Variable-to-Fixed Length Homophonic Coding Suitable for Asymmetric Channel Coding

    Authors: Junya Honda, Hirosuke Yamamoto

    Abstract: In communication through asymmetric channels the capacity-achieving input distribution is not uniform in general. Homophonic coding is a framework to invertibly convert a (usually uniform) message into a sequence with some target distribution, and is a promising candidate to generate codewords with the nonuniform target distribution for asymmetric channels. In particular, a Variable-to-Fixed lengt…

    Submitted 29 June, 2017; v1 submitted 21 June, 2017; originally announced June 2017.

    Comments: Full version of the paper to appear in 2017 IEEE International Symposium on Information Theory (ISIT2017)

  31. arXiv:1607.07247  [pdf, ps, other]

    cs.IT

    Worst-case Redundancy of Optimal Binary AIFV Codes and their Extended Codes

    Authors: Weihua Hu, Hirosuke Yamamoto, Junya Honda

    Abstract: Binary AIFV codes are lossless codes that generalize the class of instantaneous FV codes. The code uses two code trees and assigns source symbols to incomplete internal nodes as well as to leaves. AIFV codes are empirically shown to attain better compression ratio than Huffman codes. Nevertheless, an upper bound on the redundancy of optimal binary AIFV codes is only known to be 1, which is the sam…

    Submitted 1 August, 2017; v1 submitted 25 July, 2016; originally announced July 2016.

    Comments: IEEE Transactions on Information Theory, vol.63, no.8, pp.5074-5086, Aug. 2017

  32. arXiv:1607.06914  [pdf, ps, other]

    cs.IT

    Variable-to-Fixed Length Homophonic Coding with a Modified Shannon-Fano-Elias Code

    Authors: Junya Honda, Hirosuke Yamamoto

    Abstract: Homophonic coding is a framework to reversibly convert a message into a sequence with some target distribution. This is a promising tool to generate a codeword with a biased code-symbol distribution, which is required for capacity-achieving communication by asymmetric channels. It is known that asymptotically optimal homophonic coding can be realized by a Fixed-to-Variable (FV) length code using a…

    Submitted 23 July, 2016; originally announced July 2016.

    Comments: 5 pages

  33. arXiv:1605.01677  [pdf, other]

    stat.ML cs.LG

    Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm

    Authors: Junpei Komiyama, Junya Honda, Hiroshi Nakagawa

    Abstract: We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. The hardness of recommending Copeland winners, the arms that beat the greatest number of other arms, is characterized by deriving an asymptotic regret bound. We propose Copeland Winners Relative Minimum Empirical Divergence (CW-…

    Submitted 24 May, 2016; v1 submitted 5 May, 2016; originally announced May 2016.

    Comments: To appear in ICML2016

  34. arXiv:1509.09011  [pdf, ps, other]

    stat.ML cs.LG

    Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring

    Authors: Junpei Komiyama, Junya Honda, Hiroshi Nakagawa

    Abstract: Partial monitoring is a general model for sequential learning with limited feedback formalized as a game between two players. In this game, the learner chooses an action and at the same time the opponent chooses an outcome, then the learner suffers a loss and receives a feedback signal. The goal of the learner is to minimize the total loss. In this paper, we study partial monitoring with finite ac…

    Submitted 30 September, 2015; originally announced September 2015.

    Comments: 24 pages, to appear in NIPS2015

  35. arXiv:1507.08733  [pdf, ps, other]

    cs.IT

    Almost Instantaneous Fix-to-Variable Length Codes

    Authors: Hirosuke Yamamoto, Masato Tsuchihashi, Junya Honda

    Abstract: We propose almost instantaneous fixed-to-variable-length (AIFV) codes such that two (resp. $K-1$) code trees are used if code symbols are binary (resp. $K$-ary for $K \geq 3$), and source symbols are assigned to incomplete internal nodes in addition to leaves. Although the AIFV codes are not instantaneous codes, they are devised such that the decoding delay is at most two bits (resp. one code symb…

    Submitted 30 July, 2015; originally announced July 2015.

    Comments: Submitted to the IEEE Transactions on Information Theory in October 2014, and revised in July 2015

  36. arXiv:1506.03355  [pdf, ps, other]

    cs.IT

    Exact Asymptotics for the Random Coding Error Probability

    Authors: Junya Honda

    Abstract: Error probabilities of random codes for memoryless channels are considered in this paper. In the area of communication systems, admissible error probability is very small and it is sometimes more important to discuss the relative gap between the achievable error probability and its bound than to discuss the absolute gap. Scarlett et al. derived a good upper bound of a random coding union bound bas…

    Submitted 10 June, 2015; originally announced June 2015.

    Comments: Full version of the paper in ISIT2015 with some corrections and refinements

  37. arXiv:1506.02550  [pdf, ps, other]

    stat.ML cs.LG

    Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem

    Authors: Junpei Komiyama, Junya Honda, Hisashi Kashima, Hiroshi Nakagawa

    Abstract: We study the $K$-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the information divergence. An algorithm that is inspired by the Deterministic Minimum Empirical Divergence algorithm (Honda and Takemura, 2010) is proposed,…

    Submitted 29 June, 2015; v1 submitted 8 June, 2015; originally announced June 2015.

    Comments: 26 pages, 10 figures, to appear in COLT2015 (ver.3: revised related work (RUCB))

  38. arXiv:1506.00779  [pdf, other]

    stat.ML cs.LG

    Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

    Authors: Junpei Komiyama, Junya Honda, Hiroshi Nakagawa

    Abstract: We discuss a multiple-play multi-armed bandit (MAB) problem in which several arms are selected at each round. Recently, Thompson sampling (TS), a randomized algorithm with a Bayesian spirit, has attracted much attention for its empirically excellent performance, and it is revealed to have an optimal regret bound in the standard single-play MAB problem. In this paper, we propose the multiple-play T…

    Submitted 20 March, 2019; v1 submitted 2 June, 2015; originally announced June 2015.

    Comments: Appeared in ICML2015. Fixed the evaluation of term (B) in Lemma 3. Replaced \tildeμ->θ

  39. arXiv:1504.05823  [pdf, other]

    stat.ML cs.LG

    Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem

    Authors: Wesley Cowan, Junya Honda, Michael N. Katehakis

    Abstract: Consider the problem of sampling sequentially from a finite number of $N \geq 2$ populations, specified by random variables $X^i_k$, $ i = 1,\ldots , N,$ and $k = 1, 2, \ldots$; where $X^i_k$ denotes the outcome from population $i$ the $k^{th}$ time it is sampled. It is assumed that for each fixed $i$, $\{ X^i_k \}_{k \geq 1}$ is a sequence of i.i.d. normal random variables, with unknown mean…

    Submitted 2 June, 2015; v1 submitted 22 April, 2015; originally announced April 2015.

    Comments: 15 pages, 3 figures