
Showing 1–13 of 13 results for author: Wang, K A

  1. arXiv:2405.14782

    cs.CL

    Lessons from the Trenches on Reproducible Evaluation of Language Models

    Authors: Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan, et al. (5 additional authors not shown)

    Abstract: Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of reproducibility and transparency. In this paper we draw on three years of experience in evaluating large language models to provide guidance and lessons…

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  2. arXiv:2312.03344

    cs.LG math.DS stat.AP stat.ML

    Interpretable Mechanistic Representations for Meal-level Glycemic Control in the Wild

    Authors: Ke Alexander Wang, Emily B. Fox

    Abstract: Diabetes encompasses a complex landscape of glycemic control that varies widely among individuals. However, current methods do not faithfully capture this variability at the meal level. On the one hand, expert-crafted features lack the flexibility of data-driven methods; on the other hand, learned representations tend to be uninterpretable, which hampers clinical adoption. In this paper, we propose…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: Proceedings of Machine Learning for Health (ML4H) 2023. Code available at: https://github.com/KeAWang/interpretable-cgm-representations

  3. arXiv:2305.01638

    cs.LG cs.CV stat.ML

    Sequence Modeling with Multiresolution Convolutional Memory

    Authors: Jiaxin Shi, Ke Alexander Wang, Emily B. Fox

    Abstract: Efficiently capturing the long-range patterns in sequential data sources salient to a given task -- such as classification and generative modeling -- poses a fundamental challenge. Popular approaches in the space trade off between the memory burden of brute-force enumeration and comparison, as in transformers, the computational burden of complicated sequential dependencies, as in recurrent neural n…

    Submitted 1 November, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: ICML 2023, Source code: https://github.com/thjashin/multires-conv

  4. arXiv:2304.14300

    cs.LG math.DS q-bio.QM

    Learning Absorption Rates in Glucose-Insulin Dynamics from Meal Covariates

    Authors: Ke Alexander Wang, Matthew E. Levine, Jiaxin Shi, Emily B. Fox

    Abstract: Traditional models of glucose-insulin dynamics rely on heuristic parameterizations chosen to fit observations within a laboratory setting. However, these models cannot describe glucose dynamics in daily life. One source of failure is in their descriptions of glucose absorption rates after meal events. A meal's macronutritional content has nuanced effects on the absorption profile, which is difficu…

    Submitted 27 April, 2023; originally announced April 2023.

    Comments: Work presented at NeurIPS 2022 Workshop on Learning from Time Series for Health (TS4H). arXiv admin note: substantial text overlap with arXiv:2302.11939

  5. arXiv:2304.13138

    cs.AI cs.LG

    The Update-Equivalence Framework for Decision-Time Planning

    Authors: Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown

    Abstract: The process of revising (or constructing) a policy at execution time -- known as decision-time planning -- has been key to achieving superhuman performance in perfect-information games like chess and Go. A recent line of work has extended decision-time planning to imperfect-information games, leading to superhuman performance in poker. However, these methods involve solving subgames whose sizes gr…

    Submitted 13 May, 2024; v1 submitted 25 April, 2023; originally announced April 2023.

  6. arXiv:2212.06027

    cs.GT cs.AI cs.MA econ.TH

    Bayesian Opponent Modeling in Multiplayer Imperfect-Information Games

    Authors: Sam Ganzfried, Kevin A. Wang, Max Chiswick

    Abstract: In many real-world settings agents engage in strategic interactions with multiple opposing agents who can employ a wide variety of strategies. The standard approach for designing agents for such settings is to compute or approximate a relevant game-theoretic solution concept such as Nash equilibrium and then follow the prescribed strategy. However, such a strategy ignores any observations of oppon…

    Submitted 20 May, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

  7. arXiv:2112.12986

    cs.LG stat.ML

    Is Importance Weighting Incompatible with Interpolating Classifiers?

    Authors: Ke Alexander Wang, Niladri S. Chatterji, Saminul Haque, Tatsunori Hashimoto

    Abstract: Importance weighting is a classic technique to handle distribution shifts. However, prior work has presented strong empirical and theoretical evidence demonstrating that importance weights can have little to no effect on overparameterized neural networks. Is importance weighting truly incompatible with the training of overparameterized neural networks? Our paper answers this in the negative. We sh…

    Submitted 4 March, 2022; v1 submitted 24 December, 2021; originally announced December 2021.

    Comments: International Conference on Learning Representations (ICLR), 2022

  8. arXiv:2112.09964

    cs.LG math.DS

    GOPHER: Categorical probabilistic forecasting with graph structure via local continuous-time dynamics

    Authors: Ke Alexander Wang, Danielle Maddix, Yuyang Wang

    Abstract: We consider the problem of probabilistic forecasting over categories with graph structure, where the dynamics at a vertex depend on its local connectivity structure. We present GOPHER, a method that combines the inductive bias of graph neural networks with neural ODEs to capture the intrinsic local continuous-time dynamics of our probabilistic forecasts. We study the benefits of these two inducti…

    Submitted 18 December, 2021; originally announced December 2021.

    Comments: NeurIPS 2021 Workshop ICBINB Spotlight

  9. arXiv:2106.06695

    cs.LG stat.ML

    SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes

    Authors: Sanyam Kapoor, Marc Finzi, Ke Alexander Wang, Andrew Gordon Wilson

    Abstract: State-of-the-art methods for scalable Gaussian processes use iterative algorithms, requiring fast matrix vector multiplies (MVMs) with the covariance kernel. The Structured Kernel Interpolation (SKI) framework accelerates these MVMs by performing efficient MVMs on a grid and interpolating back to the original space. In this work, we develop a connection between SKI and the permutohedral lattice us…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: International Conference on Machine Learning (ICML), 2021

  10. arXiv:2104.09460

    stat.ML cs.AI cs.IT cs.LG cs.NE

    Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information

    Authors: Willie Neiswanger, Ke Alexander Wang, Stefano Ermon

    Abstract: In many real-world problems, we want to infer some property of an expensive black-box function $f$, given a budget of $T$ function evaluations. One example is budget-constrained global optimization of $f$, for which Bayesian optimization is a popular method. Other properties of interest include local optima, level sets, integrals, or graph-structured information induced by $f$. Often, we can find…

    Submitted 6 July, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: Appears in Proceedings of the 38th International Conference on Machine Learning (ICML), 2021

  11. arXiv:2010.13581

    cs.LG math.DS physics.comp-ph physics.data-an stat.ML

    Simplifying Hamiltonian and Lagrangian Neural Networks via Explicit Constraints

    Authors: Marc Finzi, Ke Alexander Wang, Andrew Gordon Wilson

    Abstract: Reasoning about the physical world requires models that are endowed with the right inductive biases to learn the underlying dynamics. Recent works improve generalization for predicting trajectories by learning the Hamiltonian or Lagrangian of a system rather than the differential equations directly. While these methods encode the constraints of the systems using generalized coordinates, we show th…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020. Code available at https://github.com/mfinzi/constrained-hamiltonian-neural-networks

  12. arXiv:1911.06944

    cs.LG cs.DC stat.CO stat.ME stat.ML

    $DC^2$: A Divide-and-conquer Algorithm for Large-scale Kernel Learning with Application to Clustering

    Authors: Ke Alexander Wang, Xinran Bian, Pan Liu, Donghui Yan

    Abstract: Divide-and-conquer is a general strategy to deal with large-scale problems. It is typically applied to generate ensemble instances, which potentially limits the problem size it can handle. Additionally, the data are often divided by random sampling which may be suboptimal. To address these concerns, we propose the $DC^2$ algorithm. Instead of ensemble instances, we produce structure-preserving sig…

    Submitted 15 November, 2019; originally announced November 2019.

  13. arXiv:1903.08114

    cs.LG cs.DC stat.ML

    Exact Gaussian Processes on a Million Data Points

    Authors: Ke Alexander Wang, Geoff Pleiss, Jacob R. Gardner, Stephen Tyree, Kilian Q. Weinberger, Andrew Gordon Wilson

    Abstract: Gaussian processes (GPs) are flexible non-parametric models, with a capacity that grows with the available data. However, computational constraints with standard inference procedures have limited exact GPs to problems with fewer than about ten thousand training points, necessitating approximations for larger datasets. In this paper, we develop a scalable approach for exact GPs that leverages multi…

    Submitted 10 December, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

    Comments: Published at NeurIPS 2019