
Showing 1–13 of 13 results for author: Kassing, S

  1. arXiv:2402.03467  [pdf, ps, other]

    cs.LG math.OC math.PR stat.ML

    Stochastic Modified Flows for Riemannian Stochastic Gradient Descent

    Authors: Benjamin Gess, Sebastian Kassing, Nimit Rana

    Abstract: We give quantitative estimates for the rate of convergence of Riemannian stochastic gradient descent (RSGD) to Riemannian gradient flow and to a diffusion process, the so-called Riemannian stochastic modified flow (RSMF). Using tools from stochastic differential geometry we show that, in the small learning rate regime, RSGD can be approximated by the solution to the RSMF driven by an infinite-dime…

    Submitted 2 February, 2024; originally announced February 2024.

    MSC Class: Primary 62L20; Secondary 58J65; 60J20; 65K05

  2. arXiv:2303.03950  [pdf, ps, other]

    cs.LG math.NA math.OC stat.ML

    On the existence of optimal shallow feedforward networks with ReLU activation

    Authors: Steffen Dereich, Sebastian Kassing

    Abstract: We prove existence of global minima in the loss landscape for the approximation of continuous target functions using shallow feedforward artificial neural networks with ReLU activation. This property is one of the fundamental artifacts separating ReLU from other commonly used activation functions. We propose a kind of closure of the search space so that in the extended space minimizers exist. In a…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2302.14690

    MSC Class: Primary 68T07; Secondary 68T05; 41A50

  3. arXiv:2302.14690  [pdf, other]

    math.OC cs.LG math.NA stat.ML

    On the existence of minimizers in shallow residual ReLU neural network optimization landscapes

    Authors: Steffen Dereich, Arnulf Jentzen, Sebastian Kassing

    Abstract: Many mathematical convergence results for gradient descent (GD) based algorithms employ the assumption that the GD process is (almost surely) bounded and, also in concrete numerical simulations, divergence of the GD process may slow down, or even completely rule out, convergence of the error function. In practically relevant learning problems, it thus seems to be advisable to design the ANN architec…

    Submitted 28 February, 2023; originally announced February 2023.

    MSC Class: Primary 68T07; Secondary 68T05; 41A50

  4. arXiv:2302.07125  [pdf, ps, other]

    math.PR cs.LG math.AP stat.ML

    Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent

    Authors: Benjamin Gess, Sebastian Kassing, Vitalii Konarovskyi

    Abstract: We propose new limiting dynamics for stochastic gradient descent in the small learning rate regime called stochastic modified flows. These SDEs are driven by a cylindrical Brownian motion and improve the so-called stochastic modified equations by having regular diffusion coefficients and by matching the multi-point statistics. As a second contribution, we introduce distribution dependent stochasti…

    Submitted 14 February, 2023; originally announced February 2023.

    Comments: 24 pages

    MSC Class: Primary 60J05; 60H15; 68T07; Secondary 60G46; 60G57; 46G05

  5. arXiv:2302.03550  [pdf, other]

    math.OC cs.LG math.PR stat.ML

    Convergence rates for momentum stochastic gradient descent with noise of machine learning type

    Authors: Benjamin Gess, Sebastian Kassing

    Abstract: We consider the momentum stochastic gradient descent scheme (MSGD) and its continuous-in-time counterpart in the context of non-convex optimization. We show almost sure exponential convergence of the objective function value for target functions that are Lipschitz continuous and satisfy the Polyak-Lojasiewicz inequality on the relevant domain, and under assumptions on the stochastic noise that are…

    Submitted 7 February, 2023; originally announced February 2023.

    MSC Class: Primary 90C15; Secondary 68T07; 90C26; 62L20

  6. arXiv:2208.09519  [pdf, other]

    cs.DB

    Resource Allocation in Serverless Query Processing

    Authors: Simon Kassing, Ingo Müller, Gustavo Alonso

    Abstract: Data lakes hold a growing amount of cold data that is infrequently accessed, yet require interactive response times. Serverless functions are seen as a way to address this use case since they offer an appealing alternative to maintaining (and paying for) a fixed infrastructure. Recent research has analyzed the potential of serverless for data processing. In this paper, we expand on such work by lo…

    Submitted 19 August, 2022; originally announced August 2022.

  7. arXiv:2208.08429  [pdf, other]

    cs.NI

    New primitives for bounded degradation in network service

    Authors: Simon Kassing, Vojislav Dukic, Ce Zhang, Ankit Singla

    Abstract: Certain new ascendant data center workloads can absorb some degradation in network service, not needing fully reliable data transport and/or their fair share of network bandwidth. This opens up opportunities for superior network and infrastructure multiplexing by having this flexible traffic cede capacity under congestion to regular traffic with stricter needs. We posit there is opportunity in net…

    Submitted 17 August, 2022; originally announced August 2022.

    ACM Class: C.2.2

  8. arXiv:2108.05643  [pdf, other]

    cs.LG math.ST

    On minimal representations of shallow ReLU networks

    Authors: S. Dereich, S. Kassing

    Abstract: The realization function of a shallow ReLU network is a continuous and piecewise affine function $f:\mathbb R^d\to \mathbb R$, where the domain $\mathbb R^{d}$ is partitioned by a set of $n$ hyperplanes into cells on which $f$ is affine. We show that the minimal representation for $f$ uses either $n$, $n+1$ or $n+2$ neurons and we characterize each of the three cases. In the particular case, where…

    Submitted 12 August, 2021; originally announced August 2021.

    Comments: 16 pages

    MSC Class: Primary 68T05; Secondary 68T07; 26B40

  9. Cooling down stochastic differential equations: almost sure convergence

    Authors: S. Dereich, S. Kassing

    Abstract: We consider almost sure convergence of the SDE $dX_t=\alpha_t\, dt + \beta_t\, dW_t$ under the existence of a $C^2$-Lyapunov function $F:\mathbb R^d \to \mathbb R$. More explicitly, we show that on the event that the process stays local we have almost sure convergence in the Lyapunov function $(F(X_t))$ as well as $\nabla F(X_t)\to 0$, if $|\beta_t|=\mathcal O( t^{-\beta})$ for a $\beta>1/2$. If, additionally, one assum…

    Submitted 15 July, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

    MSC Class: Primary 60J60; Secondary 60H10; 65C35

    Journal ref: Stochastic Processes and their Applications, Volume 152, 2022, Pages 289-311

  10. arXiv:2103.16437  [pdf, other]

    cs.CR cs.NI

    Order P4-66: Characterizing and mitigating surreptitious programmable network device exploitation

    Authors: Simon Kassing, Hussain Abbas, Laurent Vanbever, Ankit Singla

    Abstract: Substantial efforts are invested in improving network security, but the threat landscape is rapidly evolving, particularly with the recent interest in programmable network hardware. We explore a new security threat, from an attacker who has gained control of such devices. While it should be obvious that such attackers can trivially cause substantial damage, the challenge and novelty are in doing s…

    Submitted 27 May, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: 14 pages, 13 figures, 1 table

    ACM Class: C.2.3

  11. arXiv:2102.09385  [pdf, ps, other]

    cs.LG math.PR math.ST

    Convergence of stochastic gradient descent schemes for Lojasiewicz-landscapes

    Authors: Steffen Dereich, Sebastian Kassing

    Abstract: In this article, we consider convergence of stochastic gradient descent schemes (SGD), including momentum stochastic gradient descent (MSGD), under weak assumptions on the underlying landscape. More explicitly, we show that on the event that the SGD stays bounded we have convergence of the SGD if there is only a countable number of critical points or if the objective function satisfies Lojasiewicz…

    Submitted 9 January, 2024; v1 submitted 16 February, 2021; originally announced February 2021.

    MSC Class: 62L20 (Primary) 60J05; 60J20; 65C05 (Secondary)

  12. arXiv:1912.09187  [pdf, ps, other]

    math.PR math.ST

    Central limit theorems for stochastic gradient descent with averaging for stable manifolds

    Authors: Steffen Dereich, Sebastian Kassing

    Abstract: In this article we establish new central limit theorems for Ruppert-Polyak averaged stochastic gradient descent schemes. Compared to previous work we do not assume that convergence occurs to an isolated attractor but instead allow convergence to a stable manifold. On the stable manifold the target function is constant and the oscillations in the tangential direction may be significantly larger tha…

    Submitted 19 December, 2019; originally announced December 2019.

    Comments: 42 pages

    MSC Class: Primary 62L20; Secondary 60J05; 65C05

  13. arXiv:1810.07766  [pdf, other]

    cs.DC cs.LG

    Distributed Learning over Unreliable Networks

    Authors: Chen Yu, Hanlin Tang, Cedric Renggli, Simon Kassing, Ankit Singla, Dan Alistarh, Ce Zhang, Ji Liu

    Abstract: Most of today's distributed machine learning systems assume reliable networks: whenever two machines exchange information (e.g., gradients or models), the network should guarantee the delivery of the message. At the same time, recent work exhibits the impressive tolerance of machine learning algorithms to errors or noise arising from relaxed communication or synchronization. In this paper, w…

    Submitted 15 May, 2019; v1 submitted 17 October, 2018; originally announced October 2018.