Search | arXiv e-print repository

Majorization-Minimization for sparse SVMs

Authors: Alessandro Benfenati, Emilie Chouzenoux, Giorgia Franchini, Salla Latva-Aijo, Dominik Narnhofer, Jean-Christophe Pesquet, Sebastian J. Scott, Mahsa Yousefi

Abstract: Several decades ago, Support Vector Machines (SVMs) were introduced for performing binary classification tasks, under a supervised framework. Nowadays, they often outperform other supervised methods and remain one of the most popular approaches in the machine learning arena. In this work, we investigate the training of SVMs through a smooth sparse-promoting-regularized squared hinge loss minimizat… ▽ More Several decades ago, Support Vector Machines (SVMs) were introduced for performing binary classification tasks, under a supervised framework. Nowadays, they often outperform other supervised methods and remain one of the most popular approaches in the machine learning arena. In this work, we investigate the training of SVMs through a smooth sparse-promoting-regularized squared hinge loss minimization. This choice paves the way to the application of quick training methods built on majorization-minimization approaches, benefiting from the Lipschitz differentiabililty of the loss function. Moreover, the proposed approach allows us to handle sparsity-preserving regularizers promoting the selection of the most significant features, so enhancing the performance. Numerical tests and comparisons conducted on three different datasets demonstrate the good performance of the proposed methodology in terms of qualitative metrics (accuracy, precision, recall, and F 1 score) as well as computational cost. △ Less

Submitted 31 August, 2023; originally announced August 2023.

arXiv:2306.09074 [pdf, other]

doi 10.1007/978-3-031-42753-4_5

Category Theory in Isabelle/HOL as a Basis for Meta-logical Investigation

Authors: Jonas Bayer, Aleksey Gonus, Christoph Benzmüller, Dana S. Scott

Abstract: This paper presents meta-logical investigations based on category theory using the proof assistant Isabelle/HOL. We demonstrate the potential of a free logic based shallow semantic embedding of category theory by providing a formalization of the notion of elementary topoi. Additionally, we formalize symmetrical monoidal closed categories expressing the denotational semantic model of intuitionistic… ▽ More This paper presents meta-logical investigations based on category theory using the proof assistant Isabelle/HOL. We demonstrate the potential of a free logic based shallow semantic embedding of category theory by providing a formalization of the notion of elementary topoi. Additionally, we formalize symmetrical monoidal closed categories expressing the denotational semantic model of intuitionistic multiplicative linear logic. Next to these meta-logical-investigations, we contribute to building an Isabelle category theory library, with a focus on ease of use in the formalization beyond category theory itself. This work paves the way for future formalizations based on category theory and demonstrates the power of automated reasoning in investigating meta-logical questions. △ Less

Submitted 16 June, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: 15 pages. Preprint of paper accepted for CICM 2023 conference

MSC Class: 68T15; 03B35; 03B80; 03B15; 08A05; 03C10; 03C68; 03C75; 20B05; 54H20 ACM Class: F.4; I.2.3

Journal ref: Intelligent Computer Mathematics (CICM 2023). Lecture Notes in Computer Science, vol 14101, pp. 69-83. Springer, Cham

arXiv:2305.18394 [pdf, other]

On Optimal Regularization Parameters via Bilevel Learning

Authors: Matthias J. Ehrhardt, Silvia Gazzola, Sebastian J. Scott

Abstract: Variational regularization is commonly used to solve linear inverse problems, and involves augmenting a data fidelity by a regularizer. The regularizer is used to promote a priori information and is weighted by a regularization parameter. Selection of an appropriate regularization parameter is critical, with various choices leading to very different reconstructions. Classical strategies used to de… ▽ More Variational regularization is commonly used to solve linear inverse problems, and involves augmenting a data fidelity by a regularizer. The regularizer is used to promote a priori information and is weighted by a regularization parameter. Selection of an appropriate regularization parameter is critical, with various choices leading to very different reconstructions. Classical strategies used to determine a suitable parameter value include the discrepancy principle and the L-curve criterion, and in recent years a supervised machine learning approach called bilevel learning has been employed. Bilevel learning is a powerful framework to determine optimal parameters and involves solving a nested optimization problem. While previous strategies enjoy various theoretical results, the well-posedness of bilevel learning in this setting is still an open question. In particular, a necessary property is positivity of the determined regularization parameter. In this work, we provide a new condition that better characterizes positivity of optimal regularization parameters than the existing theory. Numerical results verify and explore this new condition for both small and high-dimensional problems. △ Less

Submitted 22 January, 2024; v1 submitted 28 May, 2023; originally announced May 2023.

Comments: 34 pages, 11 figures. Version for publication

MSC Class: 65K10 (Primary) 65F22 (Secondary)

arXiv:2303.16974 [pdf, other]

BEVERS: A General, Simple, and Performant Framework for Automatic Fact Verification

Authors: Mitchell DeHaven, Stephen Scott

Abstract: Automatic fact verification has become an increasingly popular topic in recent years and among datasets the Fact Extraction and VERification (FEVER) dataset is one of the most popular. In this work we present BEVERS, a tuned baseline system for the FEVER dataset. Our pipeline uses standard approaches for document retrieval, sentence selection, and final claim classification, however, we spend cons… ▽ More Automatic fact verification has become an increasingly popular topic in recent years and among datasets the Fact Extraction and VERification (FEVER) dataset is one of the most popular. In this work we present BEVERS, a tuned baseline system for the FEVER dataset. Our pipeline uses standard approaches for document retrieval, sentence selection, and final claim classification, however, we spend considerable effort ensuring optimal performance for each component. The results are that BEVERS achieves the highest FEVER score and label accuracy among all systems, published or unpublished. We also apply this pipeline to another fact verification dataset, Scifact, and achieve the highest label accuracy among all systems on that dataset as well. We also make our full code available. △ Less

Submitted 29 March, 2023; originally announced March 2023.

Comments: Accepted to the Sixth FEVER Workshop at EACL 2023

arXiv:2302.03337 [pdf, other]

Datacenter Ethernet and RDMA: Issues at Hyperscale

Authors: Torsten Hoefler, Duncan Roweth, Keith Underwood, Bob Alverson, Mark Griswold, Vahid Tabatabaee, Mohan Kalkunte, Surendra Anubolu, Siyuan Shen, Abdul Kabbani, Moray McLaren, Steve Scott

Abstract: We observe that emerging artificial intelligence, high-performance computing, and storage workloads pose new challenges for large-scale datacenter networking. RDMA over Converged Ethernet (RoCE) was an attempt to adopt modern Remote Direct Memory Access (RDMA) features into existing Ethernet installations. Now, a decade later, we revisit RoCE's design points and conclude that several of its shortc… ▽ More We observe that emerging artificial intelligence, high-performance computing, and storage workloads pose new challenges for large-scale datacenter networking. RDMA over Converged Ethernet (RoCE) was an attempt to adopt modern Remote Direct Memory Access (RDMA) features into existing Ethernet installations. Now, a decade later, we revisit RoCE's design points and conclude that several of its shortcomings must be addressed to fulfill the demands of hyperscale datacenters. We predict that both the datacenter and high-performance computing markets will converge and adopt modernized Ethernet-based high-performance networking solutions that will replace TCP and RoCE within a decade. △ Less

Submitted 15 April, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: Published at IEEE Computer in 2023

arXiv:2209.03913 [pdf, other]

Data Management Challenges for Internet-scale 3D Search Engines

Authors: James Williams, Shane Scott, Sean Wedig, Timur Hindanov, Christoph Roedig

Abstract: This paper describes the most significant data-related challenges involved in building internet-scale 3D search engines. The discussion centers on the most pressing data management issues in this domain, including model acquisition, support for multiple file formats, asset versioning, data integrity errors, the data lifecycle, intellectual property, and the legality of web crawling. The paper also… ▽ More This paper describes the most significant data-related challenges involved in building internet-scale 3D search engines. The discussion centers on the most pressing data management issues in this domain, including model acquisition, support for multiple file formats, asset versioning, data integrity errors, the data lifecycle, intellectual property, and the legality of web crawling. The paper also discusses numerous issues that fall under the rubric of trustworthy computing, including privacy, security, inappropriate content, and copying/remixing of assets. The goal of the paper is to provide an overview of these general issues, illustrated by empirical data drawn from the internet's largest operational search engine. While numerous works have been published on 3D information retrieval, this paper is the first to discuss the real-world challenges that arise in building practical search engines at scale. △ Less

Submitted 26 December, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

Comments: Second version, distributed by SIGIR Forum

arXiv:2209.01346 [pdf, other]

HammingMesh: A Network Topology for Large-Scale Deep Learning

Authors: Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott

Abstract: Numerous microarchitectural optimizations unlocked tremendous processing power for deep neural networks that in turn fueled the AI revolution. With the exhaustion of such optimizations, the growth of modern AI is now gated by the performance of training systems, especially their data movement. Instead of focusing on single accelerators, we investigate data-movement characteristics of large-scale t… ▽ More Numerous microarchitectural optimizations unlocked tremendous processing power for deep neural networks that in turn fueled the AI revolution. With the exhaustion of such optimizations, the growth of modern AI is now gated by the performance of training systems, especially their data movement. Instead of focusing on single accelerators, we investigate data-movement characteristics of large-scale training at full system scale. Based on our workload analysis, we design HammingMesh, a novel network topology that provides high bandwidth at low cost with high job scheduling flexibility. Specifically, HammingMesh can support full bandwidth and isolation to deep learning training jobs with two dimensions of parallelism. Furthermore, it also supports high global bandwidth for generic traffic. Thus, HammingMesh will power future large-scale deep learning systems with extreme bandwidth requirements. △ Less

Submitted 21 October, 2022; v1 submitted 3 September, 2022; originally announced September 2022.

Comments: published at ACM/IEEE Supercomputing (SC22)

arXiv:1910.12863 [pdf, other]

doi 10.1007/978-3-030-43520-2_19

Computer-supported Exploration of a Categorical Axiomatization of Modeloids

Authors: Lucca Tiemens, Dana S. Scott, Christoph Benzmüller, Miroslav Benda

Abstract: A modeloid, a certain set of partial bijections, emerges from the idea to abstract from a structure to the set of its partial automorphisms. It comes with an operation, called the derivative, which is inspired by Ehrenfeucht-Fraïssé games. In this paper we develop a generalization of a modeloid first to an inverse semigroup and then to an inverse category using an axiomatic approach to category th… ▽ More A modeloid, a certain set of partial bijections, emerges from the idea to abstract from a structure to the set of its partial automorphisms. It comes with an operation, called the derivative, which is inspired by Ehrenfeucht-Fraïssé games. In this paper we develop a generalization of a modeloid first to an inverse semigroup and then to an inverse category using an axiomatic approach to category theory. We then show that this formulation enables a purely algebraic view on Ehrenfeucht-Fraïssé games. △ Less

Submitted 13 January, 2020; v1 submitted 27 October, 2019; originally announced October 2019.

Comments: 24 pages; accepted for conference: Relational and Algebraic Methods in Computer Science (RAMICS 2020)

MSC Class: 68T15; 03B35; 03B80; 03B15; 08A05; 03C10; 03C68; 03C75; 20B05; 54H20 ACM Class: F.4; I.2.3

arXiv:1910.01074 [pdf, other]

Formal Language Constraints for Markov Decision Processes

Authors: Eleanor Quint, Dong Xu, Samuel Flint, Stephen Scott, Matthew Dwyer

Abstract: In order to satisfy safety conditions, an agent may be constrained from acting freely. A safe controller can be designed a priori if an environment is well understood, but not when learning is employed. In particular, reinforcement learned (RL) controllers require exploration, which can be hazardous in safety critical situations. We study the benefits of giving structure to the constraints of a co… ▽ More In order to satisfy safety conditions, an agent may be constrained from acting freely. A safe controller can be designed a priori if an environment is well understood, but not when learning is employed. In particular, reinforcement learned (RL) controllers require exploration, which can be hazardous in safety critical situations. We study the benefits of giving structure to the constraints of a constrained Markov decision process by specifying them in formal languages as a step towards using safety methods from software engineering and controller synthesis. We instantiate these constraints as finite automata to efficiently recognise constraint violations. Constraint states are then used to augment the underlying MDP state and to learn a dense cost function, easing the problem of quickly learning joint MDP/constraint dynamics. We empirically evaluate the effect of these methods on training a variety of RL algorithms over several constraints specified in Safety Gym, MuJoCo, and Atari environments. △ Less

Submitted 13 October, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

Comments: NeurIPS 2019 Workshop on Safety and Robustness in Decision Making

arXiv:1810.07181 [pdf, other]

Deep-Waveform: A Learned OFDM Receiver Based on Deep Complex-valued Convolutional Networks

Authors: Zhongyuan Zhao, Mehmet C. Vuran, Fujuan Guo, Stephen D. Scott

Abstract: The (inverse) discrete Fourier transform (DFT/IDFT) is often perceived as essential to orthogonal frequency-division multiplexing (OFDM) systems. In this paper, a deep complex-valued convolutional network (DCCN) is developed to recover bits from time-domain OFDM signals without relying on any explicit DFT/IDFT. The DCCN can exploit the cyclic prefix (CP) of OFDM waveform for increased SNR by repla… ▽ More The (inverse) discrete Fourier transform (DFT/IDFT) is often perceived as essential to orthogonal frequency-division multiplexing (OFDM) systems. In this paper, a deep complex-valued convolutional network (DCCN) is developed to recover bits from time-domain OFDM signals without relying on any explicit DFT/IDFT. The DCCN can exploit the cyclic prefix (CP) of OFDM waveform for increased SNR by replacing DFT with a learned linear transform, and has the advantage of combining CP-exploitation, channel estimation, and intersymbol interference (ISI) mitigation, with a complexity of $\mathcal{O}(N^2)$. Numerical tests show that the DCCN receiver can outperform the legacy channel estimators based on ideal and approximate linear minimum mean square error (LMMSE) estimation and a conventional CP-enhanced technique in Rayleigh fading channels with various delay spreads and mobility. The proposed approach benefits from the expressive nature of complex-valued neural networks, which, however, currently lack support from popular deep learning platforms. In response, guidelines of exact and approximate implementations of a complex-valued convolutional layer are provided for the design and analysis of convolutional networks for wireless PHY. Furthermore, a suite of novel training techniques are developed to improve the convergence and generalizability of the trained model in fading channels. This work demonstrates the capability of deep neural networks in processing OFDM waveforms and the results suggest that the FFT processor in OFDM receivers can be replaced by a hardware AI accelerator. △ Less

Submitted 5 May, 2021; v1 submitted 16 October, 2018; originally announced October 2018.

Comments: 13 pages, 22 figures, accepted to IEEE Journal on Selected Areas in Communications

arXiv:1609.01493 [pdf, ps, other]

Axiomatizing Category Theory in Free Logic

Authors: Christoph Benzmüller, Dana S. Scott

Abstract: Starting from a generalization of the standard axioms for a monoid we present a stepwise development of various, mutually equivalent foundational axiom systems for category theory. Our axiom sets have been formalized in the Isabelle/HOL interactive proof assistant, and this formalization utilizes a semantically correct embedding of free logic in classical higher-order logic. The modeling and forma… ▽ More Starting from a generalization of the standard axioms for a monoid we present a stepwise development of various, mutually equivalent foundational axiom systems for category theory. Our axiom sets have been formalized in the Isabelle/HOL interactive proof assistant, and this formalization utilizes a semantically correct embedding of free logic in classical higher-order logic. The modeling and formal analysis of our axiom sets has been significantly supported by series of experiments with automated reasoning tools integrated with Isabelle/HOL. We also address the relation of our axiom systems to alternative proposals from the literature, including an axiom set proposed by Freyd and Scedrov for which we reveal a technical issue (when encoded in free logic where free variables range over defined and undefined objects): either all operations, e.g. morphism composition, are total or their axiom system is inconsistent. The repair for this problem is quite straightforward, however. △ Less

Submitted 12 October, 2018; v1 submitted 6 September, 2016; originally announced September 2016.

Comments: 17 pages

MSC Class: 68T15; 03B35; 03B80; 03B15 ACM Class: F.4; I.2.3

arXiv:1502.05818 [pdf, ps, other]

doi 10.1109/TWC.2015.2402671

Co-Primary Multi-Operator Resource Sharing for Small Cell Networks

Authors: Petri Luoto, Pekka Pirinen, Mehdi Bennis, Sumudu Samarakoon, Simon Scott, Matti Latva-aho

Abstract: To tackle the challenge of providing higher data rates within limited spectral resources we consider the case of multiple operators sharing a common pool of radio resources. Four algorithms are proposed to address co-primary multi-operator radio resource sharing under heterogeneous traffic in both centralized and distributed scenarios. The performance of these algorithms is assessed through extens… ▽ More To tackle the challenge of providing higher data rates within limited spectral resources we consider the case of multiple operators sharing a common pool of radio resources. Four algorithms are proposed to address co-primary multi-operator radio resource sharing under heterogeneous traffic in both centralized and distributed scenarios. The performance of these algorithms is assessed through extensive system-level simulations for two indoor small cell layouts. It is assumed that the spectral allocations of the small cells are orthogonal to the macro network layer and thus, only the small cell traffic is modeled. The main performance metrics are user throughput and the relative amount of shared spectral resources. The numerical results demonstrate the importance of coordination among co-primary operators for an optimal resource sharing. Also, maximizing the spectrum sharing percentage generally improves the achievable throughput gains over non-sharing. △ Less

Submitted 20 February, 2015; originally announced February 2015.

Showing 1–12 of 12 results for author: Scott, S