-
Lightweight Conceptual Dictionary Learning for Text Classification Using Information Compression
Authors:
Li Wan,
Tansu Alpcan,
Margreta Kuijper,
Emanuele Viterbo
Abstract:
We propose a novel, lightweight supervised dictionary learning framework for text classification based on data compression and representation. This two-phase algorithm first employs the Lempel-Ziv-Welch (LZW) algorithm to construct a dictionary from text datasets, focusing on the conceptual significance of dictionary elements. The dictionaries are then refined using label data, optimizing dictionary atoms to enhance discriminative power based on mutual information and class distribution. This process generates discriminative numerical representations, facilitating the training of simple classifiers such as SVMs and neural networks. We evaluate our algorithm's information-theoretic performance using information bottleneck principles and introduce the information plane area rank (IPAR) as a novel metric to quantify this performance. Tested on six benchmark text datasets, our algorithm closely matches top-performing models, deviating by only ~2% on limited-vocabulary datasets while using just 10% of their parameters. However, it falls short on diverse-vocabulary datasets, likely due to the LZW algorithm's constraints with low-repetition data. This contrast highlights its efficiency and limitations across different dataset types.
Submitted 28 April, 2024;
originally announced May 2024.
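The first phase described in this abstract, building a phrase dictionary with LZW, can be sketched as follows. This is a minimal illustration of textbook LZW at the character level; the function name `lzw_dictionary` and the character-level granularity are assumptions, and the label-aware refinement phase is not shown.

```python
def lzw_dictionary(text: str) -> dict[str, int]:
    """Build a phrase dictionary with standard LZW (illustrative, character level)."""
    # Seed the dictionary with every distinct character (sorted for determinism).
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            # Keep extending the current phrase while it is already known.
            phrase = candidate
        else:
            # First time this phrase is seen: register it as a new atom.
            dictionary[candidate] = len(dictionary)
            phrase = ch
    return dictionary


atoms = lzw_dictionary("abababab")
learned = [p for p in atoms if len(p) > 1]  # multi-character atoms, in insertion order
```

Repetitive input yields progressively longer atoms, which is consistent with the abstract's remark that low-repetition data limits the approach.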
-
An Unknown Input Multi-Observer Approach for Estimation and Control under Adversarial Attacks
Authors:
Tianci Yang,
Carlos Murguia,
Margreta Kuijper,
Dragan Nesic
Abstract:
We address the problem of state estimation, attack isolation, and control of discrete-time linear time-invariant systems under (potentially unbounded) actuator and sensor false data injection attacks. Using a bank of unknown input observers, each observer leading to an exponentially stable estimation error (in the attack-free case), we propose an observer-based estimator that provides exponential estimates of the system state in spite of actuator and sensor attacks. Exploiting sensor and actuator redundancy, the estimation scheme is guaranteed to work if a sufficiently small subset of sensors and actuators is under attack. Using the proposed estimator, we provide tools for reconstructing and isolating actuator and sensor attacks, and a control scheme capable of stabilizing the closed-loop dynamics by switching off isolated actuators. Simulation results are presented to illustrate the performance of our tools.
Submitted 6 April, 2019;
originally announced April 2019.
-
A Multi-Observer Based Estimation Framework for Nonlinear Systems under Sensor Attacks
Authors:
Tianci Yang,
Carlos Murguia,
Margreta Kuijper,
Dragan Nesic
Abstract:
We address the problem of state estimation and attack isolation for general discrete-time nonlinear systems when sensors are corrupted by (potentially unbounded) attack signals. For a large class of nonlinear plants and observers, we provide a general estimation scheme, built around the idea of sensor redundancy and multi-observer, capable of reconstructing the system state in spite of sensor attacks and noise. This scheme has been proposed by others for linear systems/observers and here we propose a unifying framework for a much larger class of nonlinear systems/observers. Using the proposed estimator, we provide an isolation algorithm to pinpoint attacks on sensors during sliding time windows. Simulation results are presented to illustrate the performance of our tools.
Submitted 6 April, 2019;
originally announced April 2019.
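The redundancy idea in this abstract can be illustrated with a deliberately simplified, static sketch. It assumes a scalar state measured directly by p sensors, at most q of them attacked with p > 2q, and it replaces the dynamic observers with plain averages. The function name `select_estimate` and the selection rule (pick the sensor subset whose estimate agrees best with the estimates of its own sub-subsets) are assumptions made for illustration, not the authors' exact construction.

```python
from itertools import combinations
from statistics import fmean


def select_estimate(y, q):
    """Pick the (p - q)-sensor subset whose estimate is most self-consistent.

    y: list of p scalar measurements of the same state (stand-in for sensor outputs).
    q: assumed upper bound on the number of attacked sensors, with p > 2 * q.
    """
    p = len(y)
    best = None
    for J in combinations(range(p), p - q):
        est_J = fmean([y[i] for i in J])
        # Largest disagreement between the estimate from J and the estimates
        # from every (p - 2q)-sensor subset of J.
        pi_J = max(
            abs(est_J - fmean([y[i] for i in S]))
            for S in combinations(J, p - 2 * q)
        )
        if best is None or pi_J < best[0]:
            best = (pi_J, J, est_J)
    return best[1], best[2]


# Sensor 0 is attacked with a large offset; the true state is 2.0.
selected, estimate = select_estimate([102.0, 2.0, 2.0, 2.0, 2.0], q=1)
```

Any subset containing the attacked sensor disagrees with at least one of its attack-free sub-subsets, so the attack-free subset is selected regardless of the attack magnitude.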
-
An Unknown Input Multi-Observer Approach for Estimation, Attack Isolation, and Control of LTI Systems under Actuator Attacks
Authors:
Tianci Yang,
Carlos Murguia,
Margreta Kuijper,
Dragan Nesic
Abstract:
We address the problem of state estimation, attack isolation, and control for discrete-time Linear Time Invariant (LTI) systems under (potentially unbounded) actuator false data injection attacks. Using a bank of Unknown Input Observers (UIOs), each observer leading to an exponentially stable estimation error in the attack-free case, we propose an estimator that provides exponential estimates of the system state and the attack signals when a sufficiently small number of actuators are attacked. We use these estimates to control the system and isolate actuator attacks. Simulation results are presented to illustrate the performance of the proposed approach.
Submitted 25 November, 2018;
originally announced November 2018.
-
A Multi-Observer Approach for Attack Detection and Isolation of Discrete-Time Nonlinear Systems
Authors:
Tianci Yang,
Carlos Murguia,
Margreta Kuijper,
Dragan Nešić
Abstract:
We address the problem of attack detection and isolation for a class of discrete-time nonlinear systems under (potentially unbounded) sensor attacks and measurement noise. We consider the case when a subset of sensors is subject to additive false data injection attacks. Using a bank of observers, each observer leading to an Input-to-State Stable (ISS) estimation error, we propose two algorithms for detecting and isolating sensor attacks. These algorithms make use of the ISS property of the observers to check whether the trajectories of observers are 'consistent' with the attack-free trajectories of the system. Simulation results are presented to illustrate the performance of the proposed algorithms.
Submitted 2 January, 2019; v1 submitted 17 June, 2018;
originally announced June 2018.
-
UG18 at SemEval-2018 Task 1: Generating Additional Training Data for Predicting Emotion Intensity in Spanish
Authors:
Marloes Kuijper,
Mike van Lenthe,
Rik van Noord
Abstract:
The present study describes our submission to SemEval 2018 Task 1: Affect in Tweets. Our Spanish-only approach aimed to demonstrate that it is beneficial to automatically generate additional training data by (i) translating training data from other languages and (ii) applying a semi-supervised learning method. We find strong support for both approaches, with those models outperforming our regular models in all subtasks. However, creating a stepwise ensemble of different models as opposed to simply averaging did not result in an increase in performance. We placed second (EI-Reg), second (EI-Oc), fourth (V-Reg) and fifth (V-Oc) in the four Spanish subtasks we participated in.
Submitted 28 May, 2018;
originally announced May 2018.
-
A Robust Circle-criterion Observer-based Estimator for Discrete-time Nonlinear Systems in the Presence of Sensor Attacks and Measurement Noise
Authors:
Tianci Yang,
Carlos Murguia,
Margreta Kuijper,
Dragan Nešić
Abstract:
We address the problem of robust state estimation of a class of discrete-time nonlinear systems with positive-slope nonlinearities when the sensors are corrupted by (potentially unbounded) attack signals and bounded measurement noise. We propose an observer-based estimator, using a bank of circle-criterion observers, which provides a robust estimate of the system state in spite of sensor attacks and measurement noise. We first consider the attack-free case where there is measurement noise and we provide a design method for a robust circle-criterion observer. Then, we consider the case when a sufficiently small subset of sensors is subject to attacks and all sensors are affected by measurement noise. We use our robust circle-criterion observer as the main ingredient in building an estimator that provides robust state estimation in this case. Finally, we propose an algorithm for isolating attacked sensors in the case of bounded measurement noise. We test this algorithm through simulations.
Submitted 19 September, 2018; v1 submitted 11 May, 2018;
originally announced May 2018.
-
Linear system security -- detection and correction of adversarial attacks in the noise-free case
Authors:
Zhanghan Tang,
Margreta Kuijper,
Michelle Chong,
Iven Mareels,
Chris Leckie
Abstract:
We address the problem of attack detection and attack correction for multi-output discrete-time linear time-invariant systems under sensor attack. More specifically, we focus on the situation where adversarial attack signals are added to some of the system's output signals. A 'security index' is defined to characterize the vulnerability of a system against such sensor attacks. Methods to compute the security index are presented as are algorithms to detect and correct for sensor attacks. The results are illustrated by examples involving multiple sensors.
Submitted 14 November, 2017;
originally announced November 2017.
-
Vulnerability of linear systems against sensor attacks--a system's security index
Authors:
Michelle S. Chong,
Margreta Kuijper
Abstract:
The 'security index' of a discrete-time LTI system under sensor attacks is introduced as a quantitative measure of the security of an observable system. We draw on ideas from error control coding theory to provide sufficient conditions for attack detection and correction.
Submitted 21 February, 2016;
originally announced February 2016.
-
Gabidulin Decoding via Minimal Bases of Linearized Polynomial Modules
Authors:
Anna-Lena Trautmann,
Margreta Kuijper
Abstract:
We show how Gabidulin codes can be decoded via parametrization by using interpolation modules over the ring of linearized polynomials with composition. Our decoding algorithm computes a list of message words that correspond to all closest codewords to a given received word. This involves the computation of a minimal basis for the interpolation module that corresponds to the received word, followed by a search through the parametrization for valid message words. Our module-theoretic approach strengthens the link between Gabidulin decoding and Reed-Solomon decoding. Two subalgorithms are presented to compute the minimal basis: one iterative, the other an extended Euclidean algorithm. Both of these subalgorithms have polynomial time complexity. The complexity order of the overall algorithm, using the parametrization, is then compared to straightforward exhaustive search as well as to Chase list decoding.
Submitted 1 September, 2015; v1 submitted 11 August, 2014;
originally announced August 2014.
-
Gröbner Bases for Linearized Polynomials
Authors:
Margreta Kuijper,
Anna-Lena Trautmann
Abstract:
In this work we develop the theory of Gröbner bases for modules over the ring of univariate linearized polynomials with coefficients from a finite field.
Submitted 18 June, 2014;
originally announced June 2014.
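For orientation, the product in the ring of linearized polynomials is composition rather than ordinary multiplication. The following worked formula uses standard notation that is not taken from the abstract and should be read as an assumption: $q$ is the size of the base field, coefficients lie in $\mathbb{F}_{q^m}$, and $x^{[i]} = x^{q^i}$.

$$
f(x) = \sum_{i} f_i\, x^{[i]}, \qquad
g(x) = \sum_{j} g_j\, x^{[j]}, \qquad
(f \circ g)(x) = f\bigl(g(x)\bigr) = \sum_{i} \sum_{j} f_i\, g_j^{[i]}\, x^{[i+j]} .
$$

The ring is non-commutative in general. For example, with $f(x) = a\,x$ and $g(x) = x^{[1]}$, one gets $(f \circ g)(x) = a\,x^{[1]}$ but $(g \circ f)(x) = a^{q}\,x^{[1]}$, and these differ whenever $a \notin \mathbb{F}_q$.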
-
Iterative List-Decoding of Gabidulin Codes via Gröbner Based Interpolation
Authors:
Margreta Kuijper,
Anna-Lena Trautmann
Abstract:
We show how Gabidulin codes can be list decoded by using an iterative parametrization approach. For a given received word, our decoding algorithm processes its entries one by one, constructing four polynomials at each step. This then yields a parametrization of interpolating solutions for the data so far. From the final result a list of all codewords that are closest to the received word with respect to the rank metric is obtained.
Submitted 28 May, 2014;
originally announced May 2014.
-
List-Decoding Gabidulin Codes via Interpolation and the Euclidean Algorithm
Authors:
Margreta Kuijper,
Anna-Lena Trautmann
Abstract:
We show how Gabidulin codes can be list decoded by using a parametrization approach. For this we consider a certain module in the ring of linearized polynomials and find a minimal basis for this module using the Euclidean algorithm with respect to composition of polynomials. For a given received word, our decoding algorithm computes a list of all codewords that are closest to the received word with respect to the rank metric.
Submitted 23 April, 2014;
originally announced April 2014.
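The rank metric mentioned in this abstract can be made concrete with a small sketch. It assumes the binary case: elements of GF(2^m) are stored as integer bitmasks (their coordinates over GF(2)), so the rank weight of a vector is the GF(2)-dimension of the span of its entries. The function names `rank_weight` and `rank_distance` are illustrative.

```python
def rank_weight(vec):
    """Rank weight of a vector over GF(2^m), entries given as integer bitmasks."""
    basis = []  # XOR basis; each stored element has a distinct highest set bit
    for v in vec:
        for b in basis:
            # Clear b's highest bit from v if it is set (min keeps the smaller value).
            v = min(v, v ^ b)
        if v:
            basis.append(v)  # v is linearly independent of the current basis
    return len(basis)


def rank_distance(a, b):
    """Rank distance: rank weight of the difference (XOR in characteristic 2)."""
    return rank_weight([x ^ y for x, y in zip(a, b)])
```

For instance, the vector `[3, 3, 3]` has Hamming weight 3 but rank weight 1, which is why rank-metric decoding behaves differently from Hamming-metric decoding.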
-
Erasure codes with simplex locality
Authors:
Margreta Kuijper,
Diego Napp
Abstract:
We focus on erasure codes for distributed storage. The distributed storage setting imposes locality requirements because of easy repair demands on the decoder. We first establish the characterization of various locality properties in terms of the generator matrix of the code. These lead to bounds on locality and notions of optimality. We then examine the locality properties of a family of non-binary codes with simplex structure. We investigate their optimality and design several easy repair decoding methods. In particular, we show that any correctable erasure pattern can be solved by easy repair.
Submitted 11 March, 2014;
originally announced March 2014.
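Locality in a simplex code can be illustrated with the binary [7,3] simplex code, used here as a simpler stand-in for the non-binary family studied in the paper. Each codeword symbol is indexed by a nonzero vector v in GF(2)^3 and equals the inner product of the message with v, so any erased symbol is the XOR of two surviving ones. The names `simplex_encode` and `repair_symbol` are assumptions for illustration.

```python
def simplex_encode(msg):
    """Binary [7,3] simplex codeword; msg and positions are 3-bit integers.

    The symbol at position v (1..7) is the GF(2) inner product <msg, v>.
    """
    return {v: bin(msg & v).count("1") % 2 for v in range(1, 8)}


def repair_symbol(codeword, erased):
    """Recover one erased symbol from two surviving symbols (locality 2)."""
    available = {v: c for v, c in codeword.items() if v != erased}
    u = next(iter(available))
    # Linearity gives <m, erased> = <m, u> + <m, u XOR erased> over GF(2).
    return available[u] ^ available[u ^ erased]


codeword = simplex_encode(0b101)
recovered = repair_symbol(codeword, erased=6)
```

Every choice of the helper position `u` works, so each symbol has several disjoint repair pairs.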
-
An iterative algorithm for parametrization of shortest length shift registers over finite rings
Authors:
M. Kuijper,
R. Pinto
Abstract:
The construction of shortest feedback shift registers for a finite sequence S_1,...,S_N is considered over the finite ring Z_{p^r}. A novel algorithm is presented that yields a parametrization of all shortest feedback shift registers for the sequence of numbers S_1,...,S_N, thus solving an open problem in the literature. The algorithm iteratively processes each number, starting with S_1, and constructs at each step a particular type of minimal Gröbner basis. The construction involves a simple update rule at each step which leads to computational efficiency. It is shown that the algorithm simultaneously computes a similar parametrization for the reciprocal sequence S_N,...,S_1.
Submitted 27 January, 2012;
originally announced January 2012.
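The problem statement in this abstract can be made concrete with a brute-force sketch. It enumerates every feedback shift register over Z_m, shortest first, and returns all registers of minimal length that generate the sequence. This is exhaustive search for tiny examples only, not the iterative Gröbner-basis algorithm of the paper. The recurrence convention S_k = c_1 S_{k-1} + ... + c_L S_{k-L} (mod m) and the name `shortest_lfsrs` are assumptions.

```python
from itertools import product


def shortest_lfsrs(seq, modulus):
    """Return (L, list of coefficient tuples) for all shortest LFSRs over Z_modulus.

    A tuple (c_1, ..., c_L) generates seq if
    seq[k] == c_1*seq[k-1] + ... + c_L*seq[k-L] (mod modulus) for all k >= L.
    Entries of seq are assumed to be reduced modulo `modulus` already.
    """
    n = len(seq)
    for length in range(n + 1):
        found = [
            coeffs
            for coeffs in product(range(modulus), repeat=length)
            if all(
                seq[k] == sum(c * seq[k - 1 - i] for i, c in enumerate(coeffs)) % modulus
                for k in range(length, n)
            )
        ]
        if found:
            return length, found


length, registers = shortest_lfsrs([3, 6, 3], 9)
```

Over Z_9 the sequence 3, 6, 3 has three shortest registers of length 1 because 3 is a zero divisor, which shows why a parametrization of all solutions is the natural object over a ring.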
-
An algebraic approach to source coding with side information using list decoding
Authors:
Mortuza Ali,
Margreta Kuijper
Abstract:
Existing literature on source coding with side information (SCSI) mostly uses state-of-the-art channel codes, namely LDPC codes, turbo codes, and their variants, and assumes classical unique decoding. In this paper, we present an algebraic approach to SCSI based on the list decoding of the underlying channel codes. We show that the theoretical limit of SCSI can be achieved in the proposed list-decoding-based framework when the correlation between the source and side information is $q$-ary symmetric. We argue that, as opposed to channel coding, the correct sequence from the list produced by the list decoder can effectively be recovered in the case of SCSI with a few CRC symbols. The CRC symbols, which allow the decoder to identify the correct sequence, incur negligible overhead for large block lengths. More importantly, these CRC symbols are not subject to noise since we are dealing with a virtual noisy channel rather than a real noisy channel. Finally, we present a guideline for designing constructive SCSI schemes for non-binary and binary sources using Reed-Solomon codes and BCH codes, respectively. This guideline allows us to design an SCSI scheme for any $q$-ary symmetric correlation without resorting to simulation.
Submitted 31 October, 2011;
originally announced October 2011.
-
A parametric approach to list decoding of Reed-Solomon codes using interpolation
Authors:
Mortuza Ali,
Margreta Kuijper
Abstract:
In this paper we present a minimal list decoding algorithm for Reed-Solomon (RS) codes. Minimal list decoding for a code $C$ refers to list decoding with radius $L$, where $L$ is the minimum of the distances between the received word $\mathbf{r}$ and any codeword in $C$. We consider the problem of determining the value of $L$ as well as determining all the codewords at distance $L$. Our approach involves a parametrization of interpolating polynomials of a minimal Gröbner basis $G$. We present two efficient ways to compute $G$. We also show that so-called re-encoding can be used to further reduce the complexity. We then demonstrate how our parametric approach can be solved by a computationally feasible rational curve fitting solution from a recent paper by Wu. In addition, we present an algorithm to compute the minimum multiplicity as well as the optimal values of the parameters associated with this multiplicity, which results in overall savings in both memory and computation.
Submitted 13 April, 2011; v1 submitted 3 November, 2010;
originally announced November 2010.
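The definition of minimal list decoding in this abstract can be checked on a toy code by exhaustive search. The sketch below assumes a [6,2] Reed-Solomon code over GF(7) with evaluation points 1 to 6; these parameters and the names `encode` and `minimal_list_decode` are illustrative choices. It enumerates all codewords instead of using the interpolation-based method of the paper.

```python
from itertools import product

P = 7                        # field size (prime, so arithmetic is plain mod P)
POINTS = [1, 2, 3, 4, 5, 6]  # evaluation points
K = 2                        # message length, i.e. polynomials of degree < K


def encode(msg):
    """Evaluate the message polynomial msg[0] + msg[1]*x + ... at every point."""
    return [sum(a * pow(x, i, P) for i, a in enumerate(msg)) % P for x in POINTS]


def minimal_list_decode(received):
    """Return (L, codewords at Hamming distance L), L being the minimum distance
    from `received` to the code. Exhaustive search over all P**K codewords."""
    best, closest = len(received) + 1, []
    for msg in product(range(P), repeat=K):
        cw = encode(msg)
        d = sum(r != c for r, c in zip(received, cw))
        if d < best:
            best, closest = d, [cw]
        elif d == best:
            closest.append(cw)
    return best, closest


sent = encode((1, 2))
radius, candidates = minimal_list_decode([4] + sent[1:])  # one symbol corrupted
```

With a single error the list has one entry; corrupting more symbols than half the minimum distance can produce several closest codewords, which is the case the list decoder is designed for.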
-
Source Coding With Side Information Using List Decoding
Authors:
Mortuza Ali,
Margreta Kuijper
Abstract:
The problem of source coding with side information (SCSI) is closely related to channel coding. Therefore, existing literature focuses on using the most successful channel codes, namely LDPC codes, turbo codes, and their variants, to solve this problem assuming classical unique decoding of the underlying channel code. In this paper, in contrast to classical decoding, we take the list decoding approach. We show that syndrome source coding using list decoding can achieve the theoretical limit. We argue that, as opposed to channel coding, the correct sequence from the list produced by the list decoder can effectively be recovered in the case of SCSI, since we are dealing with a virtual noisy channel rather than a real noisy channel. Finally, we present a guideline for designing constructive SCSI schemes using Reed-Solomon, BCH, and Reed-Muller codes, which are the known list-decodable codes.
Submitted 15 January, 2010;
originally announced January 2010.
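Syndrome source coding, the building block named in this abstract, can be sketched with the binary [7,4] Hamming code. The encoder sends only the 3-bit syndrome of a 7-bit source word, and the decoder combines it with side information that differs from the source in at most one position. This sketch uses classical unique decoding rather than the list decoding studied in the paper, and the function names are assumptions.

```python
def syndrome(bits):
    """Syndrome under the Hamming parity-check matrix whose column i (0-based)
    is the binary representation of i + 1; returned as an integer 0..7."""
    s = 0
    for i, b in enumerate(bits):
        if b:
            s ^= i + 1
    return s


def decode_with_side_info(s_x, y):
    """Recover x from its syndrome s_x and side information y, assuming x and y
    differ in at most one position."""
    # By linearity, s_x XOR syndrome(y) is the syndrome of the difference x XOR y,
    # which for a single difference is the 1-based index of the flipped bit.
    d = s_x ^ syndrome(y)
    x_hat = list(y)
    if d:
        x_hat[d - 1] ^= 1
    return x_hat


x = [1, 0, 1, 1, 0, 0, 1]
y = x.copy()
y[4] ^= 1                      # side information: x observed through a "virtual" channel
x_hat = decode_with_side_info(syndrome(x), y)
```

Here 7 source bits are described by 3 transmitted bits, and the side information plays the role of the noisy channel output.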
-
Minimal Gröbner bases and the predictable leading monomial property
Authors:
M. Kuijper,
K. Schindelar
Abstract:
We focus on Gröbner bases for modules of univariate polynomial vectors over a ring. We identify a useful property, the "predictable leading monomial (PLM) property", that is shared by minimal Gröbner bases of modules in F[x]^q, no matter what positional term order is used. The PLM property is useful in a range of applications and can be seen as a strengthening of the well-known predictable degree property (= row reducedness), a term introduced by Forney in the 1970s. Because of the presence of zero divisors, minimal Gröbner bases over a finite ring of the type Z_{p^r} (where p is a prime integer and r is an integer >1) do not necessarily have the PLM property. In this paper we show how to derive, from an ordered minimal Gröbner basis, a so-called "minimal Gröbner p-basis" that does have a PLM property. We demonstrate that minimal Gröbner p-bases lend themselves particularly well to deriving minimal realization parametrizations over Z_{p^r}. Applications are in coding and sequences over Z_{p^r}.
Submitted 26 May, 2010; v1 submitted 25 June, 2009;
originally announced June 2009.
-
On minimality of convolutional ring encoders
Authors:
Margreta Kuijper,
Raquel Pinto
Abstract:
Convolutional codes are considered with code sequences modelled as semi-infinite Laurent series. It is well known that a convolutional code C over a finite group G has a minimal trellis representation that can be derived from code sequences. It is also well known that, for the case that G is a finite field, any polynomial encoder of C can be algebraically manipulated to yield a minimal polynomial encoder whose controller canonical realization is a minimal trellis. In this paper we seek to extend this result to the finite ring case G = Z_{p^r} by introducing a so-called "p-encoder". We show how to manipulate a polynomial encoding of a noncatastrophic convolutional code over Z_{p^r} to produce a particular type of p-encoder ("minimal p-encoder") whose controller canonical realization is a minimal trellis with nonlinear features. The minimum number of trellis states is then expressed as p^gamma, where gamma is the sum of the row degrees of the minimal p-encoder. In particular, we show that any convolutional code over Z_{p^r} admits a delay-free p-encoder, which implies the novel result that delay-freeness is not a property of the code but of the encoder, just as in the field case. We conjecture that a similar result holds with respect to catastrophicity, i.e., any catastrophic convolutional code over Z_{p^r} admits a noncatastrophic p-encoder.
Submitted 14 April, 2009; v1 submitted 24 January, 2008;
originally announced January 2008.