-
MPOGames: Efficient Multimodal Partially Observable Dynamic Games
Authors:
Oswin So,
Paul Drews,
Thomas Balch,
Velin Dimitrov,
Guy Rosman,
Evangelos A. Theodorou
Abstract:
Game theoretic methods have become popular for planning and prediction in situations involving rich multi-agent interactions. However, these methods often assume the existence of a single local Nash equilibrium and are hence unable to handle uncertainty in the intentions of different agents. While maximum entropy (MaxEnt) dynamic games try to address this issue, practical approaches solve for MaxEnt Nash equilibria using linear-quadratic approximations which are restricted to unimodal responses and unsuitable for scenarios with multiple local Nash equilibria. By reformulating the problem as a POMDP, we propose MPOGames, a method for efficiently solving MaxEnt dynamic games that captures the interactions between local Nash equilibria. We show the importance of uncertainty-aware game theoretic methods via a two-agent merge case study. Finally, we prove the real-time capabilities of our approach with hardware experiments on a 1/10th scale car platform.
Submitted 23 May, 2023; v1 submitted 19 October, 2022;
originally announced October 2022.
-
Teaching Autonomous Systems Hands-On: Leveraging Modular Small-Scale Hardware in the Robotics Classroom
Authors:
Johannes Betz,
Hongrui Zheng,
Zirui Zang,
Florian Sauerbeck,
Krzysztof Walas,
Velin Dimitrov,
Madhur Behl,
Rosa Zheng,
Joydeep Biswas,
Venkat Krovi,
Rahul Mangharam
Abstract:
Although robotics courses are well established in higher education, the courses often focus on theory and sometimes lack the systematic coverage of the techniques involved in developing, deploying, and applying software to real hardware. Additionally, most hardware platforms for robotics teaching are low-level toys aimed at younger students at middle-school levels. To address this gap, an autonomous vehicle hardware platform, called F1TENTH, is developed for teaching autonomous systems hands-on. This article describes the teaching modules and software stack for teaching at various educational levels with the theme of "racing" and competitions that replace exams. The F1TENTH vehicles offer a modular hardware platform and its related software for teaching the fundamentals of autonomous driving algorithms. From basic reactive methods to advanced planning algorithms, the teaching modules enhance students' computational thinking through autonomous driving with the F1TENTH vehicle. The F1TENTH car fills the gap between research platforms and low-end toy cars and offers hands-on experience in learning the topics in autonomous systems. Four universities have adopted the teaching modules for their semester-long undergraduate and graduate courses for multiple years. Student feedback is used to analyze the effectiveness of the F1TENTH platform. More than 80% of the students strongly agree that the hardware platform and modules greatly motivate their learning, and more than 70% of the students strongly agree that the hardware enhanced their understanding of the subjects. The survey results show that more than 80% of the students strongly agree that the competitions motivate them for the course.
Submitted 20 September, 2022;
originally announced September 2022.
-
Low-Complexity Loeffler DCT Approximations for Image and Video Coding
Authors:
D. F. G. Coelho,
R. J. Cintra,
F. M. Bayer,
S. Kulasekera,
A. Madanayake,
P. A. C. Martinez,
T. L. T. Silveira,
R. S. Oliveira,
V. S. Dimitrov
Abstract:
This paper introduced a matrix parametrization method based on the Loeffler discrete cosine transform (DCT) algorithm. As a result, a new class of eight-point DCT approximations was proposed, capable of unifying the mathematical formalism of several eight-point DCT approximations archived in the literature. Pareto-efficient DCT approximations are obtained through multicriteria optimization, where computational complexity, proximity, and coding performance are considered. Efficient approximations and their scaled 16- and 32-point versions are embedded into image and video encoders, including a JPEG-like codec and H.264/AVC and H.265/HEVC standards. Results are compared to the unmodified standard codecs. Efficient approximations are mapped and implemented on a Xilinx VLX240T FPGA and evaluated for area, speed, and power consumption.
Submitted 28 July, 2022;
originally announced July 2022.
-
Block-Parallel Systolic-Array Architecture for 2-D NTT-based Fragile Watermark Embedding
Authors:
H. P. L. Arjuna Madanayake,
R. J. Cintra,
V. S. Dimitrov,
L. Bruton
Abstract:
Number-theoretic transforms (NTTs) have been applied in the fragile watermarking of digital images. A block-parallel systolic-array architecture is proposed for watermarking based on the 2-D special Hartley NTT (HNTT). The proposed core employs two 2-D special HNTT hardware cores, each using digital arithmetic over $\mathrm{GF}(3)$, and processes $4\times4$ blocks of pixels in parallel every clock cycle. Prototypes are operational on a Xilinx Sx35-10ff668 FPGA device. The maximum estimated throughput of the FPGA circuit is 100 million $4\times4$ HNTT fragile watermarked blocks per second, when clocked at 100 MHz. Potential applications exist in high-traffic back-end servers dealing with large amounts of protected digital images requiring authentication, in remote-sensing for high-security surveillance applications, in real-time video processing of information of a sensitive nature or matters of national security, in video/photographic content management of corporate clients, in authenticating multimedia for the entertainment industry, in the authentication of electronic evidence material, and in real-time news streaming.
Submitted 2 June, 2022;
originally announced June 2022.
-
Low-complexity Architecture for AR(1) Inference
Authors:
A. Borges Jr.,
R. J. Cintra,
D. F. G. Coelho,
V. S. Dimitrov
Abstract:
In this Letter, we propose a low-complexity estimator for the correlation coefficient based on the signed $\operatorname{AR}(1)$ process. The introduced approximation is suitable for implementation in low-power hardware architectures. Monte Carlo simulations reveal that the proposed estimator performs comparably to the competing methods in the literature, with a maximum error on the order of $10^{-2}$. However, the hardware implementation of the introduced method presents considerable advantages in several relevant metrics, offering more than 95% reduction in dynamic power and doubling the maximum operating frequency when compared to the reference method.
Submitted 21 August, 2020;
originally announced August 2020.
-
Preventing Denial of Service Attacks in IoT Networks through Verifiable Delay Functions
Authors:
Vidal Attias,
Luigi Vigneri,
Vassil Dimitrov
Abstract:
Permissionless distributed ledgers provide a promising approach to deal with the Internet of Things (IoT) paradigm. Since IoT devices mostly generate data transactions and micropayments, distributed ledgers that use fees to regulate the network access are not an optimal choice. In this paper, we study a feeless architecture developed by IOTA and designed specifically for the IoT. Due to the lack of fees, malicious nodes can exploit this feature to generate an unbounded number of transactions and perform denial of service attacks. We propose to mitigate these attacks through verifiable delay functions. These functions, which are non-parallelizable, hard to compute, and easy to verify, have been formulated only recently. In our work, we design a denial of service prevention mechanism which addresses network heterogeneity, limited node computational capabilities, and hardware-specific implementation optimizations. Verifiable delay functions have mostly been studied from a theoretical point of view, but little has been done in tangible applications. Hence, this paper can be considered a pioneering work in the field, since it builds a bridge between this theoretical mathematical framework and a real-world problem.
Submitted 2 June, 2020;
originally announced June 2020.
-
Fast Generation of RSA Keys using Smooth Integers
Authors:
Vassil Dimitrov,
Luigi Vigneri,
Vidal Attias
Abstract:
Primality generation is the cornerstone of several essential cryptographic systems. The problem has been a subject of deep investigations, but there is still substantial room for improvement. Typically, the algorithms used have two parts: trial divisions aimed at eliminating numbers with small prime factors, and primality tests based on an easy-to-compute statement that is valid for primes and invalid for composites. In this paper, we will showcase a technique that will eliminate the first phase of the primality testing algorithms. The computational simulations show a reduction of the primality generation time by about 30% in the case of 1024-bit RSA key pairs. This can be particularly beneficial in the case of decentralized environments for shared RSA keys as the initial trial division part of the key generation algorithms can be avoided at no cost. This also significantly reduces the communication complexity. Another essential contribution of the paper is the introduction of a new one-way function that is computationally simpler than the existing ones used in public-key cryptography. This function can be used to create new random number generators, and it also could be potentially used for designing entirely new public-key encryption systems.
Submitted 13 July, 2021; v1 submitted 24 December, 2019;
originally announced December 2019.
-
On the Decentralized Generation of the RSA Moduli in Multi-Party Settings
Authors:
Vidal Attias,
Luigi Vigneri,
Vassil Dimitrov
Abstract:
RSA cryptography is still widely used. Some of its applications (e.g., distributed signature schemes, cryptosystems) do not allow the RSA modulus to be generated by a centralized trusted entity. Instead, the factorization must remain unknown to all the network participants. To date, the existing algorithms are either computationally expensive or limited to two-party settings. In this work, we design a decentralized multi-party computation algorithm able to efficiently generate the RSA modulus.
Submitted 24 December, 2019;
originally announced December 2019.
-
A Blended Human-Robot Shared Control Framework to Handle Drift and Latency
Authors:
Anas Abou Allaban,
Velin Dimitrov,
Taşkın Padır
Abstract:
Maximizing the utility of human-robot teams in disaster response and search and rescue (SAR) missions remains a challenging problem. This is due to the dynamic, uncertain nature of the environment and the variability in cognitive performance of the human operators. By having an autonomous agent share control with the operator, we can achieve near-optimal performance by augmenting the operator's input and compensating for the factors resulting in degraded performance. What this solution does not consider, though, is the human input latency and errors caused by potential hardware failures that can occur during task completion when operating in disaster response and SAR scenarios. In this paper, we propose the use of a blended shared control (BSC) architecture to address these issues and investigate the architecture's performance in constrained, dynamic environments with a differential drive robot that has input latency and erroneous odometry feedback. We conduct a validation study (n=12) for our control architecture and then a user study (n=14) in 2 different environments that are unknown to both the human operator and the autonomous agent. The results demonstrate that the BSC architecture can prevent collisions and enhance operator performance without the need for a complete transfer of control between the human operator and autonomous agent.
Submitted 23 November, 2018;
originally announced November 2018.
-
Fast Matrix Inversion and Determinant Computation for Polarimetric Synthetic Aperture Radar
Authors:
D. F. G. Coelho,
R. J. Cintra,
A. C. Frery,
V. S. Dimitrov
Abstract:
This paper introduces a fast algorithm for simultaneous inversion and determinant computation of small-sized matrices in the context of fully Polarimetric Synthetic Aperture Radar (PolSAR) image processing and analysis. The proposed fast algorithm is based on the computation of the adjoint matrix and the symmetry of the input matrix. The algorithm is implemented in a general purpose graphical processing unit (GPGPU) and compared to the usual approach based on Cholesky factorization. The assessment with simulated observations and data from an actual PolSAR sensor shows a speedup factor of about two when compared to the usual Cholesky factorization. Moreover, the expressions provided here can be implemented on any platform.
Submitted 21 July, 2018;
originally announced July 2018.
-
A New Algorithm for Double Scalar Multiplication over Koblitz Curves
Authors:
J. Adikari,
V. S. Dimitrov,
R. J. Cintra
Abstract:
Koblitz curves are a special set of elliptic curves and have improved performance in computing scalar multiplication in elliptic curve cryptography due to the Frobenius endomorphism. The double-base number system approach to Frobenius expansion has improved the performance of single scalar multiplication. In this paper, we present a new algorithm to generate a sparse and joint $τ$-adic representation for a pair of scalars and its application in double scalar multiplication. The new algorithm is inspired by the double-base number system. We achieve a 12% improvement in speed over the state-of-the-art $τ$-adic joint sparse form.
Submitted 25 January, 2018;
originally announced January 2018.
-
Efficient Computation of the 8-point DCT via Summation by Parts
Authors:
D. F. G. Coelho,
R. J. Cintra,
V. S. Dimitrov
Abstract:
This paper introduces a new fast algorithm for the 8-point discrete cosine transform (DCT) based on the summation-by-parts formula. The proposed method converts the DCT matrix into an alternative transformation matrix that can be decomposed into sparse matrices of low multiplicative complexity. The method is capable of scaled and exact DCT computation and its associated fast algorithm achieves the theoretical minimal multiplicative complexity for the 8-point DCT. Depending on the nature of the input signal, simplifications can be introduced and the overall complexity of the proposed algorithm can be further reduced. Several types of input signal are analyzed: arbitrary, null mean, accumulated, and null mean/accumulated signal. The proposed tool has potential application in harmonic detection, image enhancement, and feature extraction, where input signal DC level is discarded and/or the signal is required to be integrated.
Submitted 28 March, 2018; v1 submitted 17 January, 2018;
originally announced January 2018.
-
VLSI Computational Architectures for the Arithmetic Cosine Transform
Authors:
N. Rajapaksha,
A. Madanayake,
R. J. Cintra,
J. Adikari,
V. S. Dimitrov
Abstract:
The discrete cosine transform (DCT) is a widely-used and important signal processing tool employed in a plethora of applications. Typical fast algorithms for nearly-exact computation of DCT require floating point arithmetic, are multiplier intensive, and accumulate round-off errors. The recently proposed arithmetic cosine transform (ACT), a fast algorithm, calculates the DCT exactly using only additions and integer constant multiplications, with very low area complexity, for null mean input sequences. The ACT can also be computed non-exactly for any input sequence, with low area complexity and low power consumption, utilizing the novel architecture described. However, as a trade-off, the ACT algorithm requires 10 non-uniformly sampled data points to calculate the 8-point DCT. This requirement can easily be satisfied for applications dealing with spatial signals such as image sensors and biomedical sensor arrays, by placing sensor elements in a non-uniform grid. In this work, a hardware architecture for the computation of the null mean ACT is proposed, followed by novel architectures that extend the ACT for non-null mean signals. All circuits are physically implemented and tested using the Xilinx XC6VLX240T FPGA device and synthesized for 45 nm TSMC standard-cell library for performance assessment.
Submitted 30 October, 2017;
originally announced October 2017.
-
A Single-Channel Architecture for Algebraic Integer Based 8$\times$8 2-D DCT Computation
Authors:
A. Edirisuriya,
A. Madanayake,
R. J. Cintra,
V. S. Dimitrov
Abstract:
An area efficient row-parallel architecture is proposed for the real-time implementation of bivariate algebraic integer (AI) encoded 2-D discrete cosine transform (DCT) for image and video processing. The proposed architecture computes 8$\times$8 2-D DCT transform based on the Arai DCT algorithm. An improved fast algorithm for AI based 1-D DCT computation is proposed along with a single channel 2-D DCT architecture. The design improves on the 4-channel AI DCT architecture that was published recently by reducing the number of integer channels to one and the number of 8-point 1-D DCT cores from 5 down to 2. The architecture offers exact computation of 8$\times$8 blocks of the 2-D DCT coefficients up to the FRS, which converts the coefficients from the AI representation to fixed-point format using the method of expansion factors. Prototype circuits corresponding to FRS blocks based on two expansion factors are realized, tested, and verified on FPGA-chip, using a Xilinx Virtex-6 XC6VLX240T device. Post place-and-route results show a 20% reduction in terms of area compared to the 2-D DCT architecture requiring five 1-D AI cores. The area-time and area-time${}^2$ complexity metrics are also reduced by 23% and 22% respectively for designs with 8-bit input word length. The digital realizations are simulated up to place and route for ASICs using 45 nm CMOS standard cells. The maximum estimated clock rate is 951 MHz for the CMOS realizations, indicating 7.608$\cdot$10$^9$ pixels/second and an 8$\times$8 block rate of 118.875 MHz.
Submitted 26 October, 2017;
originally announced October 2017.
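The throughput figures quoted in the abstract above can be reproduced with simple arithmetic. The sketch below assumes that the row-parallel design consumes one 8-pixel row per clock cycle; the listing does not state this explicitly, but it is the reading under which the quoted numbers agree. All variable names are hypothetical.

```python
# Worked arithmetic for the figures quoted in the abstract.
# Assumption (not stated in the listing): one 8-pixel row per clock cycle.
CLOCK_HZ = 951_000_000        # maximum estimated clock rate, 951 MHz
PIXELS_PER_CLOCK = 8          # assumed row width processed each cycle
PIXELS_PER_BLOCK = 8 * 8      # one 8x8 block

pixel_rate = CLOCK_HZ * PIXELS_PER_CLOCK        # pixels per second
block_rate = pixel_rate // PIXELS_PER_BLOCK     # 8x8 blocks per second

print(f"pixel rate: {pixel_rate:.4g} pixels/second")
print(f"block rate: {block_rate / 1e6} MHz")
```

Under that assumption the pixel rate is eight times the clock rate, and the block rate is the clock rate divided by eight.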
-
On the Computation of Neumann Series
Authors:
Vassil Dimitrov,
Diego Coelho
Abstract:
This paper proposes new factorizations for computing the Neumann series. The factorizations are based on fast algorithms for series of small prime sizes and the splitting of large sizes into several smaller ones. We propose a different basis for factorizations other than the well-known binary and ternary bases. We show that it is possible to reduce the overall complexity for the usual binary decomposition from 2log2(N)-2 multiplications to around 1.72log2(N)-2 using a basis of size five. By merging different bases, we demonstrate that we can build fast algorithms for particular sizes. We also show the asymptotic case where one can reduce the number of multiplications to around 1.70log2(N)-2. Simulations are performed for applications in the context of wireless communications and image rendering, where it is necessary to perform inversion of large matrices.
Submitted 18 July, 2017;
originally announced July 2017.
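For orientation, the 2log2(N)-2 baseline mentioned in the abstract above matches the standard binary factorization of the truncated Neumann series, I + A + ... + A^(N-1) = (I + A)(I + A^2)(I + A^4)... for N a power of two. The following is a minimal sketch of that baseline only, not of the paper's base-five factorization; the function name and the multiplication counter are hypothetical.

```python
import numpy as np

def neumann_binary(A, m):
    """Return (S, mults), where S = I + A + ... + A^(2^m - 1).

    Uses the binary factorization prod_{j=0}^{m-1} (I + A^(2^j)),
    counting matrix multiplications along the way. Requires m >= 1.
    """
    I = np.eye(A.shape[0])
    mults = 0
    power = A.copy()          # A^(2^0)
    S = I + power             # first factor costs no multiplication
    for _ in range(1, m):
        power = power @ power     # square: A^(2^j)
        mults += 1
        S = S @ (I + power)       # fold in the next factor
        mults += 1
    return S, mults

# Usage: N = 2^3 = 8 terms, so 2*log2(8) - 2 = 4 multiplications.
A = np.array([[0.1, 0.2], [0.0, 0.3]])
S, mults = neumann_binary(A, 3)
print(mults)
# When the spectral radius of A is below 1, S approximates inv(I - A).
print(np.max(np.abs(S - np.linalg.inv(np.eye(2) - A))))
```

Each loop iteration spends one multiplication on squaring and one on accumulating the product, which gives the 2log2(N)-2 count for N = 2^m.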
-
A Row-parallel 8$\times$8 2-D DCT Architecture Using Algebraic Integer Based Exact Computation
Authors:
A. Madanayake,
R. J. Cintra,
D. Onen,
V. S. Dimitrov,
N. T. Rajapaksha,
L. T. Bruton,
A. Edirisuriya
Abstract:
An algebraic integer (AI) based time-multiplexed row-parallel architecture and two final-reconstruction step (FRS) algorithms are proposed for the implementation of bivariate AI-encoded 2-D discrete cosine transform (DCT). The architecture directly realizes an error-free 2-D DCT without using FRSs between row-column transforms, leading to an 8$\times$8 2-D DCT which is entirely free of quantization errors in AI basis. As a result, the user-selectable accuracy for each of the coefficients in the FRS facilitates each of the 64 coefficients to have its precision set independently of others, avoiding the leakage of quantization noise between channels as is the case for published DCT designs. The proposed FRS uses two approaches based on (i) optimized Dempster-Macleod multipliers and (ii) expansion factor scaling. This architecture enables low-noise high-dynamic range applications in digital video processing that requires full control of the finite-precision computation of the 2-D DCT. The proposed architectures and FRS techniques are experimentally verified and validated using hardware implementations that are physically realized and verified on FPGA chip. Six designs, for 4- and 8-bit input word sizes, using the two proposed FRS schemes, have been designed, simulated, physically implemented and measured. The maximum clock rate and block-rate achieved among 8-bit input designs are 307.787 MHz and 38.47 MHz, respectively, implying a pixel rate of 8$\times$307.787$\approx$2.462 GHz if eventually embedded in a real-time video-processing system. The equivalent frame rate is about 1187.35 Hz for the image size of 1920$\times$1080. All implementations are functional on a Xilinx Virtex-6 XC6VLX240T FPGA device.
Submitted 14 February, 2015;
originally announced February 2015.
-
Fragile Watermarking Using Finite Field Trigonometrical Transforms
Authors:
R. J. Cintra,
V. S. Dimitrov,
H. M. de Oliveira,
R. M. Campello de Souza
Abstract:
Fragile digital watermarking has been applied for authentication and alteration detection in images. Utilizing the cosine and Hartley transforms over finite fields, a new transform domain fragile watermarking scheme is introduced. A watermark is embedded into a host image via a blockwise application of two-dimensional finite field cosine or Hartley transforms. Additionally, the considered finite field transforms are adjusted to be number theoretic transforms, appropriate for error-free calculation. The employed technique can provide invisible fragile watermarking for authentication systems with tamper location capability. It is shown that the choice of the finite field characteristic is pivotal to obtain perceptually invisible watermarked images. It is also shown that the generated watermarked images can be used as publicly available signature data for authentication purposes.
Submitted 1 February, 2015;
originally announced February 2015.