
MPOGames: Efficient Multimodal Partially Observable Dynamic Games

Authors: Oswin So, Paul Drews, Thomas Balch, Velin Dimitrov, Guy Rosman, Evangelos A. Theodorou

Abstract: Game theoretic methods have become popular for planning and prediction in situations involving rich multi-agent interactions. However, these methods often assume the existence of a single local Nash equilibrium and are hence unable to handle uncertainty in the intentions of different agents. While maximum entropy (MaxEnt) dynamic games try to address this issue, practical approaches solve for MaxEnt Nash equilibria using linear-quadratic approximations which are restricted to unimodal responses and unsuitable for scenarios with multiple local Nash equilibria. By reformulating the problem as a POMDP, we propose MPOGames, a method for efficiently solving MaxEnt dynamic games that captures the interactions between local Nash equilibria. We show the importance of uncertainty-aware game theoretic methods via a two-agent merge case study. Finally, we demonstrate the real-time capabilities of our approach with hardware experiments on a 1/10th scale car platform.

Submitted 23 May, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: Accepted to ICRA 2023

arXiv:2209.11181 [pdf, other]

Teaching Autonomous Systems Hands-On: Leveraging Modular Small-Scale Hardware in the Robotics Classroom

Authors: Johannes Betz, Hongrui Zheng, Zirui Zang, Florian Sauerbeck, Krzysztof Walas, Velin Dimitrov, Madhur Behl, Rosa Zheng, Joydeep Biswas, Venkat Krovi, Rahul Mangharam

Abstract: Although robotics courses are well established in higher education, the courses often focus on theory and sometimes lack the systematic coverage of the techniques involved in developing, deploying, and applying software to real hardware. Additionally, most hardware platforms for robotics teaching are low-level toys aimed at younger students at middle-school levels. To address this gap, an autonomous vehicle hardware platform, called F1TENTH, is developed for teaching autonomous systems hands-on. This article describes the teaching modules and software stack for teaching at various educational levels with the theme of "racing" and competitions that replace exams. The F1TENTH vehicles offer a modular hardware platform and its related software for teaching the fundamentals of autonomous driving algorithms. From basic reactive methods to advanced planning algorithms, the teaching modules enhance students' computational thinking through autonomous driving with the F1TENTH vehicle. The F1TENTH car fills the gap between research platforms and low-end toy cars and offers hands-on experience in learning the topics in autonomous systems. Four universities have adopted the teaching modules for their semester-long undergraduate and graduate courses for multiple years. Student feedback is used to analyze the effectiveness of the F1TENTH platform. More than 80% of the students strongly agree that the hardware platform and modules greatly motivate their learning, and more than 70% of the students strongly agree that the hardware enhanced their understanding of the subjects. The survey results show that more than 80% of the students strongly agree that the competitions motivate them for the course.

Submitted 20 September, 2022; originally announced September 2022.

Comments: 15 pages, 12 figures, 3 tables

arXiv:2207.14463 [pdf, other]

doi 10.3390/jlpea8040046

Low-Complexity Loeffler DCT Approximations for Image and Video Coding

Authors: D. F. G. Coelho, R. J. Cintra, F. M. Bayer, S. Kulasekera, A. Madanayake, P. A. C. Martinez, T. L. T. Silveira, R. S. Oliveira, V. S. Dimitrov

Abstract: This paper introduced a matrix parametrization method based on the Loeffler discrete cosine transform (DCT) algorithm. As a result, a new class of eight-point DCT approximations was proposed, capable of unifying the mathematical formalism of several eight-point DCT approximations archived in the literature. Pareto-efficient DCT approximations are obtained through multicriteria optimization, where computational complexity, proximity, and coding performance are considered. Efficient approximations and their scaled 16- and 32-point versions are embedded into image and video encoders, including a JPEG-like codec and H.264/AVC and H.265/HEVC standards. Results are compared to the unmodified standard codecs. Efficient approximations are mapped and implemented on a Xilinx VLX240T FPGA and evaluated for area, speed, and power consumption.

Submitted 28 July, 2022; originally announced July 2022.

Comments: 25 pages, 11 figures, 7 tables

Journal ref: J. Low Power Electron. Appl. 2018, 8(4), 46

arXiv:2206.01146 [pdf, ps, other]

doi 10.1142/S0129626412500090

Block-Parallel Systolic-Array Architecture for 2-D NTT-based Fragile Watermark Embedding

Authors: H. P. L. Arjuna Madanayake, R. J. Cintra, V. S. Dimitrov, L. Bruton

Abstract: Number-theoretic transforms (NTTs) have been applied in the fragile watermarking of digital images. A block-parallel systolic-array architecture is proposed for watermarking based on the 2-D special Hartley NTT (HNTT). The proposed core employs two 2-D special HNTT hardware cores, each using digital arithmetic over $\mathrm{GF}(3)$, and processes $4\times4$ blocks of pixels in parallel every clock cycle. Prototypes are operational on a Xilinx Sx35-10ff668 FPGA device. The maximum estimated throughput of the FPGA circuit is 100 million $4\times4$ HNTT fragile watermarked blocks per second, when clocked at 100 MHz. Potential applications exist in high-traffic back-end servers dealing with large amounts of protected digital images requiring authentication, in remote-sensing for high-security surveillance applications, in real-time video processing of information of a sensitive nature or matters of national security, in video/photographic content management of corporate clients, in authenticating multimedia for the entertainment industry, in the authentication of electronic evidence material, and in real-time news streaming.

Submitted 2 June, 2022; originally announced June 2022.

Comments: 11 pages, 4 figures

Journal ref: Parallel Processing Letters, vol. 22, no. 03, 1250009, 2012

arXiv:2008.09633 [pdf, ps, other]

doi 10.1049/el.2019.4030

Low-complexity Architecture for AR(1) Inference

Authors: A. Borges Jr., R. J. Cintra, D. F. G. Coelho, V. S. Dimitrov

Abstract: In this Letter, we propose a low-complexity estimator for the correlation coefficient based on the signed $\operatorname{AR}(1)$ process. The introduced approximation is suitable for implementation in low-power hardware architectures. Monte Carlo simulations reveal that the proposed estimator performs comparably to the competing methods in the literature, with maximum error on the order of $10^{-2}$. However, the hardware implementation of the introduced method presents considerable advantages in several relevant metrics, offering more than 95% reduction in dynamic power and doubling the maximum operating frequency when compared to the reference method.

Submitted 21 August, 2020; originally announced August 2020.

Comments: 7 pages, 3 tables, 4 figures

Journal ref: Electronics Letters 56 (14), 732-734, 2020

arXiv:2006.01977 [pdf, other]

doi 10.1109/GLOBECOM42002.2020.9322260

Preventing Denial of Service Attacks in IoT Networks through Verifiable Delay Functions

Authors: Vidal Attias, Luigi Vigneri, Vassil Dimitrov

Abstract: Permissionless distributed ledgers provide a promising approach to deal with the Internet of Things (IoT) paradigm. Since IoT devices mostly generate data transactions and micropayments, distributed ledgers that use fees to regulate the network access are not an optimal choice. In this paper, we study a feeless architecture developed by IOTA and designed specifically for the IoT. Due to the lack of fees, malicious nodes can exploit this feature to generate an unbounded number of transactions and perform denial of service attacks. We propose to mitigate these attacks through verifiable delay functions. These functions, which are non-parallelizable, hard to compute, and easy to verify, have been formulated only recently. In our work, we design a denial of service prevention mechanism which addresses network heterogeneity, limited node computational capabilities, and hardware-specific implementation optimizations. Verifiable delay functions have mostly been studied from a theoretical point of view, but little has been done in tangible applications. Hence, this paper can be considered a pioneering work in the field, since it builds a bridge between this theoretical mathematical framework and a real-world problem.

Submitted 2 June, 2020; originally announced June 2020.

Journal ref: GLOBECOM 2020 - 2020 IEEE Global Communications Conference, 1-6

arXiv:1912.11546 [pdf, ps, other]

doi 10.1109/TC.2021.3095669

Fast Generation of RSA Keys using Smooth Integers

Authors: Vassil Dimitrov, Luigi Vigneri, Vidal Attias

Abstract: Primality generation is the cornerstone of several essential cryptographic systems. The problem has been a subject of deep investigations, but there is still substantial room for improvement. Typically, the algorithms used have two parts: trial divisions aimed at eliminating numbers with small prime factors, and primality tests based on an easy-to-compute statement that is valid for primes and invalid for composites. In this paper, we will showcase a technique that will eliminate the first phase of the primality testing algorithms. The computational simulations show a reduction of the primality generation time by about 30% in the case of 1024-bit RSA key pairs. This can be particularly beneficial in the case of decentralized environments for shared RSA keys as the initial trial division part of the key generation algorithms can be avoided at no cost. This also significantly reduces the communication complexity. Another essential contribution of the paper is the introduction of a new one-way function that is computationally simpler than the existing ones used in public-key cryptography. This function can be used to create new random number generators, and it also could be potentially used for designing entirely new public-key encryption systems.

Submitted 13 July, 2021; v1 submitted 24 December, 2019; originally announced December 2019.

Comments: This paper contains 11 pages and 8 tables, in IEEE Transactions on Computers
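
The two-part structure mentioned in the abstract above (trial division followed by a primality test) can be sketched as follows. This is a minimal illustration of the conventional baseline only, not the paper's smooth-integer technique. The names `SMALL_PRIMES`, `passes_trial_division`, `is_probable_prime`, and `generate_prime` are hypothetical, the list of small primes is an arbitrary choice, and Miller-Rabin is assumed as the primality test since the abstract does not name one.

```python
import random

# Assumption: an arbitrary short list of small odd primes for phase 1.
SMALL_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]


def passes_trial_division(n):
    """Phase 1: reject candidates that have a small prime factor."""
    return all(n == p or n % p != 0 for p in SMALL_PRIMES)


def is_probable_prime(n, rounds=20, rng=random):
    """Phase 2: Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def generate_prime(bits, rng=random):
    """Draw odd candidates with the top bit set until one passes both phases."""
    while True:
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if passes_trial_division(candidate) and is_probable_prime(candidate, rng=rng):
            return candidate
```

The `random` module is used here only to keep the sketch short; it is not suitable for real key material, where a cryptographically secure source such as `secrets` or `random.SystemRandom` would be needed.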

arXiv:1912.11401 [pdf, ps, other]

On the Decentralized Generation of the RSA Moduli in Multi-Party Settings

Authors: Vidal Attias, Luigi Vigneri, Vassil Dimitrov

Abstract: RSA cryptography is still widely used. Some of its applications (e.g., distributed signature schemes, cryptosystems) do not allow the RSA modulus to be generated by a centralized trusted entity. Instead, the factorization must remain unknown to all the network participants. To date, the existing algorithms are either computationally expensive or limited to two-party settings. In this work, we design a decentralized multi-party computation algorithm able to efficiently generate the RSA modulus.

Submitted 24 December, 2019; originally announced December 2019.

Comments: The submission contains 14 pages and 12 figures. The conference to submit is not determined yet

arXiv:1811.09382 [pdf, other]

A Blended Human-Robot Shared Control Framework to Handle Drift and Latency

Authors: Anas Abou Allaban, Velin Dimitrov, Taşkın Padır

Abstract: Maximizing the utility of human-robot teams in disaster response and search and rescue (SAR) missions remains a challenging problem. This is due to the dynamic, uncertain nature of the environment and the variability in cognitive performance of the human operators. By having an autonomous agent share control with the operator, we can achieve near-optimal performance by augmenting the operator's input and compensating for the factors resulting in degraded performance. What this solution does not consider, though, is the human input latency and errors caused by potential hardware failures that can occur during task completion when operating in disaster response and SAR scenarios. In this paper, we propose the use of a blended shared control (BSC) architecture to address these issues and investigate the architecture's performance in constrained, dynamic environments with a differential drive robot that has input latency and erroneous odometry feedback. We conduct a validation study (n=12) for our control architecture and then a user study (n=14) in 2 different environments that are unknown to both the human operator and the autonomous agent. The results demonstrate that the BSC architecture can prevent collisions and enhance operator performance without the need for a complete transfer of control between the human operator and autonomous agent.

Submitted 23 November, 2018; originally announced November 2018.

arXiv:1807.08084 [pdf, ps, other]

doi 10.1016/j.cageo.2018.07.002

Fast Matrix Inversion and Determinant Computation for Polarimetric Synthetic Aperture Radar

Authors: D. F. G. Coelho, R. J. Cintra, A. C. Frery, V. S. Dimitrov

Abstract: This paper introduces a fast algorithm for simultaneous inversion and determinant computation of small-sized matrices in the context of fully Polarimetric Synthetic Aperture Radar (PolSAR) image processing and analysis. The proposed fast algorithm is based on the computation of the adjoint matrix and the symmetry of the input matrix. The algorithm is implemented in a general purpose graphics processing unit (GPGPU) and compared to the usual approach based on Cholesky factorization. The assessment with simulated observations and data from an actual PolSAR sensor shows a speedup factor of about two when compared to the usual Cholesky factorization. Moreover, the expressions provided here can be implemented in any platform.

Submitted 21 July, 2018; originally announced July 2018.

Comments: 7 pages, 1 figure

Journal ref: Computers and Geosciences, no. 119 (2018), pages 109-114
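
As a rough illustration of the idea named in the abstract above (inverse and determinant obtained together from the adjoint matrix, exploiting symmetry), the sketch below handles a 3x3 Hermitian matrix. The 3x3 Hermitian case and the function name `inverse_and_det_hermitian3` are assumptions made for this example; the paper's exact expressions and operation counts are not reproduced here.

```python
def inverse_and_det_hermitian3(m):
    """Return (inverse, det) of a 3x3 Hermitian matrix given as nested lists.

    Only the upper triangle of m is read; the lower triangle is assumed to be
    its complex conjugate. Six adjugate entries are computed instead of nine,
    and the determinant reuses them.
    """
    a, b, c = (complex(v) for v in m[0])
    e, f = complex(m[1][1]), complex(m[1][2])
    i = complex(m[2][2])
    bc, cc, fc = b.conjugate(), c.conjugate(), f.conjugate()

    # Upper triangle of the adjugate (transpose of the cofactor matrix).
    a00 = e * i - f * fc
    a01 = c * fc - b * i
    a02 = b * f - c * e
    a11 = a * i - c * cc
    a12 = c * bc - a * f
    a22 = a * e - b * bc

    # First row of m times first column of the adjugate gives the determinant,
    # which is real for a Hermitian matrix.
    det = (a * a00 + b * a01.conjugate() + c * a02.conjugate()).real
    if det == 0:
        raise ValueError("matrix is singular")

    adj = [
        [a00, a01, a02],
        [a01.conjugate(), a11, a12],
        [a02.conjugate(), a12.conjugate(), a22],
    ]
    inverse = [[entry / det for entry in row] for row in adj]
    return inverse, det
```

For a real symmetric input the conjugations are no-ops, so the same routine covers that case as well.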

arXiv:1801.08589 [pdf, ps, other]

doi 10.1109/ISCAS.2011.5937664

A New Algorithm for Double Scalar Multiplication over Koblitz Curves

Authors: J. Adikari, V. S. Dimitrov, R. J. Cintra

Abstract: Koblitz curves are a special set of elliptic curves and have improved performance in computing scalar multiplication in elliptic curve cryptography due to the Frobenius endomorphism. The double-base number system approach to Frobenius expansion has improved the performance of single scalar multiplication. In this paper, we present a new algorithm to generate a sparse and joint $\tau$-adic representation for a pair of scalars and its application in double scalar multiplication. The new algorithm is inspired by the double-base number system. We achieve a 12% improvement in speed over the state-of-the-art $\tau$-adic joint sparse form.

Submitted 25 January, 2018; originally announced January 2018.

Comments: 5 pages, 2 figures, 1 table

Journal ref: Circuits and Systems (ISCAS), 2011 IEEE International Symposium on

arXiv:1801.05832 [pdf, ps, other]

doi 10.1007/s11265-017-1270-6

Efficient Computation of the 8-point DCT via Summation by Parts

Authors: D. F. G. Coelho, R. J. Cintra, V. S. Dimitrov

Abstract: This paper introduces a new fast algorithm for the 8-point discrete cosine transform (DCT) based on the summation-by-parts formula. The proposed method converts the DCT matrix into an alternative transformation matrix that can be decomposed into sparse matrices of low multiplicative complexity. The method is capable of scaled and exact DCT computation and its associated fast algorithm achieves the theoretical minimal multiplicative complexity for the 8-point DCT. Depending on the nature of the input signal, simplifications can be introduced and the overall complexity of the proposed algorithm can be further reduced. Several types of input signal are analyzed: arbitrary, null mean, accumulated, and null mean/accumulated signal. The proposed tool has potential application in harmonic detection, image enhancement, and feature extraction, where input signal DC level is discarded and/or the signal is required to be integrated.

Submitted 28 March, 2018; v1 submitted 17 January, 2018; originally announced January 2018.

Comments: Fixed Fig. 1 with the block diagram of the proposed architecture. Manuscript contains 13 pages, 4 figures, 2 tables

Journal ref: J Sign Process Syst (2017)

arXiv:1710.11200 [pdf, ps, other]

doi 10.1109/TC.2014.2366732

VLSI Computational Architectures for the Arithmetic Cosine Transform

Authors: N. Rajapaksha, A. Madanayake, R. J. Cintra, J. Adikari, V. S. Dimitrov

Abstract: The discrete cosine transform (DCT) is a widely-used and important signal processing tool employed in a plethora of applications. Typical fast algorithms for nearly-exact computation of DCT require floating point arithmetic, are multiplier intensive, and accumulate round-off errors. The recently proposed arithmetic cosine transform (ACT), a fast algorithm, calculates the DCT exactly using only additions and integer constant multiplications, with very low area complexity, for null mean input sequences. The ACT can also be computed non-exactly for any input sequence, with low area complexity and low power consumption, utilizing the novel architecture described. However, as a trade-off, the ACT algorithm requires 10 non-uniformly sampled data points to calculate the 8-point DCT. This requirement can easily be satisfied for applications dealing with spatial signals such as image sensors and biomedical sensor arrays, by placing sensor elements in a non-uniform grid. In this work, a hardware architecture for the computation of the null mean ACT is proposed, followed by novel architectures that extend the ACT for non-null mean signals. All circuits are physically implemented and tested using the Xilinx XC6VLX240T FPGA device and synthesized for a 45 nm TSMC standard-cell library for performance assessment.

Submitted 30 October, 2017; originally announced October 2017.

Comments: 8 pages, 2 figures, 6 tables

Journal ref: IEEE Transactions on Computers, vol. 64, no. 9, Sep 2015

arXiv:1710.09975 [pdf, ps, other]

doi 10.1109/TCSVT.2013.2270397

A Single-Channel Architecture for Algebraic Integer Based 8$\times$8 2-D DCT Computation

Authors: A. Edirisuriya, A. Madanayake, R. J. Cintra, V. S. Dimitrov

Abstract: An area-efficient row-parallel architecture is proposed for the real-time implementation of bivariate algebraic integer (AI) encoded 2-D discrete cosine transform (DCT) for image and video processing. The proposed architecture computes 8$\times$8 2-D DCT transform based on the Arai DCT algorithm. An improved fast algorithm for AI based 1-D DCT computation is proposed along with a single channel 2-D DCT architecture. The design improves on the 4-channel AI DCT architecture that was published recently by reducing the number of integer channels to one and the number of 8-point 1-D DCT cores from 5 down to 2. The architecture offers exact computation of 8$\times$8 blocks of the 2-D DCT coefficients up to the FRS, which converts the coefficients from the AI representation to fixed-point format using the method of expansion factors. Prototype circuits corresponding to FRS blocks based on two expansion factors are realized, tested, and verified on FPGA-chip, using a Xilinx Virtex-6 XC6VLX240T device. Post place-and-route results show a 20% reduction in terms of area compared to the 2-D DCT architecture requiring five 1-D AI cores. The area-time and area-time${}^2$ complexity metrics are also reduced by 23% and 22% respectively for designs with 8-bit input word length. The digital realizations are simulated up to place and route for ASICs using 45 nm CMOS standard cells. The maximum estimated clock rate is 951 MHz for the CMOS realizations, indicating $7.608\cdot10^9$ pixels/second and an 8$\times$8 block rate of 118.875 MHz.

Submitted 26 October, 2017; originally announced October 2017.

Comments: 8 pages, 6 figures, 5 tables

Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, volume 23, number 12, pages 2083-2089, Dec. 2013

arXiv:1707.05846 [pdf, ps, other]

On the Computation of Neumann Series

Authors: Vassil Dimitrov, Diego Coelho

Abstract: This paper proposes new factorizations for computing the Neumann series. The factorizations are based on fast algorithms for series of small prime sizes and the splitting of large sizes into several smaller ones. We propose different bases for factorizations other than the well-known binary and ternary bases. We show that it is possible to reduce the overall complexity for the usual binary decomposition from $2\log_2(N)-2$ multiplications to around $1.72\log_2(N)-2$ using a basis of size five. By merging different bases, we demonstrate that fast algorithms can be built for particular sizes. We also show the asymptotic case where one can reduce the number of multiplications to around $1.70\log_2(N)-2$. Simulations are performed for applications in the context of wireless communications and image rendering, where it is necessary to invert large matrices.

Submitted 18 July, 2017; originally announced July 2017.

Comments: 11 pages, 2 figures
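
The "usual binary decomposition" that the abstract above uses as its baseline can be sketched as follows, assuming the standard factorization $I + A + \dots + A^{N-1} = (I+A)(I+A^2)\cdots(I+A^{N/2})$ for $N = 2^m$. It uses $m-1$ squarings and $m-1$ products, matching the $2\log_2(N)-2$ count quoted there. The helper names are hypothetical, and the paper's base-five factorization is not shown.

```python
def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def mat_add(a, b):
    """Entrywise sum of two square matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def neumann_binary(a, m):
    """Return (S, mults) with S = I + A + ... + A^(N-1) for N = 2**m, m >= 1.

    mults counts matrix multiplications and equals 2*m - 2.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    eye = identity(len(a))
    power = a                      # holds A^(2^j)
    total = mat_add(eye, a)        # I + A
    mults = 0
    for _ in range(m - 1):
        power = mat_mul(power, power)                  # one squaring
        mults += 1
        total = mat_mul(total, mat_add(eye, power))    # one product
        mults += 1
    return total, mults


def neumann_naive(a, n):
    """Reference: accumulate the first n terms one power at a time."""
    term = identity(len(a))
    total = identity(len(a))
    for _ in range(n - 1):
        term = mat_mul(term, a)
        total = mat_add(total, term)
    return total
```

With `m = 3` the factorized version uses 4 matrix multiplications for the 8-term series, whereas the naive accumulation above uses 7.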

arXiv:1502.04221 [pdf, ps, other]

doi 10.1109/TCSVT.2011.2181232

A Row-parallel 8$\times$8 2-D DCT Architecture Using Algebraic Integer Based Exact Computation

Authors: A. Madanayake, R. J. Cintra, D. Onen, V. S. Dimitrov, N. T. Rajapaksha, L. T. Bruton, A. Edirisuriya

Abstract: An algebraic integer (AI) based time-multiplexed row-parallel architecture and two final-reconstruction step (FRS) algorithms are proposed for the implementation of bivariate AI-encoded 2-D discrete cosine transform (DCT). The architecture directly realizes an error-free 2-D DCT without using FRSs between row-column transforms, leading to an 8$\times$8 2-D DCT which is entirely free of quantization errors in AI basis. As a result, the user-selectable accuracy for each of the coefficients in the FRS facilitates each of the 64 coefficients to have its precision set independently of others, avoiding the leakage of quantization noise between channels as is the case for published DCT designs. The proposed FRS uses two approaches based on (i) optimized Dempster-Macleod multipliers and (ii) expansion factor scaling. This architecture enables low-noise high-dynamic range applications in digital video processing that require full control of the finite-precision computation of the 2-D DCT. The proposed architectures and FRS techniques are experimentally verified and validated using hardware implementations that are physically realized and verified on an FPGA chip. Six designs, for 4- and 8-bit input word sizes, using the two proposed FRS schemes, have been designed, simulated, physically implemented and measured. The maximum clock rate and block-rate achieved among 8-bit input designs are 307.787 MHz and 38.47 MHz, respectively, implying a pixel rate of 8$\times$307.787$\approx$2.462 GHz if eventually embedded in a real-time video-processing system. The equivalent frame rate is about 1187.35 Hz for the image size of 1920$\times$1080. All implementations are functional on a Xilinx Virtex-6 XC6VLX240T FPGA device.

Submitted 14 February, 2015; originally announced February 2015.

Comments: 28 pages, 9 figures, 7 tables, corrected typos

Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 6, pp. 915--929, 2012

arXiv:1502.00296 [pdf, other]

doi 10.1016/j.image.2009.04.003

Fragile Watermarking Using Finite Field Trigonometrical Transforms

Authors: R. J. Cintra, V. S. Dimitrov, H. M. de Oliveira, R. M. Campello de Souza

Abstract: Fragile digital watermarking has been applied for authentication and alteration detection in images. Utilizing the cosine and Hartley transforms over finite fields, a new transform domain fragile watermarking scheme is introduced. A watermark is embedded into a host image via a blockwise application of two-dimensional finite field cosine or Hartley transforms. Additionally, the considered finite field transforms are adjusted to be number theoretic transforms, appropriate for error-free calculation. The employed technique can provide invisible fragile watermarking for authentication systems with tamper location capability. It is shown that the choice of the finite field characteristic is pivotal to obtain perceptually invisible watermarked images. It is also shown that the generated watermarked images can be used as publicly available signature data for authentication purposes.

Submitted 1 February, 2015; originally announced February 2015.

Comments: 9 pages, 7 figures, 2 tables

Journal ref: Image Communication, Volume 24, Issue 7, August, 2009, pp. 587-597

Showing 1–17 of 17 results for author: Dimitrov, V