Search | arXiv e-print repository

Real-time Tracking in a Status Update System with an Imperfect Feedback Channel

Authors: Saeid Sadeghi Vilni, Abolfazl Zakeri, Mohammad Moltafet, Marian Codreanu

Abstract: We consider a status update system consisting of a finite-state Markov source, an energy-harvesting-enabled transmitter, and a sink. The forward and feedback channels between the transmitter and the sink are error-prone. We study the problem of minimizing the long-term time average of a (generic) distortion function subject to an energy causality constraint. Since the feedback channel is error-prone, the transmitter has only partial knowledge about the transmission results and, consequently, about the estimate of the source state at the sink. Therefore, we model the problem as a partially observable Markov decision process (POMDP), which is then cast as a belief-MDP problem. The infinite belief space makes solving the belief-MDP difficult. Thus, by exploiting a specific property of the belief evolution, we truncate the state space and formulate a finite-state MDP problem, which is then solved using the relative value iteration algorithm (RVIA). Furthermore, we propose a low-complexity transmission policy in which the belief-MDP problem is transformed into a sequence of per-slot optimization problems. Simulation results show the effectiveness of the proposed policies and their superiority compared to a baseline policy. Moreover, we numerically show that the proposed policies have switching-type structures.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.04486 [pdf, other]

Variational and Explanatory Neural Networks for Encoding Cancer Profiles and Predicting Drug Responses

Authors: Tianshu Feng, Rohan Gnanaolivu, Abolfazl Safikhani, Yuanhang Liu, Jun Jiang, Nicholas Chia, Alexander Partin, Priyanka Vasanthakumari, Yitan Zhu, Chen Wang

Abstract: Human cancers present a significant public health challenge and require the discovery of novel drugs through translational research. Transcriptomics profiling data that describe molecular activities in tumors and cancer cell lines are widely utilized for predicting anti-cancer drug responses. However, existing AI models face challenges due to noise in transcriptomics data and lack of biological interpretability. To overcome these limitations, we introduce VETE (Variational and Explanatory Transcriptomics Encoder), a novel neural network framework that incorporates a variational component to mitigate noise effects and integrates traceable gene ontology into the neural network architecture for encoding cancer transcriptomics data. Key innovations include a local interpretability-guided method for identifying ontology paths, a visualization tool to elucidate biological mechanisms of drug responses, and the application of centralized large-scale hyperparameter optimization. VETE demonstrated robust accuracy in cancer cell line classification and drug response prediction. Additionally, it provided traceable biological explanations for both tasks and offered insights into the mechanisms underlying its predictions. VETE bridges the gap between AI-driven predictions and biologically meaningful insights in cancer research, which represents a promising advancement in the field.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.03575 [pdf, other]

DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification

Authors: Wenhui Zhu, Xiwen Chen, Peijie Qiu, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang

Abstract: Multiple instance learning (MIL) stands as a powerful approach in weakly supervised learning, regularly employed in histological whole slide image (WSI) classification for detecting tumorous lesions. However, existing mainstream MIL methods focus on modeling correlation between instances while overlooking the inherent diversity among instances. Moreover, the few MIL methods that have aimed at diversity modeling empirically show inferior performance at a high computational cost. To bridge this gap, we propose a novel MIL aggregation method based on diverse global representation (DGR-MIL), by modeling diversity among instances through a set of global vectors that serve as a summary of all instances. First, we turn the instance correlation into the similarity between instance embeddings and the predefined global vectors through a cross-attention mechanism. This stems from the fact that similar instance embeddings typically would result in a higher correlation with a certain global vector. Second, we propose two mechanisms to enforce the diversity among the global vectors to be more descriptive of the entire bag: (i) positive instance alignment and (ii) a novel, efficient, and theoretically guaranteed diversification learning paradigm. Specifically, the positive instance alignment module encourages the global vectors to align with the center of positive instances (e.g., instances containing tumors in WSI). To further diversify the global representations, we propose a novel diversification learning paradigm leveraging the determinantal point process. The proposed model outperforms the state-of-the-art MIL aggregation models by a substantial margin on the CAMELYON-16 and the TCGA-lung cancer datasets. The code is available at https://github.com/ChongQingNoSubway/DGR-MIL.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV 2024

arXiv:2406.19980 [pdf]

Comparative Analysis of LSTM Neural Networks and Traditional Machine Learning Models for Predicting Diabetes Patient Readmission

Authors: Abolfazl Zarghani

Abstract: Diabetes mellitus is a chronic metabolic disorder that has emerged as one of the major health problems worldwide due to its high prevalence and serious complications, which are costly to manage. Effective management requires good glycemic control and regular follow-up in the clinic; however, non-adherence to scheduled follow-ups is very common. This study uses the Diabetes 130-US Hospitals dataset for analysis and prediction of patient readmission by various traditional machine learning models, such as XGBoost, LightGBM, CatBoost, Decision Tree, and Random Forest, and also uses an in-house LSTM neural network for comparison. The quality of the data was assured by preprocessing it, and the performance evaluation for all these models was based on accuracy, precision, recall, and F1-score. LightGBM turned out to be the best traditional model, while XGBoost was the runner-up. The LSTM model suffered from overfitting despite high training accuracy. A major strength of LSTM is capturing temporal dependencies among the patient data. Further, SHAP values were used to improve model interpretability, identifying key factors, among them the number of lab procedures and discharge disposition, as critical in the prediction of readmissions. This study demonstrates that model selection, validation, and interpretability are key steps in predictive healthcare modeling. This will help health providers design interventions for improved follow-up adherence and better management of diabetes.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.17248 [pdf, other]

MindSpore Quantum: A User-Friendly, High-Performance, and AI-Compatible Quantum Computing Framework

Authors: Xusheng Xu, Jiangyu Cui, Zidong Cui, Runhong He, Qingyu Li, Xiaowei Li, Yanling Lin, Jiale Liu, Wuxin Liu, Jiale Lu, Maolin Luo, Chufan Lyu, Shijie Pan, Mosharev Pavel, Runqiu Shu, Jialiang Tang, Ruoqian Xu, Shu Xu, Kang Yang, Fan Yu, Qingguo Zeng, Haiying Zhao, Qiang Zheng, Junyuan Zhou, Xu Zhou, et al. (14 additional authors not shown)

Abstract: We introduce MindSpore Quantum, a pioneering hybrid quantum-classical framework with a primary focus on the design and implementation of noisy intermediate-scale quantum (NISQ) algorithms. Leveraging the robust support of MindSpore, an advanced open-source deep learning training/inference framework, MindSpore Quantum exhibits exceptional efficiency in the design and training of variational quantum algorithms on both CPU and GPU platforms, delivering remarkable performance. Furthermore, this framework places a strong emphasis on enhancing the operational efficiency of quantum algorithms when executed on real quantum hardware. This encompasses the development of algorithms for quantum circuit compilation and qubit mapping, crucial components for achieving optimal performance on quantum processors. In addition to the core framework, we introduce QuPack, a meticulously crafted quantum computing acceleration engine. QuPack significantly accelerates the simulation speed of MindSpore Quantum, particularly in variational quantum eigensolver (VQE), quantum approximate optimization algorithm (QAOA), and tensor network simulations, providing astonishing speed. This combination of cutting-edge technologies empowers researchers and practitioners to explore the frontiers of quantum computing with unprecedented efficiency and performance.

Submitted 10 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.17107 [pdf, other]

A Fast Single-Loop Primal-Dual Algorithm for Non-Convex Functional Constrained Optimization

Authors: Jong Gwang Kim, Ashish Chandra, Abolfazl Hashemi, Christopher Brinton

Abstract: Non-convex functional constrained optimization problems have gained substantial attention in machine learning and signal processing. This paper develops a new primal-dual algorithm for solving this class of problems. The algorithm is based on a novel form of the Lagrangian function, termed the Proximal-Perturbed Augmented Lagrangian, which enables us to develop an efficient and simple first-order algorithm that converges to a stationary solution under mild conditions. Our method has several key features that differentiate it from existing augmented Lagrangian-based methods: (i) it is a single-loop algorithm that does not require the continuous adjustment of the penalty parameter to infinity; (ii) it can achieve an improved iteration complexity of $\widetilde{\mathcal{O}}(1/\epsilon^2)$ or at least $\mathcal{O}(1/\epsilon^{2/q})$ with $q \in (2/3,1)$ for computing an $\epsilon$-approximate stationary solution, compared to the best-known complexity of $\mathcal{O}(1/\epsilon^3)$; and (iii) it effectively handles functional constraints for feasibility guarantees with fixed parameters, without imposing boundedness assumptions on the dual iterates and the penalty parameters. We validate the effectiveness of our method through numerical experiments on popular non-convex problems.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.14896 [pdf, other]

SelfReg-UNet: Self-Regularized UNet for Medical Image Segmentation

Authors: Wenhui Zhu, Xiwen Chen, Peijie Qiu, Mohammad Farazi, Aristeidis Sotiras, Abolfazl Razi, Yalin Wang

Abstract: Since its introduction, UNet has been leading a variety of medical image segmentation tasks. Although numerous follow-up studies have also been dedicated to improving the performance of standard UNet, few have conducted in-depth analyses of the underlying interest pattern of UNet in medical image segmentation. In this paper, we explore the patterns learned in a UNet and observe two important factors that potentially affect its performance: (i) irrelevant features learned due to asymmetric supervision; (ii) feature redundancy in the feature map. To this end, we propose to balance the supervision between encoder and decoder and reduce the redundant information in the UNet. Specifically, we use the feature map that contains the most semantic information (i.e., the last layer of the decoder) to provide additional supervision to other blocks and reduce feature redundancy by leveraging feature distillation. The proposed method can be easily integrated into existing UNet architectures in a plug-and-play fashion with negligible computational cost. The experimental results suggest that the proposed method consistently improves the performance of standard UNets on four medical image segmentation datasets. The code is available at https://github.com/ChongQingNoSubway/SelfReg-UNet.

Submitted 21 June, 2024; originally announced June 2024.

Comments: Accepted as a conference paper to 2024 MICCAI

arXiv:2406.13921 [pdf, other]

Quantum Enhanced Sensitivity through Many-Body Bloch Oscillations

Authors: Hassan Manshouri, Moslem Zarei, Mehdi Abdi, Sougato Bose, Abolfazl Bayat

Abstract: We investigate the sensing capacity of non-equilibrium dynamics in quantum systems exhibiting Bloch oscillations. By focusing on resource efficiency of the probe, quantified by quantum Fisher information, we find different scaling behaviors in two different phases, namely localized and extended. Our results provide a quantitative ansatz for quantum Fisher information in terms of time, probe size, and the number of excitations. In the long-time regime, the quantum Fisher information is a quadratic function of time, touching the Heisenberg limit. The system size scaling depends drastically on the phase, changing from super-Heisenberg scaling in the extended phase to size-independent behavior in the localized phase. Furthermore, increasing the number of excitations always enhances the precision of the probe, although in interacting systems the enhancement is less pronounced than in non-interacting probes, which is due to localization induced by the interaction between excitations.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13815 [pdf]

IG-CFAT: An Improved GAN-Based Framework for Effectively Exploiting Transformers in Real-World Image Super-Resolution

Authors: Alireza Aghelan, Ali Amiryan, Abolfazl Zarghani, Behnoush Hatami

Abstract: In the field of single image super-resolution (SISR), transformer-based models have demonstrated significant advancements. However, the potential and efficiency of these models in applied fields such as real-world image super-resolution have received less attention, and there are substantial opportunities for improvement. Recently, the composite fusion attention transformer (CFAT) outperformed previous state-of-the-art (SOTA) models in classic image super-resolution. This paper extends the CFAT model to an improved GAN-based model called IG-CFAT to effectively exploit the performance of transformers in real-world image super-resolution. IG-CFAT incorporates a semantic-aware discriminator to reconstruct image details more accurately, significantly improving perceptual quality. Moreover, our model utilizes an adaptive degradation model to better simulate real-world degradations. Our methodology adds wavelet losses to conventional loss functions of GAN-based super-resolution models to reconstruct high-frequency details more efficiently. Empirical results demonstrate that IG-CFAT sets new benchmarks in real-world image super-resolution, outperforming SOTA models in both quantitative and qualitative metrics.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.08564 [pdf, other]

A new approach for predicting the Quality of Experience in multimedia services using machine learning

Authors: Parsa Hassani Shariat Panahi, Amir Hossein Jalilvand, Abolfazl Diyanat

Abstract: In today's world, the Internet is recognized as one of the essentials of human life, playing a significant role in communications, business, and lifestyle. The quality of internet services can have widespread negative impacts on individual and social levels. Consequently, Quality of Service (QoS) has become a fundamental necessity for service providers in a competitive market aiming to offer superior services. The success and survival of these providers depend on their ability to maintain high service quality and ensure satisfaction. Alongside QoS, the concept of Quality of Experience (QoE) has emerged with the development of telephony networks. QoE focuses on the user's satisfaction with the service, helping operators adjust their services to meet user expectations. Recent research shows a trend towards utilizing machine learning and deep learning techniques to predict QoE. Researchers aim to develop accurate models by leveraging large volumes of data from network and user interactions, considering various real-world scenarios. Despite the complexity of network environments, this research provides a practical framework for improving and evaluating QoE. This study presents a comprehensive framework for evaluating QoE in multimedia services that adheres to the ITU-T P.1203 standard, includes automated data collection processes, and uses machine learning algorithms to predict user satisfaction based on key network parameters. Using over 20,000 data records collected from different network conditions and users, the Random Forest model achieved a prediction accuracy of 95.8% for user satisfaction. This approach allows operators to dynamically allocate network resources in real time, maintaining high levels of customer satisfaction with minimal costs.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 11 pages, 5 figures

arXiv:2406.03964 [pdf, other]

Quantum Speed Limits for Implementation of Unitary Transformations

Authors: Abolfazl Farmanian, Vahid Karimipour

Abstract: Quantum speed limits are the boundaries that define how quickly one quantum state can transform into another. Instead of focusing on the transformation between pairs of states, we provide bounds on the speed limit of quantum evolution by unitary operators in arbitrary dimensions. These do not depend on the initial and final state but depend only on the trace of the unitary operator that is to be implemented and the gross characteristics (average and variance) of the energy spectrum of the Hamiltonian which generates this unitary evolution. The bounds that we find can be thought of as the generalization of the Mandelstam-Tamm (MT) and the Margolus-Levitin (ML) bounds for state transformations to implementations of unitary operators. We will discuss the application of these bounds in several classes of transformations that are of interest in quantum information processing.

Submitted 10 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: 13 pages, revised introduction, references added

arXiv:2405.19904 [pdf, other]

Statistical physics of principal minors: Cavity approach

Authors: A. Ramezanpour, M. A. Rajabpour

Abstract: Determinants are useful to represent the state of an interacting system of (effectively) repulsive and independent elements, like fermions in a quantum system and training samples in a learning problem. A computationally challenging problem is to compute the sum of powers of principal minors of a matrix which is relevant to the study of critical behaviors in quantum fermionic systems and finding a subset of maximally informative training data for a learning algorithm. Specifically, principal minors of positive square matrices can be considered as statistical weights of a random point process on the set of the matrix indices. The probability of each subset of the indices is in general proportional to a positive power of the determinant of the associated sub-matrix. We use Gaussian representation of the determinants for symmetric and positive matrices to estimate the partition function (or free energy) and the entropy of principal minors within the Bethe approximation. The results are expected to be asymptotically exact for diagonally dominant matrices with locally tree-like structures. We consider the Laplacian matrix of random regular graphs of degree $K=2,3,4$ and exactly characterize the structure of the relevant minors in a mean-field model of such matrices. No (finite-temperature) phase transition is observed in this class of diagonally dominant matrices by increasing the positive power of the principal minors, which here plays the role of an inverse temperature.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 23 pages, 12 figures

arXiv:2405.18237 [pdf, other]

Unveiling the Cycloid Trajectory of EM Iterations in Mixed Linear Regression

Authors: Zhankun Luo, Abolfazl Hashemi

Abstract: We study the trajectory of iterations and the convergence rates of the Expectation-Maximization (EM) algorithm for two-component Mixed Linear Regression (2MLR). The fundamental goal of MLR is to learn the regression models from unlabeled observations. The EM algorithm finds extensive applications in solving the mixture of linear regressions. Recent results have established the super-linear convergence of EM for 2MLR in the noiseless and high SNR settings under some assumptions and its global convergence rate with random initialization has been affirmed. However, the exponent of convergence has not been theoretically estimated and the geometric properties of the trajectory of EM iterations are not well-understood. In this paper, first, using Bessel functions we provide explicit closed-form expressions for the EM updates under all SNR regimes. Then, in the noiseless setting, we completely characterize the behavior of EM iterations by deriving a recurrence relation at the population level and notably show that all the iterations lie on a certain cycloid. Based on this new trajectory-based analysis, we exhibit the theoretical estimate for the exponent of super-linear convergence and further improve the statistical error bound at the finite-sample level. Our analysis provides a new framework for studying the behavior of EM for Mixed Linear Regression.

Submitted 3 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: This paper was accepted by the 41st International Conference on Machine Learning (ICML 2024). The code for numerical experiments is available at https://github.com/dassein/cycloid_em_mlr

arXiv:2405.11639 [pdf, other]

Fair Set Cover

Authors: Mohsen Dehghankar, Rahul Raychaudhury, Stavros Sintos, Abolfazl Asudeh

Abstract: The potential harms of algorithmic decisions have made algorithmic fairness a central topic in computer science. One of the fundamental problems in computer science is Set Cover, which has numerous applications with societal impacts, such as assembling a small team of individuals that collectively satisfy a range of expertise requirements. However, despite its broad application spectrum and significant potential impact, set cover has yet to be studied through the lens of fairness. Therefore, in this paper, we introduce Fair Set Cover, which aims not only to cover with a minimum-size set but also to satisfy demographic parity in its selection of sets. To this end, we develop multiple versions of fair set cover, study their hardness, and devise efficient approximation algorithms for each variant. Notably, under certain assumptions, our algorithms always guarantee zero-unfairness, with only a small increase in the approximation ratio compared to regular set cover. Furthermore, our experiments on various data sets and across different settings confirm the negligible price of fairness, as (a) the output size increases only slightly (if at all) and (b) the time to compute the output does not significantly increase.

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2405.06760 [pdf]

Opportunities for Persian Digital Humanities Research with Artificial Intelligence Language Models; Case Study: Forough Farrokhzad

Authors: Arash Rasti Meymandi, Zahra Hosseini, Sina Davari, Abolfazl Moshiri, Shabnam Rahimi-Golkhandan, Khashayar Namdar, Nikta Feizi, Mohamad Tavakoli-Targhi, Farzad Khalvati

Abstract: This study explores the integration of advanced Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques to analyze and interpret Persian literature, focusing on the poetry of Forough Farrokhzad. Utilizing computational methods, we aim to unveil thematic, stylistic, and linguistic patterns in Persian poetry. Specifically, the study employs AI models including transformer-based language models for clustering of the poems in an unsupervised framework. This research underscores the potential of AI in enhancing our understanding of Persian literary heritage, with Forough Farrokhzad's work providing a comprehensive case study. This approach not only contributes to the field of Persian Digital Humanities but also sets a precedent for future research in Persian literary studies using computational techniques.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.05051 [pdf, other]

Variational simulation of $d$-level systems on qubit-based quantum simulators

Authors: Chufan Lyu, Zuoheng Zou, Xusheng Xu, Man-Hong Yung, Abolfazl Bayat

Abstract: Current quantum simulators are primarily qubit-based, making them naturally suitable for simulating 2-level quantum systems. However, many systems in nature are inherently $d$-level, including higher spins, bosons, vibrational modes, and itinerant electrons. To simulate $d$-level systems on qubit-based quantum simulators, an encoding method is required to map the $d$-level system onto a qubit basis. Such a mapping may introduce illegitimate states in the Hilbert space, which makes the simulation more complicated. In this paper, we develop a systematic method to address the illegitimate states. In addition, we consider two different mappings, namely binary and symmetry encoding methods, and compare their performance through variational simulation of the ground state and time evolution of various many-body systems. While binary encoding is very efficient with respect to the number of qubits, it cannot easily incorporate the symmetries of the original Hamiltonian in its circuit design. On the other hand, the symmetry encoding facilitates the implementation of symmetries in the circuit design, though it comes with an overhead for the number of qubits. Our analysis shows that the symmetry encoding significantly outperforms the binary encoding, despite requiring extra qubits. Its advantage is indicated by requiring fewer two-qubit gates, converging faster, and being far more resilient to barren plateaus. We have performed variational ground state simulations of spin-1, spin-3/2, and bosonic systems as well as variational time evolution of spin-1 systems. Our proposal can be implemented on existing quantum simulators and its potential is extendable to a broad class of physical models.

Submitted 25 June, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: 15 pages, 8 figures, 2 tables

arXiv:2405.03140 [pdf, other]

TimeMIL: Advancing Multivariate Time Series Classification via a Time-aware Multiple Instance Learning

Authors: Xiwen Chen, Peijie Qiu, Wenhui Zhu, Huayu Li, Hao Wang, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi

Abstract: Deep neural networks, including transformers and convolutional neural networks, have significantly improved multivariate time series classification (MTSC). However, these methods often rely on supervised learning, which does not fully account for the sparsity and locality of patterns in time series data (e.g., disease-related anomalous points in ECG). To address this challenge, we formally reformulate MTSC as a weakly supervised problem, introducing a novel multiple-instance learning (MIL) framework for better localization of patterns of interest and modeling time dependencies within time series. Our novel approach, TimeMIL, formulates the temporal correlation and ordering within a time-aware MIL pooling, leveraging a tokenized transformer with a specialized learnable wavelet positional token. The proposed method surpassed 26 recent state-of-the-art methods, underscoring the effectiveness of the weakly supervised TimeMIL in MTSC. The code will be available at https://github.com/xiwenc1/TimeMIL.

Submitted 27 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: Accepted by ICML2024

arXiv:2405.02944 [pdf, other]

Imaging Signal Recovery Using Neural Network Priors Under Uncertain Forward Model Parameters

Authors: Xiwen Chen, Wenhui Zhu, Peijie Qiu, Abolfazl Razi

Abstract: Inverse imaging problems (IIPs) arise in various applications, with the main objective of reconstructing an image from its compressed measurements. This problem is often ill-posed for being under-determined with multiple interchangeably consistent solutions. The best solution inherently depends on prior knowledge or assumptions, such as the sparsity of the image. Furthermore, the reconstruction process for most IIPs relies significantly on the imaging (i.e. forward model) parameters, which might not be fully known, or the measurement device may undergo calibration drifts. These uncertainties in the forward model create substantial challenges, where inaccurate reconstructions usually happen when the postulated parameters of the forward model do not fully match the actual ones. In this work, we are devoted to tackling accurate reconstruction under the context of a set of possible forward model parameters that exist. Here, we propose a novel Moment-Aggregation (MA) framework that is compatible with the popular IIP solution by using a neural network prior. Specifically, our method can reconstruct the signal by considering all candidate parameters of the forward model simultaneously during the update of the neural network. We theoretically demonstrate the convergence of the MA framework, which has a similar complexity with reconstruction under the known forward model parameters. Proof-of-concept experiments demonstrate that the proposed MA achieves performance comparable to the forward model with the known precise parameter in reconstruction across both compressive sensing and phase retrieval applications, with a PSNR gap of 0.17 to 1.94 over various datasets, including MNIST, X-ray, Glas, and MoNuseg. This highlights our method's significant potential in reconstruction under an uncertain forward model.

Submitted 5 May, 2024; originally announced May 2024.

Comments: Accepted by PBDL-CVPR 2024

arXiv:2405.02188 [pdf, other]

Optimistic Regret Bounds for Online Learning in Adversarial Markov Decision Processes

Authors: Sang Bin Moon, Abolfazl Hashemi

Abstract: The Adversarial Markov Decision Process (AMDP) is a learning framework that deals with unknown and varying tasks in decision-making applications like robotics and recommendation systems. A major limitation of the AMDP formalism, however, is pessimistic regret analysis results in the sense that although the cost function can change from one episode to the next, the evolution in many settings is not adversarial. To address this, we introduce and study a new variant of AMDP, which aims to minimize regret while utilizing a set of cost predictors. For this setting, we develop a new policy search method that achieves a sublinear optimistic regret with high probability, that is a regret bound which gracefully degrades with the estimation power of the cost predictors. Establishing such optimistic regret bounds is nontrivial given that (i) as we demonstrate, the existing importance-weighted cost estimators cannot establish optimistic bounds, and (ii) the feedback model of AMDP is different (and more realistic) than the existing optimistic online learning works. Our result, in particular, hinges upon developing a novel optimistically biased cost estimator that leverages cost predictors and enables a high-probability regret analysis without imposing restrictive assumptions. We further discuss practical extensions of the proposed scheme and demonstrate its efficacy numerically.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.01011 [pdf, ps, other]

Rare Collision Risk Estimation of Autonomous Vehicles with Multi-Agent Situation Awareness

Authors: Mahdieh Zaker, Henk A. P. Blom, Sadegh Soudjani, Abolfazl Lavaei

Abstract: This paper offers a formal framework for the rare collision risk estimation of autonomous vehicles (AVs) with multi-agent situation awareness, affected by different sources of noise in a complex dynamic environment. In our proposed setting, the situation awareness is considered for one of the ego vehicles by aggregating a range of diverse information gathered from other vehicles into a vector. We model AVs equipped with the situation awareness as general stochastic hybrid systems (GSHS) and assess the probability of collision in a lane-change scenario where two self-driving vehicles simultaneously intend to switch lanes into a shared one, while utilizing the time-to-collision measure for decision-making as required. Due to the substantial data requirements of simulation-based methods for the rare collision risk estimation, we leverage a multi-level importance splitting technique, known as interacting particle system-based estimation with fixed assignment splitting (IPS-FAS). This approach allows us to estimate the probability of a rare event by employing a group of interacting particles. Specifically, each particle embodies a system trajectory and engages with others through resampling and branching, focusing computational resources on trajectories with the highest probability of encountering the rare event. The effectiveness of our proposed approach is demonstrated through an extensive simulation of a lane-change scenario.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.00328 [pdf, other]

Discrete Time Crystal Phase as a Resource for Quantum Enhanced Sensing

Authors: Rozhin Yousefjani, Krzysztof Sacha, Abolfazl Bayat

Abstract: Discrete time crystals are a special phase of matter in which time translational symmetry is broken through a periodic driving pulse. Here, we first propose and characterize an effective mechanism to generate a stable discrete time crystal phase in a disorder-free many-body system with indefinite persistent oscillations even in finite-size systems. Then we explore the sensing capability of this system to measure the spin exchange coupling. The results show strong quantum-enhanced sensitivity throughout the time crystal phase. As the spin exchange coupling varies, the system goes through a sharp phase transition and enters a non-time crystal phase in which the performance of the probe considerably decreases. We characterize this phase transition as a second-order type and determine its critical properties through a comprehensive finite-size scaling analysis. The performance of our probe is independent of the initial states and may even benefit from imperfections in the driving pulse.

Submitted 7 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: Comments are welcome!

arXiv:2404.18937 [pdf, ps, other]

Unification of the Gauge Theories

Authors: Abolfazl Jafari

Abstract: We take the Christoffel coefficients as an operator and introduce new mappings for quaternionic products to reach the theory of electrodynamics in general spacetime. With the help of the directional operator of the covariant derivative, we generalize the quaternionic mechanism to the theory of gravity and show that the Einstein equation has the freedom to choose the constant term in agreement with the covariant derivative.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 8 pages, 0 figures

arXiv:2404.18628 [pdf, other]

Self-Avatar Animation in Virtual Reality: Impact of Motion Signals Artifacts on the Full-Body Pose Reconstruction

Authors: Antoine Maiorca, Seyed Abolfazl Ghasemzadeh, Thierry Ravet, François Cresson, Thierry Dutoit, Christophe De Vleeschouwer

Abstract: Virtual Reality (VR) applications have revolutionized user experiences by immersing individuals in interactive 3D environments. These environments find applications in numerous fields, including healthcare, education, or architecture. A significant aspect of VR is the inclusion of self-avatars, representing users within the virtual world, which enhances interaction and embodiment. However, generating lifelike full-body self-avatar animations remains challenging, particularly in consumer-grade VR systems, where lower-body tracking is often absent. One method to tackle this problem is by providing an external source of motion information that includes lower body information such as full Cartesian positions estimated from RGB(D) cameras. Nevertheless, the limitations of these systems are multiple: the desynchronization between the two motion sources and occlusions are examples of significant issues that hinder the implementations of such systems. In this paper, we aim to measure the impact on the reconstruction of the articulated self-avatar's full-body pose of (1) the latency between the VR motion features and estimated positions, (2) the data acquisition rate, (3) occlusions, and (4) the inaccuracy of the position estimation algorithm. In addition, we analyze the motion reconstruction errors using ground truth and 3D Cartesian coordinates estimated from YOLOv8 pose estimation. These analyses show that the studied methods are significantly sensitive to any degradation tested, especially regarding the velocity reconstruction error.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 8 pages, 5 figures and 1 table

arXiv:2404.17745 [pdf]

An Attention-Based Deep Learning Architecture for Real-Time Monocular Visual Odometry: Applications to GPS-free Drone Navigation

Authors: Olivier Brochu Dufour, Abolfazl Mohebbi, Sofiane Achiche

Abstract: Drones are increasingly used in fields like industry, medicine, research, disaster relief, defense, and security. Technical challenges, such as navigation in GPS-denied environments, hinder further adoption. Research in visual odometry is advancing, potentially solving GPS-free navigation issues. Traditional visual odometry methods use geometry-based pipelines which, while popular, often suffer from error accumulation and high computational demands. Recent studies utilizing deep neural networks (DNNs) have shown improved performance, addressing these drawbacks. Deep visual odometry typically employs convolutional neural networks (CNNs) and sequence modeling networks like recurrent neural networks (RNNs) to interpret scenes and deduce visual odometry from video sequences. This paper presents a novel real-time monocular visual odometry model for drones, using a deep neural architecture with a self-attention module. It estimates the ego-motion of a camera on a drone, using consecutive video frames. An inference utility processes the live video feed, employing deep learning to estimate the drone's trajectory. The architecture combines a CNN for image feature extraction and a long short-term memory (LSTM) network with a multi-head attention module for video sequence modeling. Tested on two visual odometry datasets, this model converged 48% faster than a previous RNN model and showed a 22% reduction in mean translational drift and a 12% improvement in mean translational absolute trajectory error, demonstrating enhanced robustness to noise.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 22 Pages, 3 Tables, 9 Figures

arXiv:2404.17031 [pdf, other]

Motor Focus: Ego-Motion Prediction with All-Pixel Matching

Authors: Hao Wang, Jiayou Qin, Xiwen Chen, Ashish Bastola, John Suchanek, Zihao Gong, Abolfazl Razi

Abstract: Motion analysis plays a critical role in various applications, from virtual reality and augmented reality to assistive visual navigation. Traditional self-driving technologies, while advanced, typically do not translate directly to pedestrian applications due to their reliance on extensive sensor arrays and non-feasible computational frameworks. This highlights a significant gap in applying these solutions to human users since human navigation introduces unique challenges, including the unpredictable nature of human movement, limited processing capabilities of portable devices, and the need for directional responsiveness due to the limited perception range of humans. In this project, we introduce an image-only method that applies motion analysis using optical flow with ego-motion compensation to predict Motor Focus (where and how humans or machines focus their movement intentions). Meanwhile, this paper addresses the camera shaking issue in handheld and body-mounted devices, which can severely degrade performance and accuracy, by applying a Gaussian aggregation to stabilize the predicted motor focus area and enhance the prediction accuracy of movement direction. This also provides a robust, real-time solution that adapts to the user's immediate environment. Furthermore, in the experiments part, we show the qualitative analysis of motor focus estimation between the conventional dense optical flow-based method and the proposed method. In quantitative tests, we show the performance of the proposed method on a collected small dataset that is specialized for motor focus estimation tasks.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16787 [pdf, other]

Enhancing Quality of Experience in Telecommunication Networks: A Review of Frameworks and Machine Learning Algorithms

Authors: Parsa H. S. Panahi, Amir H. Jalilvand, Abolfazl Diyanat

Abstract: The Internet service provider industry is currently experiencing intense competition as companies strive to provide top-notch services to their customers. Providers are introducing cutting-edge technologies to enhance service quality, understanding that their survival depends on the level of service they offer. However, evaluating service quality is a complex task. A crucial aspect of this evaluation lies in understanding user experience, which significantly impacts the success and reputation of a service or product. Ensuring a seamless and positive user experience is essential for attracting and retaining customers. To date, much effort has been devoted to developing tools for measuring Quality of Experience (QoE), which incorporate both subjective and objective criteria. These tools, available in closed and open-source formats, are accessible to organizations and contribute to improving user experience quality. This review article delves into recent research and initiatives aimed at creating frameworks for assessing user QoE. It also explores the integration of machine learning algorithms to enhance these tools for future advancements. Additionally, the article examines current challenges and envisions future directions in the development of these measurement tools.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 13 pages and 16 figures

arXiv:2404.14804 [pdf, ps, other]

PRoTECT: Parallelized Construction of Safety Barrier Certificates for Nonlinear Polynomial Systems

Authors: Ben Wooding, Viacheslav Horbanov, Abolfazl Lavaei

Abstract: We develop an open-source software tool, called PRoTECT, for the parallelized construction of safety barrier certificates (BCs) for nonlinear polynomial systems. This tool employs sum-of-squares (SOS) optimization programs to systematically search for polynomial-type BCs, while aiming to verify safety properties over four classes of dynamical systems: (i) discrete-time stochastic systems, (ii) discrete-time deterministic systems, (iii) continuous-time stochastic systems, and (iv) continuous-time deterministic systems. PRoTECT is implemented in Python as an application programming interface (API), offering users the flexibility to interact either through its user-friendly graphic user interface (GUI) or via function calls from other Python programs. PRoTECT leverages parallelism across different barrier degrees to efficiently search for a feasible BC.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.12241 [pdf, other]

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Authors: Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, et al. (75 additional authors not shown)

Abstract: This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English), and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts. There are 43,090 test items in total, which we created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform, and downloadable tool, called ModelBench that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.

Submitted 13 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.11782 [pdf, other]

REQUAL-LM: Reliability and Equity through Aggregation in Large Language Models

Authors: Sana Ebrahimi, Nima Shahbazi, Abolfazl Asudeh

Abstract: The extensive scope of large language models (LLMs) across various domains underscores the critical importance of responsibility in their application, beyond natural language processing. In particular, the randomized nature of LLMs, coupled with inherent biases and historical stereotypes in data, raises critical concerns regarding reliability and equity. Addressing these challenges is necessary before using LLMs for applications with societal impact. Towards addressing this gap, we introduce REQUAL-LM, a novel method for finding reliable and equitable LLM outputs through aggregation. Specifically, we develop a Monte Carlo method based on repeated sampling to find a reliable output close to the mean of the underlying distribution of possible outputs. We formally define the terms such as reliability and bias, and design an equity-aware aggregation to minimize harmful bias while finding a highly reliable output. REQUAL-LM does not require specialized hardware, does not impose a significant computing load, and uses LLMs as a blackbox. This design choice enables seamless scalability alongside the rapid advancement of LLM technologies. Our system does not require retraining the LLMs, which makes it deployment ready and easy to adapt. Our comprehensive experiments using various tasks and datasets demonstrate that REQUAL-LM effectively mitigates bias and selects a more equitable response, specifically the outputs that properly represent minority groups.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.11499 [pdf, other]

A Data-Driven Representation for Sign Language Production

Authors: Harry Walsh, Abolfazl Ravanshad, Mariam Rahmani, Richard Bowden

Abstract: Phonetic representations are used when recording spoken languages, but no equivalent exists for recording signed languages. As a result, linguists have proposed several annotation systems that operate on the gloss or sub-unit level; however, these resources are notably irregular and scarce. Sign Language Production (SLP) aims to automatically translate spoken language sentences into continuous sequences of sign language. However, current state-of-the-art approaches rely on scarce linguistic resources to work. This has limited progress in the field. This paper introduces an innovative solution by transforming the continuous pose generation problem into a discrete sequence generation problem. Thus, overcoming the need for costly annotation. Although, if available, we leverage the additional information to enhance our approach. By applying Vector Quantisation (VQ) to sign language data, we first learn a codebook of short motions that can be combined to create a natural sequence of sign. Where each token in the codebook can be thought of as the lexicon of our representation. Then using a transformer we perform a translation from spoken language text to a sequence of codebook tokens. Each token can be directly mapped to a sequence of poses allowing the translation to be performed by a single network. Furthermore, we present a sign stitching method to effectively join tokens together. We evaluate on the RWTH-PHOENIX-Weather-2014T (PHOENIX14T) and the more challenging Meine DGS Annotated (mDGS) datasets. An extensive evaluation shows our approach outperforms previous methods, increasing the BLEU-1 back translation score by up to 72%.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 8 Pages, 3 Figures, 7 Tables, 18th IEEE International Conference on Automatic Face and Gesture Recognition 2024

arXiv:2404.11335 [pdf, other]

SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap

Authors: Vladimir Somers, Victor Joos, Anthony Cioppa, Silvio Giancola, Seyed Abolfazl Ghasemzadeh, Floriane Magera, Baptiste Standaert, Amir Mohammad Mansourian, Xin Zhou, Shohreh Kasaei, Bernard Ghanem, Alexandre Alahi, Marc Van Droogenbroeck, Christophe De Vleeschouwer

Abstract: Tracking and identifying athletes on the pitch holds a central role in collecting essential insights from the game, such as estimating the total distance covered by players or understanding team tactics. This tracking and identification process is crucial for reconstructing the game state, defined by the athletes' positions and identities on a 2D top-view of the pitch (i.e. a minimap). However, reconstructing the game state from videos captured by a single camera is challenging. It requires understanding the position of the athletes and the viewpoint of the camera to localize and identify players within the field. In this work, we formalize the task of Game State Reconstruction and introduce SoccerNet-GSR, a novel Game State Reconstruction dataset focusing on football videos. SoccerNet-GSR is composed of 200 video sequences of 30 seconds, annotated with 9.37 million line points for pitch localization and camera calibration, as well as over 2.36 million athlete positions on the pitch with their respective role, team, and jersey number. Furthermore, we introduce GS-HOTA, a novel metric to evaluate game state reconstruction methods. Finally, we propose and release an end-to-end baseline for game state reconstruction, bootstrapping the research on this task. Our experiments show that GSR is a challenging novel task, which opens the field for future research. Our dataset and codebase are publicly available at https://github.com/SoccerNet/sn-gamestate.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.10382 [pdf, other]

Nonlinearity-enhanced quantum sensing in Stark probes

Authors: Rozhin Yousefjani, Xingjian He, Angelo Carollo, Abolfazl Bayat

Abstract: Stark systems in which a linear gradient field is applied across a many-body system have recently been harnessed for quantum sensing. Here, we explore sensing capacity of Stark models, in both single-particle and many-body interacting systems, for estimating the strength of both linear and nonlinear Stark fields. The problem naturally lies in the context of multi-parameter estimation. We determine the phase diagram of the system in terms of both linear and nonlinear gradient fields showing how the extended phase turns into a localized one as the Stark fields increase. We also characterize the properties of the phase transition, including critical exponents, through a comprehensive finite-size scaling analysis. Interestingly, our results show that the estimation of both the linear and the nonlinear fields can achieve super-Heisenberg scaling. In fact, the scaling exponent of the sensing precision is directly proportional to the nonlinearity exponent which shows that nonlinearity enhances the estimation precision. Finally, we show that even after considering the cost of the preparation time the sensing precision still reveals super-Heisenberg scaling.

Submitted 16 April, 2024; originally announced April 2024.

Comments: 10 pages, 6 figures. Comments are welcome!

arXiv:2404.08013 [pdf, other]

Enhanced Cooperative Perception for Autonomous Vehicles Using Imperfect Communication

Authors: Ahmad Sarlak, Hazim Alzorgan, Sayed Pedram Haeri Boroujeni, Abolfazl Razi, Rahul Amin

Abstract: Sharing and joint processing of camera feeds and sensor measurements, known as Cooperative Perception (CP), has emerged as a new technique to achieve higher perception qualities. CP can enhance the safety of Autonomous Vehicles (AVs) where their individual visual perception quality is compromised by adverse weather conditions (haze as foggy weather), low illumination, winding roads, and crowded traffic. To cover the limitations of former methods, in this paper, we propose a novel approach to realize an optimized CP under constrained communications. At the core of our approach is recruiting the best helper from the available list of front vehicles to augment the visual range and enhance the Object Detection (OD) accuracy of the ego vehicle. In this two-step process, we first select the helper vehicles that contribute the most to CP based on their visual range and lowest motion blur. Next, we implement a radio block optimization among the candidate vehicles to further improve communication efficiency. We specifically focus on pedestrian detection as an exemplary scenario. To validate our approach, we used the CARLA simulator to create a dataset of annotated videos for different driving scenarios where pedestrian detection is challenging for an AV with compromised vision. Our results demonstrate the efficacy of our two-step optimization process in improving the overall performance of cooperative perception in challenging scenarios, substantially improving driving safety under adverse conditions.
Finally, we note that the networking assumptions are adopted from LTE Release 14 Mode 4 side-link communication, commonly used for Vehicle-to-Vehicle (V2V) communication. Nonetheless, our method is flexible and applicable to arbitrary V2V communications.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.08003 [pdf, other]

Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis

Authors: Guangchen Lan, Dong-Jun Han, Abolfazl Hashemi, Vaneet Aggarwal, Christopher G. Brinton

Abstract: To improve the efficiency of reinforcement learning, we propose a novel asynchronous federated reinforcement learning framework termed AFedPG, which constructs a global model through collaboration among $N$ agents using policy gradient (PG) updates. To handle the challenge of lagged policies in asynchronous settings, we design delay-adaptive lookahead and normalized update techniques that can effectively handle the heterogeneous arrival times of policy gradients. We analyze the theoretical global convergence bound of AFedPG, and characterize the advantage of the proposed algorithm in terms of both the sample complexity and time complexity. Specifically, our AFedPG method achieves $\mathcal{O}(\frac{ε^{-2.5}}{N})$ sample complexity at each agent on average. Compared to the single agent setting with $\mathcal{O}(ε^{-2.5})$ sample complexity, it enjoys a linear speedup with respect to the number of agents. Moreover, compared to synchronous FedPG, AFedPG improves the time complexity from $\mathcal{O}(\frac{t_{\max}}{N})$ to $\mathcal{O}(\frac{1}{\sum_{i=1}^{N} \frac{1}{t_{i}}})$, where $t_{i}$ denotes the time consumption in each iteration at the agent $i$, and $t_{\max}$ is the largest one. The latter complexity $\mathcal{O}(\frac{1}{\sum_{i=1}^{N} \frac{1}{t_{i}}})$ is always smaller than the former one, and this improvement becomes significant in large-scale federated settings with heterogeneous computing powers ($t_{\max}\gg t_{\min}$).
Finally, we empirically verify the improved performances of AFedPG in three MuJoCo environments with varying numbers of agents. We also demonstrate the improvements with different computing heterogeneity.
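
The two time-complexity expressions quoted in this abstract can be compared numerically. The snippet below is only an illustrative sketch, not the authors' code: the function names and the sample per-iteration times are assumptions made here, and constant factors hidden by the $\mathcal{O}(\cdot)$ notation are ignored. It evaluates $t_{\max}/N$ (synchronous) against $1/\sum_{i} 1/t_{i}$ (asynchronous).

```python
import math

def sync_time_bound(times):
    """Synchronous expression t_max / N (hypothetical helper name)."""
    return max(times) / len(times)

def async_time_bound(times):
    """Asynchronous expression 1 / sum_i (1 / t_i) (hypothetical helper name)."""
    return 1.0 / sum(1.0 / t for t in times)

# Assumed per-iteration times: three fast agents and one slow straggler.
heterogeneous = [1.0, 1.0, 1.0, 10.0]
# Assumed homogeneous agents: both expressions coincide in this case.
homogeneous = [2.0, 2.0]

for times in (heterogeneous, homogeneous):
    print(times, sync_time_bound(times), async_time_bound(times))
```

Since $\sum_{i} 1/t_{i} \ge N/t_{\max}$, the asynchronous expression never exceeds the synchronous one; the two are equal when all agents are equally fast, and the gap widens as $t_{\max}$ grows relative to the other $t_{i}$.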

Submitted 14 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

ACM Class: I.2.6; I.2.11

arXiv:2404.07579 [pdf, other]

Revisiting Data Recovery Loops in 6G Networks

Authors: Uyoata E. Uyoata, Abolfazl Amiri, Enric Juan, Guillermo Pocovi, Pilar Andres-Maldonado, Klaus I. Pedersen, Troels Kolding

Abstract: Mechanisms for data recovery and packet reliability are essential components of the upcoming 6th generation (6G) communication system. In this paper, we evaluate the interaction between a fast hybrid automatic repeat request (HARQ) scheme, present in the physical and medium access control layers, and a higher layer automatic repeat request (ARQ) scheme which may be present in the radio link control layer. Through extensive system-level simulations, we show that despite its higher complexity, a fast HARQ scheme yields > 66 % downlink average user throughput gains over simpler solutions without energy combining gains and orders of magnitude larger gains for users in challenging radio conditions. We present results for the design trade-off between HARQ and higher-layer data recovery mechanisms in the presence of realistic control and data channel errors, network delays, and transport protocols. We derive that, with a suitable design of 6G control and data channels reaching residual errors at the medium access control layer of 5 E-5 or better, a higher layer data recovery mechanism can be disabled. We then derive design targets for 6G control channel design, as well as promising enhancements to 6G higher layer data recovery to extend support for latency-intolerant services.

Submitted 11 April, 2024; originally announced April 2024.

Comments: Accepted for publication in EUCNC 2024

arXiv:2404.07354 [pdf, other]

FairEM360: A Suite for Responsible Entity Matching

Authors: Nima Shahbazi, Mahdi Erfanian, Abolfazl Asudeh, Fatemeh Nargesian, Divesh Srivastava

Abstract: Entity matching is one of the earliest tasks that occur in the big data pipeline and is alarmingly exposed to unintentional biases that affect the quality of data. Identifying and mitigating the biases that exist in the data or are introduced by the matcher at this stage can contribute to promoting fairness in downstream tasks. This demonstration showcases FairEM360, a framework for 1) auditing the output of entity matchers across a wide range of fairness measures and paradigms, 2) providing potential explanations for the underlying reasons for unfairness, and 3) providing resolutions for the unfairness issues through an exploratory process with human-in-the-loop feedback, utilizing an ensemble of matchers. We aspire for FairEM360 to contribute to the prioritization of fairness as a key consideration in the evaluation of EM pipelines.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.05919 [pdf, other]

AdaGossip: Adaptive Consensus Step-size for Decentralized Deep Learning with Communication Compression

Authors: Sai Aparna Aketi, Abolfazl Hashemi, Kaushik Roy

Abstract: Decentralized learning is crucial in supporting on-device learning over large distributed datasets, eliminating the need for a central server. However, the communication overhead remains a major bottleneck for the practical realization of such decentralized setups. To tackle this issue, several algorithms for decentralized training with compressed communication have been proposed in the literature. Most of these algorithms introduce an additional hyper-parameter referred to as consensus step-size which is tuned based on the compression ratio at the beginning of the training. In this work, we propose AdaGossip, a novel technique that adaptively adjusts the consensus step-size based on the compressed model differences between neighboring agents. We demonstrate the effectiveness of the proposed method through an exhaustive set of experiments on various Computer Vision datasets (CIFAR-10, CIFAR-100, Fashion MNIST, Imagenette, and ImageNet), model architectures, and network topologies. Our experiments show that the proposed method achieves superior performance ($0-2\%$ improvement in test accuracy) compared to the current state-of-the-art method for decentralized learning with communication compression.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 11 pages, 3 figures, 8 tables. arXiv admin note: text overlap with arXiv:2305.04792, arXiv:2310.15890

arXiv:2404.03759 [pdf, other]

Localized Distributional Robustness in Submodular Multi-Task Subset Selection

Authors: Ege C. Kaya, Abolfazl Hashemi

Abstract: In this work, we approach the problem of multi-task submodular optimization with the perspective of local distributional robustness, within the neighborhood of a reference distribution which assigns an importance score to each task. We initially propose to introduce a regularization term which makes use of the relative entropy to the standard multi-task objective. We then demonstrate through duality that this novel formulation itself is equivalent to the maximization of a submodular function, which may be efficiently carried out through standard greedy selection methods. This approach bridges the existing gap in the optimization of performance-robustness trade-offs in multi-task subset selection. To numerically validate our theoretical results, we test the proposed method in two different settings, one involving the selection of satellites in low Earth orbit constellations in the context of a sensor selection problem, and the other involving an image summarization task using neural networks. Our method is compared with two other algorithms focused on optimizing the performance of the worst-case task, and on directly optimizing the performance on the reference distribution itself. We conclude that our novel formulation produces a solution that is locally distributionally robust and computationally inexpensive.

Submitted 4 April, 2024; originally announced April 2024.

Comments: 35 pages, 7 figures. A preliminary version of this article was presented at the 2023 Allerton Conference on Communication, Control, and Computing. This version was submitted to IEEE Transactions on Signal Processing

arXiv:2404.03740 [pdf, other]

Randomized Greedy Methods for Weak Submodular Sensor Selection with Robustness Considerations

Authors: Ege C. Kaya, Michael Hibbard, Takashi Tanaka, Ufuk Topcu, Abolfazl Hashemi

Abstract: We study a pair of budget- and performance-constrained weak submodular maximization problems. For computational efficiency, we explore the use of stochastic greedy algorithms which limit the search space via random sampling instead of the standard greedy procedure which explores the entire feasible search space. We propose a pair of stochastic greedy algorithms, namely, Modified Randomized Greedy (MRG) and Dual Randomized Greedy (DRG) to approximately solve the budget- and performance-constrained problems, respectively. For both algorithms, we derive approximation guarantees that hold with high probability. We then examine the use of DRG in robust optimization problems wherein the objective is to maximize the worst-case of a number of weak submodular objectives and propose the Randomized Weak Submodular Saturation Algorithm (Random-WSSA). We further derive a high-probability guarantee for when Random-WSSA successfully constructs a robust solution. Finally, we showcase the effectiveness of these algorithms in a variety of relevant uses within the context of Earth-observing LEO constellations which estimate atmospheric weather conditions and provide Earth coverage.

Submitted 4 April, 2024; originally announced April 2024.

Comments: 36 pages, 5 figures. A preliminary version of this article was presented at the 2023 American Control Conference (ACC). This version was submitted to Automatica

arXiv:2403.17331 [pdf, other]

FedMIL: Federated-Multiple Instance Learning for Video Analysis with Optimized DPP Scheduling

Authors: Ashish Bastola, Hao Wang, Xiwen Chen, Abolfazl Razi

Abstract: Many AI platforms, including traffic monitoring systems, use Federated Learning (FL) for decentralized sensor data processing for learning-based applications while preserving privacy and ensuring secured information transfer. On the other hand, applying supervised learning to large data samples, like high-resolution images requires intensive human labor to label different parts of a data sample. Multiple Instance Learning (MIL) alleviates this challenge by operating over labels assigned to the 'bag' of instances. In this paper, we introduce Federated Multiple-Instance Learning (FedMIL). This framework applies federated learning to boost the training performance in video-based MIL tasks such as vehicle accident detection using distributed CCTV networks. However, data sources in decentralized settings are not typically Independently and Identically Distributed (IID), making client selection imperative to collectively represent the entire dataset with minimal clients. To address this challenge, we propose DPPQ, a framework based on the Determinantal Point Process (DPP) with a quality-based kernel to select clients with the most diverse datasets that achieve better performance compared to both random selection and current DPP-based client selection methods even with less data utilization in the majority of non-IID cases. This offers a significant advantage for deployment on edge devices with limited computational resources, providing a reliable solution for training AI models in massive smart sensor networks.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.13247 [pdf, other]

FedNMUT -- Federated Noisy Model Update Tracking Convergence Analysis

Authors: Vishnu Pandi Chellapandi, Antesh Upadhyay, Abolfazl Hashemi, Stanislaw H. Żak

Abstract: A novel Decentralized Noisy Model Update Tracking Federated Learning algorithm (FedNMUT) is proposed that is tailored to function efficiently in the presence of noisy communication channels that reflect imperfect information exchange. This algorithm uses gradient tracking to minimize the impact of data heterogeneity while minimizing communication overhead. The proposed algorithm incorporates noise into its parameters to mimic the conditions of noisy communication channels, thereby enabling consensus among clients through a communication graph topology in such challenging environments. FedNMUT prioritizes parameter sharing and noise incorporation to increase the resilience of decentralized learning systems against noisy communications. Theoretical results for the smooth non-convex objective function are provided by us, and it is shown that the $ε-$stationary solution is achieved by our algorithm at the rate of $\mathcal{O}\left(\frac{1}{\sqrt{T}}\right)$, where $T$ is the total number of communication rounds. Additionally, via empirical validation, we demonstrated that the performance of FedNMUT is superior to the existing state-of-the-art methods and conventional parameter-mixing approaches in dealing with imperfect information sharing. This proves the capability of the proposed algorithm to counteract the negative effects of communication noise in a decentralized learning framework.

Submitted 24 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2303.10695

arXiv:2403.12415 [pdf, other]

VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation

Authors: Hao Wang, Jiayou Qin, Ashish Bastola, Xiwen Chen, John Suchanek, Zihao Gong, Abolfazl Razi

Abstract: This paper explores the potential of Large Language Models (LLMs) in zero-shot anomaly detection for safe visual navigation. With the assistance of the state-of-the-art real-time open-world object detection model Yolo-World and specialized prompts, the proposed framework can identify anomalies within camera-captured frames that include any possible obstacles, then generate concise, audio-delivered descriptions emphasizing abnormalities, assisting in safe visual navigation in complex circumstances. Moreover, our proposed framework leverages the advantages of LLMs and the open-vocabulary object detection model to achieve the dynamic scenario switch, which allows users to transition smoothly from scene to scene and addresses the limitation of traditional visual navigation. Furthermore, this paper explored the performance contribution of different prompt components, provided the vision for future improvement in visual accessibility, and paved the way for LLMs in video anomaly detection and vision-language understanding.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.12056 [pdf, other]

Enhancing Digital Hologram Reconstruction Using Reverse-Attention Loss for Untrained Physics-Driven Deep Learning Models with Uncertain Distance

Authors: Xiwen Chen, Hao Wang, Zhao Zhang, Zhenmin Li, Huayu Li, Tong Ye, Abolfazl Razi

Abstract: Untrained Physics-based Deep Learning (DL) methods for digital holography have gained significant attention due to their benefits, such as not requiring an annotated training dataset, and providing interpretability since utilizing the governing laws of hologram formation. However, they are sensitive to the hard-to-obtain precise object distance from the imaging plane, posing the $\textit{Autofocusing}$ challenge. Conventional solutions involve reconstructing image stacks for different potential distances and applying focus metrics to select the best results, which apparently is computationally inefficient. In contrast, recently developed DL-based methods treat it as a supervised task, which again needs annotated data and lacks generalizability. To address this issue, we propose $\textit{reverse-attention loss}$, a weighted sum of losses for all possible candidates with learnable weights. This is a pioneering approach to addressing the Autofocusing challenge in untrained deep-learning methods. Both theoretical analysis and experiments demonstrate its superiority in efficiency and accuracy. Interestingly, our method presents a significant reconstruction performance over rival methods (i.e. alternating descent-like optimization, non-weighted loss integration, and random distance assignment) and even is almost equal to that achieved with a precisely known object distance. For example, the difference is less than 1dB in PSNR and 0.002 in SSIM for the target sample in our experiment.

Submitted 10 January, 2024; originally announced March 2024.

arXiv:2403.10084 [pdf, other]

Sequential measurements thermometry with quantum many-body probes

Authors: Yaoling Yang, Victor Montenegro, Abolfazl Bayat

Abstract: Measuring the temperature of a quantum system is an essential task in almost all aspects of quantum technologies. Theoretically, an optimal strategy for thermometry requires measuring energy which demands full accessibility over the entire system as well as complex entangled measurement basis. In this paper, we take a different approach and show that single qubit sequential measurements in the computational basis not only allow precise thermometry of a many-body system but may also achieve precision beyond the theoretical bound, avoiding demanding energy measurements at equilibrium. To obtain such precision, the time between the two subsequent measurements should be smaller than the thermalization time so that the probe never thermalizes. Therefore, the non-equilibrium dynamics of the system continuously imprint information about temperature in the state of the probe. This allows the sequential measurement scheme to reach precision beyond the accuracy achievable by complex energy measurements on equilibrium probes.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 9 pages main text, 8 figures. Feedback is welcome!

arXiv:2403.09646 [pdf, other]

On Unsupervised Image-to-image translation and GAN stability

Authors: BahaaEddin AlAila, Zahra Jandaghi, Abolfazl Farahani, Mohammad Ziad Al-Saad

Abstract: The problem of image-to-image translation is one that is intriguing and challenging at the same time, for the impact potential it can have on a wide variety of other computer vision applications like colorization, inpainting, segmentation and others. Given the high level of sophistication needed to extract patterns from one domain and successfully apply them to another, especially in a completely unsupervised (unpaired) manner, this problem has gained much attention as of the last few years. It is one of the first problems where successful applications to deep generative models, and especially Generative Adversarial Networks achieved astounding results that are actually of real-world impact, rather than just a show of theoretical prowess; the such that has been dominating the GAN world. In this work, we study some of the failure cases of a seminal work in the field, CycleGAN [1] and hypothesize that they are GAN-stability related, and propose two general models to try to alleviate these problems. We also reach the same conclusion of the problem being ill-posed that has been also circulating in the literature lately.

Submitted 18 October, 2023; originally announced March 2024.

arXiv:2403.04694 [pdf, other]

On $[1,2]$-Domination in Interval and Circle Graphs

Authors: Mohsen Alambardar Meybodi, Abolfazl Poureidi

Abstract: A subset $S$ of vertices in a graph $G=(V, E)$ is Dominating Set if each vertex in $V(G)\setminus S$ is adjacent to at least one vertex in $S$. Chellali et al. in 2013, by restricting the number of neighbors in $S$ of a vertex outside $S$, introduced the concept of $[1,j]$-dominating set. A set $D \subseteq V$ of a graph $G = (V, E)$ is called $[1,j]$-Dominating Set of $G$ if every vertex not in $D$ has at least one neighbor and at most $j$ neighbors in $D$. The Minimum $[1,j]$-Domination problem is the problem of finding the minimum set $D$. Given a positive integer $k$ and a graph $G = (V, E)$, the $[1,j]$-Domination Decision problem is to decide whether $G$ has $[1,j]$-dominating set of cardinality at most $k$. A polynomial-time algorithm was obtained in split graphs for a constant $j$ in contrast to the classic Dominating Set problem which is NP-hard in split graphs. This result motivates us to investigate the effect of restriction $j$ on the complexity of $[1,j]$-domination problem on various classes of graphs. Although for $j\geq 3$, it has been proved that the minimum of classical domination is equal to minimum $[1,j]$-domination in interval graphs, the complexity of finding the minimum $[1,2]$-domination in interval graphs is still outstanding. In this paper, we propose a polynomial-time algorithm for computing a minimum $[1,2]$ on non-proper interval graphs by a dynamic programming technique. Next, on the negative side, we show that the minimum $[1,2]$-dominating set problem on circle graphs is $NP$-complete.
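
The definition of a $[1,j]$-dominating set given in this abstract can be checked directly on small graphs. The following is a minimal sketch of such a membership check, not the paper's dynamic-programming algorithm; the function name and the adjacency-dictionary graph representation are assumptions chosen here for illustration.

```python
def is_one_j_dominating(adj, candidate, j):
    """Return True if `candidate` is a [1,j]-dominating set of the graph.

    `adj` maps each vertex to an iterable of its neighbors (assumed
    undirected and simple). Every vertex outside `candidate` must have
    at least one and at most j neighbors inside `candidate`.
    """
    chosen = set(candidate)
    for vertex, neighbors in adj.items():
        if vertex in chosen:
            continue
        hits = len(chosen & set(neighbors))
        if not (1 <= hits <= j):
            return False
    return True

# Star graph: center 0 joined to leaves 1, 2, 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(is_one_j_dominating(star, {0}, 2))        # center alone dominates every leaf once
print(is_one_j_dominating(star, {1, 2, 3}, 2))  # center sees three chosen neighbors
print(is_one_j_dominating(star, {1, 2, 3}, 3))  # allowed once j is raised to 3
```

The star example shows how the upper bound $j$ matters: the set of leaves is a classical dominating set, yet it fails the $[1,2]$ condition because the center has three neighbors in it.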

Submitted 12 June, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03463 [pdf, other]

FLAME Diffuser: Grounded Wildfire Image Synthesis using Mask Guided Diffusion

Authors: Hao Wang, Sayed Pedram Haeri Boroujeni, Xiwen Chen, Ashish Bastola, Huayu Li, Abolfazl Razi

Abstract: The rise of machine learning in recent years has brought benefits to various research fields such as wildfire detection. Nevertheless, small object detection and rare object detection remain a challenge. To address this problem, we present a dataset automata that can generate ground truth paired datasets using diffusion models. Specifically, we introduce a mask-guided diffusion framework that can fuse the wildfire into the existing images while the flame position and size can be precisely controlled. In advance, to fill the gap that the dataset of wildfire images in specific scenarios is missing, we vary the background of synthesized images by controlling both the text prompt and input image. Furthermore, to solve the color tint problem or the well-known domain shift issue, we apply the CLIP model to filter the generated massive dataset to preserve quality. Thus, our proposed framework can generate a massive dataset of images that are high-quality and ground truth-paired, which well addresses the needs of the annotated datasets in specific tasks.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2403.00198 [pdf, other]

AXOLOTL: Fairness through Assisted Self-Debiasing of Large Language Model Outputs

Authors: Sana Ebrahimi, Kaiwen Chen, Abolfazl Asudeh, Gautam Das, Nick Koudas

Abstract: Pre-trained Large Language Models (LLMs) have significantly advanced natural language processing capabilities but are susceptible to biases present in their training data, leading to unfair outcomes in various applications. While numerous strategies have been proposed to mitigate bias, they often require extensive computational resources and may compromise model performance. In this work, we introduce AXOLOTL, a novel post-processing framework, which operates agnostically across tasks and models, leveraging public APIs to interact with LLMs without direct access to internal parameters. Through a three-step process resembling zero-shot learning, AXOLOTL identifies biases, proposes resolutions, and guides the model to self-debias its outputs. This approach minimizes computational costs and preserves model performance, making AXOLOTL a promising tool for debiasing LLM outputs with broad applicability and ease of use.

Submitted 29 February, 2024; originally announced March 2024.

arXiv:2402.18726 [pdf, other]

Unveiling Privacy, Memorization, and Input Curvature Links

Authors: Deepak Ravikumar, Efstathia Soufleri, Abolfazl Hashemi, Kaushik Roy

Abstract: Deep Neural Nets (DNNs) have become a pervasive tool for solving many emerging problems. However, they tend to overfit to and memorize the training set. Memorization is of keen interest since it is closely related to several concepts such as generalization, noisy learning, and privacy. To study memorization, Feldman (2019) proposed a formal score; however, its computational requirements limit its practical use. Recent research has shown empirical evidence linking input loss curvature (measured by the trace of the loss Hessian w.r.t. inputs) and memorization. It was shown to be ~3 orders of magnitude more efficient than calculating the memorization score. However, there is a lack of theoretical understanding linking memorization with input loss curvature. In this paper, we not only investigate this connection but also extend our analysis to establish theoretical links between differential privacy, memorization, and input loss curvature. First, we derive an upper bound on memorization characterized by both differential privacy and input loss curvature. Second, we present a novel insight showing that input loss curvature is upper-bounded by the differential privacy parameter. Our theoretical findings are further empirically validated using deep models on CIFAR and ImageNet datasets, showing a strong correlation between our theoretical predictions and results observed in practice.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.15490 [pdf, other]

A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends

Authors: Abolfazl Younesi, Mohsen Ansari, MohammadAmin Fazli, Alireza Ejlali, Muhammad Shafique, Jörg Henkel

Abstract: In today's digital age, Convolutional Neural Networks (CNNs), a subset of Deep Learning (DL), are widely used for various computer vision tasks such as image classification, object detection, and image segmentation. There are numerous types of CNNs designed to meet specific needs and requirements, including 1D, 2D, and 3D CNNs, as well as dilated, grouped, attention, depthwise convolutions, and NAS, among others. Each type of CNN has its unique structure and characteristics, making it suitable for specific tasks. It's crucial to gain a thorough understanding and perform a comparative analysis of these different CNN types to understand their strengths and weaknesses. Furthermore, studying the performance, limitations, and practical applications of each type of CNN can aid in the development of new and improved architectures in the future. We also dive into the platforms and frameworks that researchers utilize for their research or development from various perspectives. Additionally, we explore the main research fields of CNN like 6D vision, generative models, and meta-learning. This survey paper provides a comprehensive examination and comparison of various CNN architectures, highlighting their architectural differences and emphasizing their respective advantages, disadvantages, applications, challenges, and future trends.

Submitted 28 February, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

Showing 1–50 of 558 results for author: Abolfazl