Search | arXiv e-print repository

Achieving Observability on Fog Computing with the use of open-source tools

Authors: Breno Costa, Abhik Banerjee, Prem Prakash Jayaraman, Leonardo R. Carvalho, João Bachiega Jr., Aleteia Araujo

Abstract: Fog computing can provide computational resources and low-latency communication at the network edge. But with it comes uncertainties that must be managed in order to guarantee Service Level Agreements. Service observability can help the environment better deal with uncertainties, delivering relevant and up-to-date information in a timely manner to support decision making. Observability is consider… ▽ More Fog computing can provide computational resources and low-latency communication at the network edge. But with it comes uncertainties that must be managed in order to guarantee Service Level Agreements. Service observability can help the environment better deal with uncertainties, delivering relevant and up-to-date information in a timely manner to support decision making. Observability is considered a superset of monitoring since it uses not only performance metrics, but also other instrumentation domains such as logs and traces. However, as Fog Computing is typically characterised by resource-constrained nodes and network uncertainties, increasing observability in fog can be risky due to the additional load injected into a restricted environment. There is no work in the literature that evaluated fog observability. In this paper, we first outline the challenges of achieving observability in a Fog environment, based on which we present a formal definition of fog observability. Subsequently, a real-world Fog Computing testbed running a smart city use case is deployed, and an empirical evaluation of fog observability using open-source tools is presented. The results show that under certain conditions, it is viable to provide observability in a Fog Computing environment using open-source tools, although it is necessary to control the overhead modifying their default configuration according to the application characteristics. △ Less

Submitted 25 May, 2024; originally announced July 2024.

Comments: Paper presented at Mobiquitous 2023

arXiv:2406.08332 [pdf, other]

UDON: Universal Dynamic Online distillatioN for generic image representations

Authors: Nikolaos-Antonios Ypsilantis, Kaifeng Chen, André Araujo, Ondřej Chum

Abstract: Universal image representations are critical in enabling real-world fine-grained and instance-level recognition applications, where objects and entities from any domain must be identified at large scale. Despite recent advances, existing methods fail to capture important domain-specific knowledge, while also ignoring differences in data distribution across different domains. This leads to a large… ▽ More Universal image representations are critical in enabling real-world fine-grained and instance-level recognition applications, where objects and entities from any domain must be identified at large scale. Despite recent advances, existing methods fail to capture important domain-specific knowledge, while also ignoring differences in data distribution across different domains. This leads to a large performance gap between efficient universal solutions and expensive approaches utilising a collection of specialist models, one for each domain. In this work, we make significant strides towards closing this gap, by introducing a new learning technique, dubbed UDON (Universal Dynamic Online DistillatioN). UDON employs multi-teacher distillation, where each teacher is specialized in one domain, to transfer detailed domain-specific knowledge into the student universal embedding. UDON's distillation approach is not only effective, but also very efficient, by sharing most model parameters between the student and all teachers, where all models are jointly trained in an online manner. UDON also comprises a sampling technique which adapts the training process to dynamically allocate batches to domains which are learned slower and require more frequent processing. This boosts significantly the learning of complex domains which are characterised by a large number of classes and long-tail distributions. With comprehensive experiments, we validate each component of UDON, and showcase significant improvements over the state of the art in the recent UnED benchmark. Code: https://github.com/nikosips/UDON . △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.03342 [pdf]

Understanding and measuring software engineer behavior: What can we learn from the behavioral sciences?

Authors: Allysson Allex Araújo, Marcos Kalinowski, Daniel Graziotin

Abstract: This paper explores the intricate challenge of understanding and measuring software engineer behavior. More specifically, we revolve around a central question: How can we enhance our understanding of software engineer behavior? Grounded in the nuanced complexities addressed within Behavioral Software Engineering (BSE), we advocate for holistic methods that integrate quantitative measures, such as… ▽ More This paper explores the intricate challenge of understanding and measuring software engineer behavior. More specifically, we revolve around a central question: How can we enhance our understanding of software engineer behavior? Grounded in the nuanced complexities addressed within Behavioral Software Engineering (BSE), we advocate for holistic methods that integrate quantitative measures, such as psychometric instruments, and qualitative data from diverse sources. Furthermore, we delve into the relevance of this challenge within national and international contexts, highlighting the increasing interest in understanding software engineer behavior. Real-world initiatives and academic endeavors are also examined to underscore the potential for advancing this research agenda and, consequently, refining software engineering practices based on behavioral aspects. Lastly, this paper addresses different ways to evaluate the progress of this challenge by leveraging methodological skills derived from behavioral sciences, ultimately contributing to a deeper understanding of software engineer behavior and software engineering practices. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: 6

arXiv:2406.02487 [pdf]

Investigating the Online Recruitment and Selection Journey of Novice Software Engineers: Anti-patterns and Recommendations

Authors: Miguel Setúbal, Tayana Conte, Marcos Kalinowski, Allysson Allex Araújo

Abstract: [Context] The growing software development market has increased the demand for qualified professionals in Software Engineering (SE). To this end, companies must enhance their Recruitment and Selection (R&S) processes to maintain high quality teams, including opening opportunities for beginners, such as trainees and interns. However, given the various judgments and sociotechnical factors involved,… ▽ More [Context] The growing software development market has increased the demand for qualified professionals in Software Engineering (SE). To this end, companies must enhance their Recruitment and Selection (R&S) processes to maintain high quality teams, including opening opportunities for beginners, such as trainees and interns. However, given the various judgments and sociotechnical factors involved, this complex process of R&S poses a challenge for recent graduates seeking to enter the market. [Objective] This paper aims to identify a set of anti-patterns and recommendations for early career SE professionals concerning R&S processes. [Method] Under an exploratory and qualitative methodological approach, we conducted six online Focus Groups with 18 recruiters with experience in R&S in the software industry. [Results] After completing our qualitative analysis, we identified 12 anti-patterns and 31 actionable recommendations regarding the hiring process focused on entry level SE professionals. The identified anti-patterns encompass behavioral and technical dimensions innate to R&S processes. [Conclusion] These findings provide a rich opportunity for reflection in the SE industry and offer valuable guidance for early-career candidates and organizations. From an academic perspective, this work also raises awareness of the intersection of Human Resources and SE, an area with considerable potential to be expanded in the context of cooperative and human aspects of SE. △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: 33 pages

arXiv:2405.12979 [pdf, other]

OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

Authors: Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, Andre Araujo

Abstract: The image matching field has been witnessing a continuous emergence of novel learnable feature matching techniques, with ever-improving performance on conventional benchmarks. However, our investigation shows that despite these gains, their potential for real-world applications is restricted by their limited generalization capabilities to novel image domains. In this paper, we introduce OmniGlue,… ▽ More The image matching field has been witnessing a continuous emergence of novel learnable feature matching techniques, with ever-improving performance on conventional benchmarks. However, our investigation shows that despite these gains, their potential for real-world applications is restricted by their limited generalization capabilities to novel image domains. In this paper, we introduce OmniGlue, the first learnable image matcher that is designed with generalization as a core principle. OmniGlue leverages broad knowledge from a vision foundation model to guide the feature matching process, boosting generalization to domains not seen at training time. Additionally, we propose a novel keypoint position-guided attention mechanism which disentangles spatial and appearance information, leading to enhanced matching descriptors. We perform comprehensive experiments on a suite of $7$ datasets with varied image domains, including scene-level, object-centric and aerial images. OmniGlue's novel components lead to relative gains on unseen domains of $20.9\%$ with respect to a directly comparable reference model, while also outperforming the recent LightGlue method by $9.5\%$ relatively.Code and model can be found at https://hwjiang1510.github.io/OmniGlue △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: CVPR 2024

arXiv:2404.19174 [pdf, other]

XFeat: Accelerated Features for Lightweight Image Matching

Authors: Guilherme Potje, Felipe Cadar, Andre Araujo, Renato Martins, Erickson R. Nascimento

Abstract: We introduce a lightweight and accurate architecture for resource-efficient visual correspondence. Our method, dubbed XFeat (Accelerated Features), revisits fundamental design choices in convolutional neural networks for detecting, extracting, and matching local features. Our new model satisfies a critical need for fast and robust algorithms suitable to resource-limited devices. In particular, acc… ▽ More We introduce a lightweight and accurate architecture for resource-efficient visual correspondence. Our method, dubbed XFeat (Accelerated Features), revisits fundamental design choices in convolutional neural networks for detecting, extracting, and matching local features. Our new model satisfies a critical need for fast and robust algorithms suitable to resource-limited devices. In particular, accurate image matching requires sufficiently large image resolutions - for this reason, we keep the resolution as large as possible while limiting the number of channels in the network. Besides, our model is designed to offer the choice of matching at the sparse or semi-dense levels, each of which may be more suitable for different downstream applications, such as visual navigation and augmented reality. Our model is the first to offer semi-dense matching efficiently, leveraging a novel match refinement module that relies on coarse local descriptors. XFeat is versatile and hardware-independent, surpassing current deep learning-based local features in speed (up to 5x faster) with comparable or better accuracy, proven in pose estimation and visual localization. We showcase it running in real-time on an inexpensive laptop CPU without specialized hardware optimizations. Code and weights are available at www.verlab.dcc.ufmg.br/descriptors/xfeat_cvpr24. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: CVPR 2024; Source code available at www.verlab.dcc.ufmg.br/descriptors/xfeat_cvpr24

arXiv:2404.05465 [pdf, other]

HAMMR: HierArchical MultiModal React agents for generic VQA

Authors: Lluis Castrejon, Thomas Mensink, Howard Zhou, Vittorio Ferrari, Andre Araujo, Jasper Uijlings

Abstract: Combining Large Language Models (LLMs) with external specialized tools (LLMs+tools) is a recent paradigm to solve multimodal tasks such as Visual Question Answering (VQA). While this approach was demonstrated to work well when optimized and evaluated for each individual benchmark, in practice it is crucial for the next generation of real-world AI systems to handle a broad range of multimodal probl… ▽ More Combining Large Language Models (LLMs) with external specialized tools (LLMs+tools) is a recent paradigm to solve multimodal tasks such as Visual Question Answering (VQA). While this approach was demonstrated to work well when optimized and evaluated for each individual benchmark, in practice it is crucial for the next generation of real-world AI systems to handle a broad range of multimodal problems. Therefore we pose the VQA problem from a unified perspective and evaluate a single system on a varied suite of VQA tasks including counting, spatial reasoning, OCR-based reasoning, visual pointing, external knowledge, and more. In this setting, we demonstrate that naively applying the LLM+tools approach using the combined set of all tools leads to poor results. This motivates us to introduce HAMMR: HierArchical MultiModal React. We start from a multimodal ReAct-based system and make it hierarchical by enabling our HAMMR agents to call upon other specialized agents. This enhances the compositionality of the LLM+tools approach, which we show to be critical for obtaining high accuracy on generic VQA. Concretely, on our generic VQA suite, HAMMR outperforms the naive LLM+tools approach by 19.5%. Additionally, HAMMR achieves state-of-the-art results on this task, outperforming the generic standalone PaLI-X VQA model by 5.0%. △ Less

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.04809 [pdf, other]

Low-Resource Machine Translation through Retrieval-Augmented LLM Prompting: A Study on the Mambai Language

Authors: Raphaël Merx, Aso Mahmudi, Katrina Langford, Leo Alberto de Araujo, Ekaterina Vylomova

Abstract: This study explores the use of large language models (LLMs) for translating English into Mambai, a low-resource Austronesian language spoken in Timor-Leste, with approximately 200,000 native speakers. Leveraging a novel corpus derived from a Mambai language manual and additional sentences translated by a native speaker, we examine the efficacy of few-shot LLM prompting for machine translation (MT)… ▽ More This study explores the use of large language models (LLMs) for translating English into Mambai, a low-resource Austronesian language spoken in Timor-Leste, with approximately 200,000 native speakers. Leveraging a novel corpus derived from a Mambai language manual and additional sentences translated by a native speaker, we examine the efficacy of few-shot LLM prompting for machine translation (MT) in this low-resource context. Our methodology involves the strategic selection of parallel sentences and dictionary entries for prompting, aiming to enhance translation accuracy, using open-source and proprietary LLMs (LlaMa 2 70b, Mixtral 8x7B, GPT-4). We find that including dictionary entries in prompts and a mix of sentences retrieved through TF-IDF and semantic embeddings significantly improves translation quality. However, our findings reveal stark disparities in translation performance across test sets, with BLEU scores reaching as high as 21.2 on materials from the language manual, in contrast to a maximum of 4.4 on a test set provided by a native speaker. These results underscore the importance of diverse and representative corpora in assessing MT for low-resource languages. Our research provides insights into few-shot LLM prompting for low-resource MT, and makes available an initial corpus for the Mambai language. △ Less

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2402.09674 [pdf, other]

PAL: Proxy-Guided Black-Box Attack on Large Language Models

Authors: Chawin Sitawarin, Norman Mu, David Wagner, Alexandre Araujo

Abstract: Large Language Models (LLMs) have surged in popularity in recent months, but they have demonstrated concerning capabilities to generate harmful content when manipulated. While techniques like safety fine-tuning aim to minimize harmful use, recent works have shown that LLMs remain vulnerable to attacks that elicit toxic responses. In this work, we introduce the Proxy-Guided Attack on LLMs (PAL), th… ▽ More Large Language Models (LLMs) have surged in popularity in recent months, but they have demonstrated concerning capabilities to generate harmful content when manipulated. While techniques like safety fine-tuning aim to minimize harmful use, recent works have shown that LLMs remain vulnerable to attacks that elicit toxic responses. In this work, we introduce the Proxy-Guided Attack on LLMs (PAL), the first optimization-based attack on LLMs in a black-box query-only setting. In particular, it relies on a surrogate model to guide the optimization and a sophisticated loss designed for real-world LLM APIs. Our attack achieves 84% attack success rate (ASR) on GPT-3.5-Turbo and 48% on Llama-2-7B, compared to 4% for the current state of the art. We also propose GCG++, an improvement to the GCG attack that reaches 94% ASR on white-box Llama-2-7B, and the Random-Search Attack on LLMs (RAL), a strong but simple baseline for query-based attacks. We believe the techniques proposed in this work will enable more comprehensive safety testing of LLMs and, in the long term, the development of better security guardrails. The code can be found at https://github.com/chawins/pal. △ Less

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.05339 [pdf]

Can participation in a hackathon impact the motivation of software engineering students? A preliminary case study analysis

Authors: Allysson Allex Araújo, Marcos Kalinowski, Maria Teresa Baldassarre

Abstract: [Background] Hackathons are increasingly gaining prominence in Software Engineering (SE) education, lauded for their ability to elevate students' skill sets. [Objective] This paper investigates whether hackathons can impact the motivation of SE students. [Method] We conducted an evaluative case study assessing students' motivations before and after a hackathon, combining quantitative analysis usin… ▽ More [Background] Hackathons are increasingly gaining prominence in Software Engineering (SE) education, lauded for their ability to elevate students' skill sets. [Objective] This paper investigates whether hackathons can impact the motivation of SE students. [Method] We conducted an evaluative case study assessing students' motivations before and after a hackathon, combining quantitative analysis using the Academic Motivation Scale (AMS) and qualitative coding of open-ended responses. [Results] Pre-hackathon findings reveal a diverse range of motivations with an overall acceptance, while post-hackathon responses highlight no statistically significant shift in participants' perceptions. Qualitative findings uncovered themes related to networking, team dynamics, and skill development. From a practical perspective, our findings highlight the potential of hackathons to impact participants' motivation. [Conclusion] While our study enhances the comprehension of hackathons as a motivational tool, it also underscores the need for further exploration of psychometric dimensions in SE educational research. △ Less

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.03337 [pdf, other]

Reinforcement-learning robotic sailboats: simulator and preliminary results

Authors: Eduardo Charles Vasconcellos, Ronald M Sampaio, André P D Araújo, Esteban Walter Gonzales Clua, Philippe Preux, Raphael Guerra, Luiz M G Gonçalves, Luis Martí, Hernan Lira, Nayat Sanchez-Pi

Abstract: This work focuses on the main challenges and problems in develo** a virtual oceanic environment reproducing real experiments using Unmanned Surface Vehicles (USV) digital twins. We introduce the key features for building virtual worlds, considering using Reinforcement Learning (RL) agents for autonomous navigation and control. With this in mind, the main problems concern the definition of the si… ▽ More This work focuses on the main challenges and problems in develo** a virtual oceanic environment reproducing real experiments using Unmanned Surface Vehicles (USV) digital twins. We introduce the key features for building virtual worlds, considering using Reinforcement Learning (RL) agents for autonomous navigation and control. With this in mind, the main problems concern the definition of the simulation equations (physics and mathematics), their effective implementation, and how to include strategies for simulated control and perception (sensors) to be used with RL. We present the modeling, implementation steps, and challenges required to create a functional digital twin based on a real robotic sailing vessel. The application is immediate for develo** navigation algorithms based on RL to be applied on real boats. △ Less

Submitted 16 January, 2024; originally announced February 2024.

Journal ref: NeurIPS 2023 Workshop on Robot Learning Workshop: Pretraining, Fine-Tuning, and Generalization with Large Scale Models, Dec 2023, New Orelans, United States

arXiv:2401.14033 [pdf, ps, other]

Novel Quadratic Constraints for Extending LipSDP beyond Slope-Restricted Activations

Authors: Patricia Pauli, Aaron Havens, Alexandre Araujo, Siddharth Garg, Farshad Khorrami, Frank Allgöwer, Bin Hu

Abstract: Recently, semidefinite programming (SDP) techniques have shown great promise in providing accurate Lipschitz bounds for neural networks. Specifically, the LipSDP approach (Fazlyab et al., 2019) has received much attention and provides the least conservative Lipschitz upper bounds that can be computed with polynomial time guarantees. However, one main restriction of LipSDP is that its formulation r… ▽ More Recently, semidefinite programming (SDP) techniques have shown great promise in providing accurate Lipschitz bounds for neural networks. Specifically, the LipSDP approach (Fazlyab et al., 2019) has received much attention and provides the least conservative Lipschitz upper bounds that can be computed with polynomial time guarantees. However, one main restriction of LipSDP is that its formulation requires the activation functions to be slope-restricted on $[0,1]$, preventing its further use for more general activation functions such as GroupSort, MaxMin, and Householder. One can rewrite MaxMin activations for example as residual ReLU networks. However, a direct application of LipSDP to the resultant residual ReLU networks is conservative and even fails in recovering the well-known fact that the MaxMin activation is 1-Lipschitz. Our paper bridges this gap and extends LipSDP beyond slope-restricted activation functions. To this end, we provide novel quadratic constraints for GroupSort, MaxMin, and Householder activations via leveraging their underlying properties such as sum preservation. Our proposed analysis is general and provides a unified approach for estimating $\ell_2$ and $\ell_\infty$ Lipschitz bounds for a rich class of neural network architectures, including non-residual and residual neural networks and implicit models, with GroupSort, MaxMin, and Householder activations. Finally, we illustrate the utility of our approach with a variety of experiments and show that our proposed SDPs generate less conservative Lipschitz bounds in comparison to existing approaches. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: accepted as a conference paper at ICLR 2024

arXiv:2311.17846 [pdf, other]

Towards Real-World Focus Stacking with Deep Learning

Authors: Alexandre Araujo, Jean Ponce, Julien Mairal

Abstract: Focus stacking is widely used in micro, macro, and landscape photography to reconstruct all-in-focus images from multiple frames obtained with focus bracketing, that is, with shallow depth of field and different focus planes. Existing deep learning approaches to the underlying multi-focus image fusion problem have limited applicability to real-world imagery since they are designed for very short i… ▽ More Focus stacking is widely used in micro, macro, and landscape photography to reconstruct all-in-focus images from multiple frames obtained with focus bracketing, that is, with shallow depth of field and different focus planes. Existing deep learning approaches to the underlying multi-focus image fusion problem have limited applicability to real-world imagery since they are designed for very short image sequences (two to four images), and are typically trained on small, low-resolution datasets either acquired by light-field cameras or generated synthetically. We introduce a new dataset consisting of 94 high-resolution bursts of raw images with focus bracketing, with pseudo ground truth computed from the data using state-of-the-art commercial software. This dataset is used to train the first deep learning algorithm for focus stacking capable of handling bursts of sufficient length for real-world applications. Qualitative experiments demonstrate that it is on par with existing commercial solutions in the long-burst, realistic regime while being significantly more tolerant to noise. The code and dataset are available at https://github.com/araujoalexandre/FocusStackingDataset. △ Less

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.05452 [pdf, other]

Transformer-based Model for Oral Epithelial Dysplasia Segmentation

Authors: Adam J Shephard, Hanya Mahmood, Shan E Ahmed Raza, Anna Luiza Damaceno Araujo, Alan Roger Santos-Silva, Marcio Ajudarte Lopes, Pablo Agustin Vargas, Kris McCombe, Stephanie Craig, Jacqueline James, Jill Brooks, Paul Nankivell, Hisham Mehanna, Syed Ali Khurram, Nasir M Rajpoot

Abstract: Oral epithelial dysplasia (OED) is a premalignant histopathological diagnosis given to lesions of the oral cavity. OED grading is subject to large inter/intra-rater variability, resulting in the under/over-treatment of patients. We developed a new Transformer-based pipeline to improve detection and segmentation of OED in haematoxylin and eosin (H&E) stained whole slide images (WSIs). Our model was… ▽ More Oral epithelial dysplasia (OED) is a premalignant histopathological diagnosis given to lesions of the oral cavity. OED grading is subject to large inter/intra-rater variability, resulting in the under/over-treatment of patients. We developed a new Transformer-based pipeline to improve detection and segmentation of OED in haematoxylin and eosin (H&E) stained whole slide images (WSIs). Our model was trained on OED cases (n = 260) and controls (n = 105) collected using three different scanners, and validated on test data from three external centres in the United Kingdom and Brazil (n = 78). Our internal experiments yield a mean F1-score of 0.81 for OED segmentation, which reduced slightly to 0.71 on external testing, showing good generalisability, and gaining state-of-the-art results. This is the first externally validated study to use Transformers for segmentation in precancerous histology images. Our publicly available model shows great promise to be the first step of a fully-integrated pipeline, allowing earlier and more efficient OED diagnosis, ultimately benefiting patient outcomes. △ Less

Submitted 9 November, 2023; originally announced November 2023.

Comments: 5 pages, 2 figures, 4 tables

arXiv:2310.18274 [pdf, other]

LipSim: A Provably Robust Perceptual Similarity Metric

Authors: Sara Ghazanfari, Alexandre Araujo, Prashanth Krishnamurthy, Farshad Khorrami, Siddharth Garg

Abstract: Recent years have seen growing interest in develo** and applying perceptual similarity metrics. Research has shown the superiority of perceptual metrics over pixel-wise metrics in aligning with human perception and serving as a proxy for the human visual system. On the other hand, as perceptual metrics rely on neural networks, there is a growing concern regarding their resilience, given the esta… ▽ More Recent years have seen growing interest in develo** and applying perceptual similarity metrics. Research has shown the superiority of perceptual metrics over pixel-wise metrics in aligning with human perception and serving as a proxy for the human visual system. On the other hand, as perceptual metrics rely on neural networks, there is a growing concern regarding their resilience, given the established vulnerability of neural networks to adversarial attacks. It is indeed logical to infer that perceptual metrics may inherit both the strengths and shortcomings of neural networks. In this work, we demonstrate the vulnerability of state-of-the-art perceptual similarity metrics based on an ensemble of ViT-based feature extractors to adversarial attacks. We then propose a framework to train a robust perceptual similarity metric called LipSim (Lipschitz Similarity Metric) with provable guarantees. By leveraging 1-Lipschitz neural networks as the backbone, LipSim provides guarded areas around each data point and certificates for all perturbations within an $\ell_2$ ball. Finally, a comprehensive set of experiments shows the performance of LipSim in terms of natural and certified scores and on the image retrieval application. The code is available at https://github.com/SaraGhazanfari/LipSim. △ Less

Submitted 29 March, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

arXiv:2310.03664 [pdf, other]

Certification of Deep Learning Models for Medical Image Segmentation

Authors: Othmane Laousy, Alexandre Araujo, Guillaume Chassagnon, Nikos Paragios, Marie-Pierre Revel, Maria Vakalopoulou

Abstract: In medical imaging, segmentation models have known a significant improvement in the past decade and are now used daily in clinical practice. However, similar to classification models, segmentation models are affected by adversarial attacks. In a safety-critical field like healthcare, certifying model predictions is of the utmost importance. Randomized smoothing has been introduced lately and provi… ▽ More In medical imaging, segmentation models have known a significant improvement in the past decade and are now used daily in clinical practice. However, similar to classification models, segmentation models are affected by adversarial attacks. In a safety-critical field like healthcare, certifying model predictions is of the utmost importance. Randomized smoothing has been introduced lately and provides a framework to certify models and obtain theoretical guarantees. In this paper, we present for the first time a certified segmentation baseline for medical imaging based on randomized smoothing and diffusion models. Our results show that leveraging the power of denoising diffusion probabilistic models helps us overcome the limits of randomized smoothing. We conduct extensive experiments on five public datasets of chest X-rays, skin lesions, and colonoscopies, and empirically show that we are able to maintain high certified Dice scores even for highly perturbed images. Our work represents the first attempt to certify medical image segmentation models, and we aspire for it to set a foundation for future benchmarks in this crucial and largely uncharted area. △ Less

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2309.16883 [pdf, other]

The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing

Authors: Blaise Delattre, Alexandre Araujo, Quentin Barthélemy, Alexandre Allauzen

Abstract: Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks. The certified radius in this context is a crucial indicator of the robustness of models. However how to design an efficient classifier with an associated certified radius? Randomized smoothing provides a promising framework by relying on noise injection in… ▽ More Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks. The certified radius in this context is a crucial indicator of the robustness of models. However how to design an efficient classifier with an associated certified radius? Randomized smoothing provides a promising framework by relying on noise injection into the inputs to obtain a smoothed and robust classifier. In this paper, we first show that the variance introduced by the Monte-Carlo sampling in the randomized smoothing procedure estimate closely interacts with two other important properties of the classifier, \textit{i.e.} its Lipschitz constant and margin. More precisely, our work emphasizes the dual impact of the Lipschitz constant of the base classifier, on both the smoothed classifier and the empirical variance. To increase the certified robust radius, we introduce a different way to convert logits to probability vectors for the base classifier to leverage the variance-margin trade-off. We leverage the use of Bernstein's concentration inequality along with enhanced Lipschitz bounds for randomized smoothing. Experimental results show a significant improvement in certified accuracy compared to current state-of-the-art methods. Our novel certification procedure allows us to use pre-trained models with randomized smoothing, effectively improving the current certification radius in a zero-shot manner. △ Less

Submitted 18 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.08250 [pdf, other]

Optimization of Rank Losses for Image Retrieval

Authors: Elias Ramzi, Nicolas Audebert, Clément Rambour, André Araujo, Xavier Bitot, Nicolas Thome

Abstract: In image retrieval, standard evaluation metrics rely on score ranking, \eg average precision (AP), recall at k (R@k), normalized discounted cumulative gain (NDCG). In this work we introduce a general framework for robust and decomposable rank losses optimization. It addresses two major challenges for end-to-end training of deep neural networks with rank losses: non-differentiability and non-decomp… ▽ More In image retrieval, standard evaluation metrics rely on score ranking, \eg average precision (AP), recall at k (R@k), normalized discounted cumulative gain (NDCG). In this work we introduce a general framework for robust and decomposable rank losses optimization. It addresses two major challenges for end-to-end training of deep neural networks with rank losses: non-differentiability and non-decomposability. Firstly we propose a general surrogate for ranking operator, SupRank, that is amenable to stochastic gradient descent. It provides an upperbound for rank losses and ensures robust training. Secondly, we use a simple yet effective loss function to reduce the decomposability gap between the averaged batch approximation of ranking losses and their values on the whole training set. We apply our framework to two standard metrics for image retrieval: AP and R@k. Additionally we apply our framework to hierarchical image retrieval. We introduce an extension of AP, the hierarchical average precision $\mathcal{H}$-AP, and optimize it as well as the NDCG. Finally we create the first hierarchical landmarks retrieval dataset. We use a semi-automatic pipeline to create hierarchical labels, extending the large scale Google Landmarks v2 dataset. The hierarchical dataset is publicly available at https://github.com/cvdfoundation/google-landmark. Code will be released at https://github.com/elias-ramzi/SupRank. △ Less

Submitted 15 September, 2023; originally announced September 2023.

Comments: arXiv admin note: text overlap with arXiv:2207.04873

arXiv:2309.01858 [pdf, other]

Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations

Authors: Nikolaos-Antonios Ypsilantis, Kaifeng Chen, Bingyi Cao, Mário Lipovský, Pelin Dogan-Schönberger, Grzegorz Makosa, Boris Bluntschli, Mojtaba Seyedhosseini, Ondřej Chum, André Araujo

Abstract: Fine-grained and instance-level recognition methods are commonly trained and evaluated on specific domains, in a model per domain scenario. Such an approach, however, is impractical in real large-scale applications. In this work, we address the problem of universal image embedding, where a single universal model is trained and used in multiple domains. First, we leverage existing domain-specific d… ▽ More Fine-grained and instance-level recognition methods are commonly trained and evaluated on specific domains, in a model per domain scenario. Such an approach, however, is impractical in real large-scale applications. In this work, we address the problem of universal image embedding, where a single universal model is trained and used in multiple domains. First, we leverage existing domain-specific datasets to carefully construct a new large-scale public benchmark for the evaluation of universal image embeddings, with 241k query images, 1.4M index images and 2.8M training images across 8 different domains and 349k classes. We define suitable metrics, training and evaluation protocols to foster future research in this area. Second, we provide a comprehensive experimental evaluation on the new dataset, demonstrating that existing approaches and simplistic extensions lead to worse performance than an assembly of models trained for each domain separately. Finally, we conducted a public research competition on this topic, leveraging industrial datasets, which attracted the participation of more than 1k teams worldwide. This exercise generated many interesting research ideas and findings which we present in detail. Project webpage: https://cmp.felk.cvut.cz/univ_emb/ △ Less

Submitted 4 September, 2023; originally announced September 2023.

Comments: ICCV 2023 Accepted

arXiv:2308.13363 [pdf, other]

CS-Mixer: A Cross-Scale Vision MLP Model with Spatial-Channel Mixing

Authors: Jonathan Cui, David A. Araujo, Suman Saha, Md. Faisal Kabir

Abstract: Despite their simpler information fusion designs compared with Vision Transformers and Convolutional Neural Networks, Vision MLP architectures have demonstrated strong performance and high data efficiency in recent research. However, existing works such as CycleMLP and Vision Permutator typically model spatial information in equal-size spatial regions and do not consider cross-scale spatial intera… ▽ More Despite their simpler information fusion designs compared with Vision Transformers and Convolutional Neural Networks, Vision MLP architectures have demonstrated strong performance and high data efficiency in recent research. However, existing works such as CycleMLP and Vision Permutator typically model spatial information in equal-size spatial regions and do not consider cross-scale spatial interactions. Further, their token mixers only model 1- or 2-axis correlations, avoiding 3-axis spatial-channel mixing due to its computational demands. We therefore propose CS-Mixer, a hierarchical Vision MLP that learns dynamic low-rank transformations for spatial-channel mixing through cross-scale local and global aggregation. The proposed methodology achieves competitive results on popular image recognition benchmarks without incurring substantially more compute. Our largest model, CS-Mixer-L, reaches 83.2% top-1 accuracy on ImageNet-1k with 13.7 GFLOPs and 94 M parameters. △ Less

Submitted 14 January, 2024; v1 submitted 25 August, 2023; originally announced August 2023.

Comments: 8 pages, 5 figures, developed under Penn State University's Multi-Campus Research Experience for Undergraduates Symposium, 2023. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2308.06954 [pdf, other]

Global Features are All You Need for Image Retrieval and Reranking

Authors: Shihao Shao, Kaifeng Chen, Arjun Karpur, Qinghua Cui, Andre Araujo, Bingyi Cao

Abstract: Image retrieval systems conventionally use a two-stage paradigm, leveraging global features for initial retrieval and local features for reranking. However, the scalability of this method is often limited due to the significant storage and computation cost incurred by local feature matching in the reranking stage. In this paper, we present SuperGlobal, a novel approach that exclusively employs glo… ▽ More Image retrieval systems conventionally use a two-stage paradigm, leveraging global features for initial retrieval and local features for reranking. However, the scalability of this method is often limited due to the significant storage and computation cost incurred by local feature matching in the reranking stage. In this paper, we present SuperGlobal, a novel approach that exclusively employs global features for both stages, improving efficiency without sacrificing accuracy. SuperGlobal introduces key enhancements to the retrieval system, specifically focusing on the global feature extraction and reranking processes. For extraction, we identify sub-optimal performance when the widely-used ArcFace loss and Generalized Mean (GeM) pooling methods are combined and propose several new modules to improve GeM pooling. In the reranking stage, we introduce a novel method to update the global features of the query and top-ranked images by only considering feature refinement with a small set of images, thus being very compute and memory efficient. Our experiments demonstrate substantial improvements compared to the state of the art in standard benchmarks. Notably, on the Revisited Oxford+1M Hard dataset, our single-stage results improve by 7.1%, while our two-stage gain reaches 3.7% with a strong 64,865x speedup. Our two-stage system surpasses the current single-stage state-of-the-art by 16.3%, offering a scalable, accurate alternative for high-performing image retrieval systems with minimal time overhead. Code: https://github.com/ShihaoShao-GH/SuperGlobal. △ Less

Submitted 19 August, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: ICCV23 camera-ready + appendix

arXiv:2308.05810 [pdf, other]

Spintronics for image recognition: performance benchmarking via ultrafast data-driven simulations

Authors: Anatole Moureaux, Chloé Chopin, Simon de Wergifosse, Laurent Jacques, Flavio Abreu Araujo

Abstract: We present a demonstration of image classification using an echo-state network (ESN) relying on a single simulated spintronic nanostructure known as the vortex-based spin-torque oscillator (STVO) delayed in time. We employ an ultrafast data-driven simulation framework called the data-driven Thiele equation approach (DD-TEA) to simulate the STVO dynamics. This allows us to avoid the challenges asso… ▽ More We present a demonstration of image classification using an echo-state network (ESN) relying on a single simulated spintronic nanostructure known as the vortex-based spin-torque oscillator (STVO) delayed in time. We employ an ultrafast data-driven simulation framework called the data-driven Thiele equation approach (DD-TEA) to simulate the STVO dynamics. This allows us to avoid the challenges associated with repeated experimental manipulation of such a nanostructured system. We showcase the versatility of our solution by successfully applying it to solve classification challenges with the MNIST, EMNIST-letters and Fashion MNIST datasets. Through our simulations, we determine that within an ESN with numerous learnable parameters the results obtained using the STVO dynamics as an activation function are comparable to the ones obtained with other conventional nonlinear activation functions like the reLU and the sigmoid. While achieving state-of-the-art accuracy levels on the MNIST dataset, our model's performance on EMNIST-letters and Fashion MNIST is lower due to the relative simplicity of the system architecture and the increased complexity of the tasks. We expect that the DD-TEA framework will enable the exploration of deeper architectures, ultimately leading to improved classification accuracy. △ Less

Submitted 7 February, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: 6 pages, 4 figures

arXiv:2307.15157 [pdf, other]

R-LPIPS: An Adversarially Robust Perceptual Similarity Metric

Authors: Sara Ghazanfari, Siddharth Garg, Prashanth Krishnamurthy, Farshad Khorrami, Alexandre Araujo

Abstract: Similarity metrics have played a significant role in computer vision to capture the underlying semantics of images. In recent years, advanced similarity metrics, such as the Learned Perceptual Image Patch Similarity (LPIPS), have emerged. These metrics leverage deep features extracted from trained neural networks and have demonstrated a remarkable ability to closely align with human perception whe… ▽ More Similarity metrics have played a significant role in computer vision to capture the underlying semantics of images. In recent years, advanced similarity metrics, such as the Learned Perceptual Image Patch Similarity (LPIPS), have emerged. These metrics leverage deep features extracted from trained neural networks and have demonstrated a remarkable ability to closely align with human perception when evaluating relative image similarity. However, it is now well-known that neural networks are susceptible to adversarial examples, i.e., small perturbations invisible to humans crafted to deliberately mislead the model. Consequently, the LPIPS metric is also sensitive to such adversarial examples. This susceptibility introduces significant security concerns, especially considering the widespread adoption of LPIPS in large-scale applications. In this paper, we propose the Robust Learned Perceptual Image Patch Similarity (R-LPIPS) metric, a new metric that leverages adversarially trained deep features. Through a comprehensive set of experiments, we demonstrate the superiority of R-LPIPS compared to the classical LPIPS metric. The code is available at https://github.com/SaraGhazanfari/R-LPIPS. △ Less

Submitted 31 July, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

arXiv:2307.09262 [pdf, other]

Neuromorphic spintronics simulated using an unconventional data-driven Thiele equation approach

Authors: Anatole Moureaux, Simon de Wergifosse, Chloé Chopin, Flavio Abreu Araujo

Abstract: In this study, we developed a quantitative description of the dynamics of spin-torque vortex nano-oscillators (STVOs) through an unconventional model based on the combination of the Thiele equation approach (TEA) and data from micromagnetic simulations (MMS). Solving the STVO dynamics with our analytical model allows to accelerate the simulations by 9 orders of magnitude compared to MMS while reac… ▽ More In this study, we developed a quantitative description of the dynamics of spin-torque vortex nano-oscillators (STVOs) through an unconventional model based on the combination of the Thiele equation approach (TEA) and data from micromagnetic simulations (MMS). Solving the STVO dynamics with our analytical model allows to accelerate the simulations by 9 orders of magnitude compared to MMS while reaching the same level of accuracy. Here, we showcase our model by simulating a STVO-based neural network for solving a classification task. We assess its performance with respect to the input signal current intensity and the level of noise that might affect such a system. Our approach is promising for accelerating the design of STVO-based neuromorphic computing devices while decreasing drastically its computational cost. △ Less

Submitted 18 July, 2023; originally announced July 2023.

Comments: Presented in ISCS2023

Report number: ISCS23-46

arXiv:2306.09949 [pdf, other]

Towards Better Certified Segmentation via Diffusion Models

Authors: Othmane Laousy, Alexandre Araujo, Guillaume Chassagnon, Marie-Pierre Revel, Siddharth Garg, Farshad Khorrami, Maria Vakalopoulou

Abstract: The robustness of image segmentation has been an important research topic in the past few years as segmentation models have reached production-level accuracy. However, like classification models, segmentation models can be vulnerable to adversarial perturbations, which hinders their use in critical-decision systems like healthcare or autonomous driving. Recently, randomized smoothing has been prop… ▽ More The robustness of image segmentation has been an important research topic in the past few years as segmentation models have reached production-level accuracy. However, like classification models, segmentation models can be vulnerable to adversarial perturbations, which hinders their use in critical-decision systems like healthcare or autonomous driving. Recently, randomized smoothing has been proposed to certify segmentation predictions by adding Gaussian noise to the input to obtain theoretical guarantees. However, this method exhibits a trade-off between the amount of added noise and the level of certification achieved. In this paper, we address the problem of certifying segmentation prediction using a combination of randomized smoothing and diffusion models. Our experiments show that combining randomized smoothing and diffusion models significantly improves certified robustness, with results indicating a mean improvement of 21 points in accuracy compared to previous state-of-the-art methods on Pascal-Context and Cityscapes public datasets. Our method is independent of the selected segmentation model and does not need any additional specialized training procedure. △ Less

Submitted 16 June, 2023; originally announced June 2023.

arXiv:2306.09224 [pdf, other]

Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

Authors: Thomas Mensink, Jasper Uijlings, Lluis Castrejon, Arushi Goel, Felipe Cadar, Howard Zhou, Fei Sha, André Araujo, Vittorio Ferrari

Abstract: We propose Encyclopedic-VQA, a large scale visual question answering (VQA) dataset featuring visual questions about detailed properties of fine-grained categories and instances. It contains 221k unique question+answer pairs each matched with (up to) 5 images, resulting in a total of 1M VQA samples. Moreover, our dataset comes with a controlled knowledge base derived from Wikipedia, marking the evi… ▽ More We propose Encyclopedic-VQA, a large scale visual question answering (VQA) dataset featuring visual questions about detailed properties of fine-grained categories and instances. It contains 221k unique question+answer pairs each matched with (up to) 5 images, resulting in a total of 1M VQA samples. Moreover, our dataset comes with a controlled knowledge base derived from Wikipedia, marking the evidence to support each answer. Empirically, we show that our dataset poses a hard challenge for large vision+language models as they perform poorly on our dataset: PaLI [14] is state-of-the-art on OK-VQA [37], yet it only achieves 13.0% accuracy on our dataset. Moreover, we experimentally show that progress on answering our encyclopedic questions can be achieved by augmenting large models with a mechanism that retrieves relevant information from the knowledge base. An oracle experiment with perfect retrieval achieves 87.0% accuracy on the single-hop portion of our dataset, and an automatic retrieval-augmented prototype yields 48.8%. We believe that our dataset enables future research on retrieval-augmented vision+language models. It is available at https://github.com/google-research/google-research/tree/master/encyclopedic_vqa . △ Less

Submitted 24 July, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: ICCV'23

arXiv:2306.09109 [pdf, other]

NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations

Authors: Varun Jampani, Kevis-Kokitsi Maninis, Andreas Engelhardt, Arjun Karpur, Karen Truong, Kyle Sargent, Stefan Popov, André Araujo, Ricardo Martin-Brualla, Kaushal Patel, Daniel Vlasic, Vittorio Ferrari, Ameesh Makadia, Ce Liu, Yuanzhen Li, Howard Zhou

Abstract: Recent advances in neural reconstruction enable high-quality 3D object reconstruction from casually captured image collections. Current techniques mostly analyze their progress on relatively simple image collections where Structure-from-Motion (SfM) techniques can provide ground-truth (GT) camera poses. We note that SfM techniques tend to fail on in-the-wild image collections such as image search… ▽ More Recent advances in neural reconstruction enable high-quality 3D object reconstruction from casually captured image collections. Current techniques mostly analyze their progress on relatively simple image collections where Structure-from-Motion (SfM) techniques can provide ground-truth (GT) camera poses. We note that SfM techniques tend to fail on in-the-wild image collections such as image search results with varying backgrounds and illuminations. To enable systematic research progress on 3D reconstruction from casual image captures, we propose NAVI: a new dataset of category-agnostic image collections of objects with high-quality 3D scans along with per-image 2D-3D alignments providing near-perfect GT camera parameters. These 2D-3D alignments allow us to extract accurate derivative annotations such as dense pixel correspondences, depth and segmentation maps. We demonstrate the use of NAVI image collections on different problem settings and show that NAVI enables more thorough evaluations that were not possible with existing datasets. We believe NAVI is beneficial for systematic research progress on 3D reconstruction and correspondence estimation. Project page: https://navidataset.github.io △ Less

Submitted 13 October, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023 camera ready. Project page: https://navidataset.github.io

arXiv:2306.09012 [pdf, other]

Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization

Authors: Dror Aiger, André Araujo, Simon Lynen

Abstract: Large-scale visual localization systems continue to rely on 3D point clouds built from image collections using structure-from-motion. While the 3D points in these models are represented using local image features, directly matching a query image's local features against the point cloud is challenging due to the scale of the nearest-neighbor search problem. Many recent approaches to visual localiza… ▽ More Large-scale visual localization systems continue to rely on 3D point clouds built from image collections using structure-from-motion. While the 3D points in these models are represented using local image features, directly matching a query image's local features against the point cloud is challenging due to the scale of the nearest-neighbor search problem. Many recent approaches to visual localization have thus proposed a hybrid method, where first a global (per image) embedding is used to retrieve a small subset of database images, and local features of the query are matched only against those. It seems to have become common belief that global embeddings are critical for said image-retrieval in visual localization, despite the significant downside of having to compute two feature types for each query image. In this paper, we take a step back from this assumption and propose Constrained Approximate Nearest Neighbors (CANN), a joint solution of k-nearest-neighbors across both the geometry and appearance space using only local features. We first derive the theoretical foundation for k-nearest-neighbor retrieval across multiple metrics and then showcase how CANN improves visual localization. Our experiments on public localization benchmarks demonstrate that our method significantly outperforms both state-of-the-art global feature-based retrieval and approaches using local feature aggregation schemes. Moreover, it is an order of magnitude faster in both index and query time than feature aggregation schemes for these datasets. Code: \url{https://github.com/google-research/google-research/tree/master/cann} △ Less

Submitted 29 December, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: ICCV23 camera-ready + appendix

arXiv:2305.16494 [pdf, other]

Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability

Authors: Haotian Xue, Alexandre Araujo, Bin Hu, Yongxin Chen

Abstract: Neural networks are known to be susceptible to adversarial samples: small variations of natural examples crafted to deliberately mislead the models. While they can be easily generated using gradient-based techniques in digital and physical scenarios, they often differ greatly from the actual data distribution of natural images, resulting in a trade-off between strength and stealthiness. In this pa… ▽ More Neural networks are known to be susceptible to adversarial samples: small variations of natural examples crafted to deliberately mislead the models. While they can be easily generated using gradient-based techniques in digital and physical scenarios, they often differ greatly from the actual data distribution of natural images, resulting in a trade-off between strength and stealthiness. In this paper, we propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples. By exploiting a gradient guided by a diffusion model, Diff-PGD ensures that adversarial samples remain close to the original data distribution while maintaining their effectiveness. Moreover, our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks. Compared with existing methods for generating natural-style adversarial samples, our framework enables the separation of optimizing adversarial loss from other surrogate losses (e.g., content/smoothness/style loss), making it more stable and controllable. Finally, we demonstrate that the samples generated using Diff-PGD have better transferability and anti-purification power than traditional gradient-based methods. Code will be released in https://github.com/xavihart/Diff-PGD △ Less

Submitted 17 January, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: Accepted as a conference paper in NeurIPS'2023. Code repo: https://github.com/xavihart/Diff-PGD

arXiv:2305.16173 [pdf, other]

Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration

Authors: Blaise Delattre, Quentin Barthélemy, Alexandre Araujo, Alexandre Allauzen

Abstract: Since the control of the Lipschitz constant has a great impact on the training stability, generalization, and robustness of neural networks, the estimation of this value is nowadays a real scientific challenge. In this paper we introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory and a new alternative to the Power ite… ▽ More Since the control of the Lipschitz constant has a great impact on the training stability, generalization, and robustness of neural networks, the estimation of this value is nowadays a real scientific challenge. In this paper we introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory and a new alternative to the Power iteration. Called the Gram iteration, our approach exhibits a superlinear convergence. First, we show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability. Then, it proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches. Code is available at https://github.com/blaisedelattre/lip4conv. △ Less

Submitted 19 June, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: ICML 2023

arXiv:2304.00583 [pdf, other]

Enhancing Deformable Local Features by Jointly Learning to Detect and Describe Keypoints

Authors: Guilherme Potje, Felipe Cadar, Andre Araujo, Renato Martins, Erickson R. Nascimento

Abstract: Local feature extraction is a standard approach in computer vision for tackling important tasks such as image matching and retrieval. The core assumption of most methods is that images undergo affine transformations, disregarding more complicated effects such as non-rigid deformations. Furthermore, incipient works tailored for non-rigid correspondence still rely on keypoint detectors designed for… ▽ More Local feature extraction is a standard approach in computer vision for tackling important tasks such as image matching and retrieval. The core assumption of most methods is that images undergo affine transformations, disregarding more complicated effects such as non-rigid deformations. Furthermore, incipient works tailored for non-rigid correspondence still rely on keypoint detectors designed for rigid transformations, hindering performance due to the limitations of the detector. We propose DALF (Deformation-Aware Local Features), a novel deformation-aware network for jointly detecting and describing keypoints, to handle the challenging problem of matching deformable surfaces. All network components work cooperatively through a feature fusion approach that enforces the descriptors' distinctiveness and invariance. Experiments using real deforming objects showcase the superiority of our method, where it delivers 8% improvement in matching scores compared to the previous best results. Our approach also enhances the performance of two real-world applications: deformable object retrieval and non-rigid 3D surface registration. Code for training, inference, and applications are publicly available at https://verlab.dcc.ufmg.br/descriptors/dalf_cvpr23. △ Less

Submitted 2 April, 2023; originally announced April 2023.

Comments: CVPR 2023; Source code available at https://verlab.dcc.ufmg.br/descriptors/dalf_cvpr23

arXiv:2303.12779 [pdf, other]

LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals

Authors: Arjun Karpur, Guilherme Perrotta, Ricardo Martin-Brualla, Howard Zhou, André Araujo

Abstract: Finding localized correspondences across different images of the same object is crucial to understand its geometry. In recent years, this problem has seen remarkable progress with the advent of deep learning-based local image features and learnable matchers. Still, learnable matchers often underperform when there exists only small regions of co-visibility between image pairs (i.e. wide camera base… ▽ More Finding localized correspondences across different images of the same object is crucial to understand its geometry. In recent years, this problem has seen remarkable progress with the advent of deep learning-based local image features and learnable matchers. Still, learnable matchers often underperform when there exists only small regions of co-visibility between image pairs (i.e. wide camera baselines). To address this problem, we leverage recent progress in coarse single-view geometry estimation methods. We propose LFM-3D, a Learnable Feature Matching framework that uses models based on graph neural networks and enhances their capabilities by integrating noisy, estimated 3D signals to boost correspondence estimation. When integrating 3D signals into the matcher model, we show that a suitable positional encoding is critical to effectively make use of the low-dimensional 3D information. We experiment with two different 3D signals - normalized object coordinates and monocular depth estimates - and evaluate our method on large-scale (synthetic and real) datasets containing object-centric image pairs across wide baselines. We observe strong feature matching improvements compared to 2D-only methods, with up to +6% total recall and +28% precision at fixed recall. Additionally, we demonstrate that the resulting improved correspondences lead to much higher relative posing accuracy for in-the-wild image pairs - up to 8.6% compared to the 2D-only approach. △ Less

Submitted 30 January, 2024; v1 submitted 22 March, 2023; originally announced March 2023.

Comments: 3DV 2024, oral paper

arXiv:2303.03169 [pdf, ps, other]

A Unified Algebraic Perspective on Lipschitz Neural Networks

Authors: Alexandre Araujo, Aaron Havens, Blaise Delattre, Alexandre Allauzen, Bin Hu

Abstract: Important research efforts have focused on the design and training of neural networks with a controlled Lipschitz constant. The goal is to increase and sometimes guarantee the robustness against adversarial attacks. Recent promising techniques draw inspirations from different backgrounds to design 1-Lipschitz neural networks, just to name a few: convex potential layers derive from the discretizati… ▽ More Important research efforts have focused on the design and training of neural networks with a controlled Lipschitz constant. The goal is to increase and sometimes guarantee the robustness against adversarial attacks. Recent promising techniques draw inspirations from different backgrounds to design 1-Lipschitz neural networks, just to name a few: convex potential layers derive from the discretization of continuous dynamical systems, Almost-Orthogonal-Layer proposes a tailored method for matrix rescaling. However, it is today important to consider the recent and promising contributions in the field under a common theoretical lens to better design new and improved layers. This paper introduces a novel algebraic perspective unifying various types of 1-Lipschitz neural networks, including the ones previously mentioned, along with methods based on orthogonality and spectral methods. Interestingly, we show that many existing techniques can be derived and generalized via finding analytical solutions of a common semidefinite programming (SDP) condition. We also prove that AOL biases the scaled weight to the ones which are close to the set of orthogonal matrices in a certain mathematical manner. Moreover, our algebraic condition, combined with the Gershgorin circle theorem, readily leads to new and diverse parameterizations for 1-Lipschitz network layers. Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalization of convex potential layers. Finally, the comprehensive set of experiments on image classification shows that SLLs outperform previous approaches on certified robust accuracy. Code is available at https://github.com/araujoalexandre/Lipschitz-SLL-Networks. △ Less

Submitted 26 October, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: ICLR 2023. Spotlight paper

arXiv:2301.11025 [pdf, other]

Neuromorphic spintronics accelerated by an unconventional data-driven Thiele equation approach

Authors: Anatole Moureaux, Simon De Wergifosse, Chloé Chopin, Jimmy Weber, Flavio Abreu Araujo

Abstract: We design a neural network based on a single spin-torque vortex nano-oscillator (STVO) multiplexed in time. The behavior of the STVO is simulated with an improved ultra-fast and quantitative model based on the Thiele equation approach. Different mathematical and numerical adaptations are brought to the model in order to increase the accuracy and the speed of the simulations. We demonstrate the hig… ▽ More We design a neural network based on a single spin-torque vortex nano-oscillator (STVO) multiplexed in time. The behavior of the STVO is simulated with an improved ultra-fast and quantitative model based on the Thiele equation approach. Different mathematical and numerical adaptations are brought to the model in order to increase the accuracy and the speed of the simulations. We demonstrate the high added value and adaptability of such a neural network through the resolution of three standard machine learning tasks in the framework of reservoir computing. The first one is a task of waveform (sines and squares) classification. We show the ability of the system to effectively classify waveforms with high accuracy and low root-mean-square error thanks to the intrinsic short-term memory of the device. Given the high throughput of the simulations, two innovative parametric studies on the intensity of the input signal and the level of noise in the system are performed to demonstrate the value of our new models. The efficiency of our system is then tested during a speech recognition task on the TI-46 dataset and shows the agreement between the new models and the corresponding experimental measurements. Finally, we use our STVO-based neural network to perform image recognition on the MNIST dataset. State-of-the-art performances are demonstrated, and the interest of using the STVO dynamics as an activation function is highlighted. These results support and facilitate the future development of neuromorphic STVO-based hardware for energy-efficient machine learning. △ Less

Submitted 18 April, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

Comments: 11 pages, 8 figures

arXiv:2210.03042 [pdf, other]

doi 10.1109/SBGames54170.2021.00023

Perception of Personality Traits in Crowds of Virtual Humans

Authors: Lucas Nardino, Enzo Krzmienszki, Vinícius Jurinic Cassol, Diogo Schaffer, Victor Flávio de Andrade Araujo, Rodolfo Migon Favaretto, Felipe Elsner, Gabriel Fonseca Silva, Soraia Raupp Musse

Abstract: This paper proposes a perceptual visual analysis regarding the personality of virtual humans. Many studies have presented findings regarding the way human beings perceive virtual humans with respect to their faces, body animation, motion in the virtual environment and etc. We are interested in investigating the way people perceive visual manifestations of virtual humans' personality traits when th… ▽ More This paper proposes a perceptual visual analysis regarding the personality of virtual humans. Many studies have presented findings regarding the way human beings perceive virtual humans with respect to their faces, body animation, motion in the virtual environment and etc. We are interested in investigating the way people perceive visual manifestations of virtual humans' personality traits when they are interactive and organized in groups. Many applications in games and movies can benefit from the findings regarding the perceptual analysis with the main goal to provide more realistic characters and improve the users' experience. We provide experiments with subjects and obtained results indicate that, although is very subtle, people perceive more the extraversion (the personality trait that we measured), into the crowds of virtual humans, when interacting with virtual humans behaviors, than when just observing as a spectator camera. △ Less

Submitted 6 October, 2022; originally announced October 2022.

Comments: 9 pages, 7 figures, 1 table

arXiv:2206.07091 [pdf, other]

doi 10.1016/j.comnet.2022.109189

Monitoring Fog Computing: a Review, Taxonomy and Open Challenges

Authors: Breno Costa, Joao Bachiega Jr, Leonardo Reboucas de Carvalho, Michel Rosa, Aleteia Araujo

Abstract: Fog computing is a distributed paradigm that provides computational resources in the users' vicinity. Fog orchestration is a set of functionalities that coordinate the dynamic infrastructure and manage the services to guarantee the Service Level Agreements. Monitoring is an orchestration functionality of prime importance. It is the basis for resource management actions, collecting status of resour… ▽ More Fog computing is a distributed paradigm that provides computational resources in the users' vicinity. Fog orchestration is a set of functionalities that coordinate the dynamic infrastructure and manage the services to guarantee the Service Level Agreements. Monitoring is an orchestration functionality of prime importance. It is the basis for resource management actions, collecting status of resource and service and delivering updated data to the orchestrator. There are several cloud monitoring solutions and tools, but none of them comply with fog characteristics and challenges. Fog monitoring solutions are scarce, and they may not be prepared to compose an orchestration service. This paper updates the knowledge base about fog monitoring, assessing recent subjects in this context like observability, data standardization and instrumentation domains. We propose a novel taxonomy of fog monitoring solutions, supported by a systematic review of the literature. Fog monitoring proposals are analyzed and categorized by this new taxonomy, offering researchers a comprehensive overview. This work also highlights the main challenges and open research questions. △ Less

Submitted 16 June, 2022; v1 submitted 13 May, 2022; originally announced June 2022.

arXiv:2206.01715 [pdf, other]

Towards Evading the Limits of Randomized Smoothing: A Theoretical Analysis

Authors: Raphael Ettedgui, Alexandre Araujo, Rafael Pinot, Yann Chevaleyre, Jamal Atif

Abstract: Randomized smoothing is the dominant standard for provable defenses against adversarial examples. Nevertheless, this method has recently been proven to suffer from important information theoretic limitations. In this paper, we argue that these limitations are not intrinsic, but merely a byproduct of current certification methods. We first show that these certificates use too little information abo… ▽ More Randomized smoothing is the dominant standard for provable defenses against adversarial examples. Nevertheless, this method has recently been proven to suffer from important information theoretic limitations. In this paper, we argue that these limitations are not intrinsic, but merely a byproduct of current certification methods. We first show that these certificates use too little information about the classifier, and are in particular blind to the local curvature of the decision boundary. This leads to severely sub-optimal robustness guarantees as the dimension of the problem increases. We then show that it is theoretically possible to bypass this issue by collecting more information about the classifier. More precisely, we show that it is possible to approximate the optimal certificate with arbitrary precision, by probing the decision boundary with several noise distributions. Since this process is executed at certification time rather than at test time, it entails no loss in natural accuracy while enhancing the quality of the certificates. This result fosters further research on classifier-specific certification and demonstrates that randomized smoothing is still worth investigating. Although classifier-specific certification may induce more computational cost, we also provide some theoretical insight on how to mitigate it. △ Less

Submitted 3 June, 2022; originally announced June 2022.

arXiv:2206.01326 [pdf, other]

Improving Fairness in Large-Scale Object Recognition by CrowdSourced Demographic Information

Authors: Zu Kim, André Araujo, Bingyi Cao, Cam Askew, Jack Sim, Mike Green, N'Mah Fodiatu Yilla, Tobias Weyand

Abstract: There has been increasing awareness of ethical issues in machine learning, and fairness has become an important research topic. Most fairness efforts in computer vision have been focused on human sensing applications and preventing discrimination by people's physical attributes such as race, skin color or age by increasing visual representation for particular demographic groups. We argue that ML f… ▽ More There has been increasing awareness of ethical issues in machine learning, and fairness has become an important research topic. Most fairness efforts in computer vision have been focused on human sensing applications and preventing discrimination by people's physical attributes such as race, skin color or age by increasing visual representation for particular demographic groups. We argue that ML fairness efforts should extend to object recognition as well. Buildings, artwork, food and clothing are examples of the objects that define human culture. Representing these objects fairly in machine learning datasets will lead to models that are less biased towards a particular culture and more inclusive of different traditions and values. There exist many research datasets for object recognition, but they have not carefully considered which classes should be included, or how much training data should be collected per class. To address this, we propose a simple and general approach, based on crowdsourcing the demographic composition of the contributors: we define fair relevance scores, estimate them, and assign them to each class. We showcase its application to the landmark recognition domain, presenting a detailed analysis and the final fairer landmark rankings. We present analysis which leads to a much fairer coverage of the world compared to existing datasets. The evaluation dataset was used for the 2021 Google Landmark Challenges, which was the first of a kind with an emphasis on fairness in generic object recognition. △ Less

Submitted 2 June, 2022; originally announced June 2022.

arXiv:2203.07425 [pdf, other]

Computational Perspective of the Fog Node

Authors: João Bachiega Jr, Breno Costa, Aleteia P. F. Araujo

Abstract: Fog computing is a recent computational paradigm that was proposed to solve some weaknesses in cloud-based systems. For this reason, this technology has been extensively studied by several technology areas. It is still in a maturing stage, so there is no consensus in academia about its concepts and definitions, and each area adopts the ones that are convenient for each use case. This article propo… ▽ More Fog computing is a recent computational paradigm that was proposed to solve some weaknesses in cloud-based systems. For this reason, this technology has been extensively studied by several technology areas. It is still in a maturing stage, so there is no consensus in academia about its concepts and definitions, and each area adopts the ones that are convenient for each use case. This article proposes a definition and a classification relying on a computational perspective for the fog node, which is a fundamental element in a fog computing environment. In addition, the main challenges related to the fog node are also presented. △ Less

Submitted 14 March, 2022; originally announced March 2022.

Comments: Paper presented in the conference '2021 World Congress in Computer Science, Computer Engineering, & Applied Computing' (CSCE'21)

arXiv:2110.12690 [pdf, other]

A Dynamical System Perspective for Lipschitz Neural Networks

Authors: Laurent Meunier, Blaise Delattre, Alexandre Araujo, Alexandre Allauzen

Abstract: The Lipschitz constant of neural networks has been established as a key quantity to enforce the robustness to adversarial examples. In this paper, we tackle the problem of building $1$-Lipschitz Neural Networks. By studying Residual Networks from a continuous time dynamical system perspective, we provide a generic method to build $1$-Lipschitz Neural Networks and show that some previous approaches… ▽ More The Lipschitz constant of neural networks has been established as a key quantity to enforce the robustness to adversarial examples. In this paper, we tackle the problem of building $1$-Lipschitz Neural Networks. By studying Residual Networks from a continuous time dynamical system perspective, we provide a generic method to build $1$-Lipschitz Neural Networks and show that some previous approaches are special cases of this framework. Then, we extend this reasoning and show that ResNet flows derived from convex potentials define $1$-Lipschitz transformations, that lead us to define the {\em Convex Potential Layer} (CPL). A comprehensive set of experiments on several datasets demonstrates the scalability of our architecture and the benefits as an $\ell_2$-provable defense against adversarial examples. △ Less

Submitted 1 February, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

arXiv:2109.00959 [pdf, other]

Building Compact and Robust Deep Neural Networks with Toeplitz Matrices

Authors: Alexandre Araujo

Abstract: Deep neural networks are state-of-the-art in a wide variety of tasks, however, they exhibit important limitations which hinder their use and deployment in real-world applications. When develo** and training neural networks, the accuracy should not be the only concern, neural networks must also be cost-effective and reliable. Although accurate, large neural networks often lack these properties. T… ▽ More Deep neural networks are state-of-the-art in a wide variety of tasks, however, they exhibit important limitations which hinder their use and deployment in real-world applications. When develo** and training neural networks, the accuracy should not be the only concern, neural networks must also be cost-effective and reliable. Although accurate, large neural networks often lack these properties. This thesis focuses on the problem of training neural networks which are not only accurate but also compact, easy to train, reliable and robust to adversarial examples. To tackle these problems, we leverage the properties of structured matrices from the Toeplitz family to build compact and secure neural networks. △ Less

Submitted 2 September, 2021; originally announced September 2021.

Comments: Thesis

arXiv:2108.08874 [pdf, other]

Towards A Fairer Landmark Recognition Dataset

Authors: Zu Kim, André Araujo, Bingyi Cao, Cam Askew, Jack Sim, Mike Green, N'Mah Fodiatu Yilla, Tobias Weyand

Abstract: We introduce a new landmark recognition dataset, which is created with a focus on fair worldwide representation. While previous work proposes to collect as many images as possible from web repositories, we instead argue that such approaches can lead to biased data. To create a more comprehensive and equitable dataset, we start by defining the fair relevance of a landmark to the world population. T… ▽ More We introduce a new landmark recognition dataset, which is created with a focus on fair worldwide representation. While previous work proposes to collect as many images as possible from web repositories, we instead argue that such approaches can lead to biased data. To create a more comprehensive and equitable dataset, we start by defining the fair relevance of a landmark to the world population. These relevances are estimated by combining anonymized Google Maps user contribution statistics with the contributors' demographic information. We present a stratification approach and analysis which leads to a much fairer coverage of the world, compared to existing datasets. The resulting datasets are used to evaluate computer vision models as part of the the Google Landmark Recognition and RetrievalChallenges 2021. △ Less

Submitted 6 June, 2022; v1 submitted 19 August, 2021; originally announced August 2021.

Comments: Please cite the full detailed version of the paper instead: Improving Fairness in Large-Scale Object Recognition by CrowdSourced Demographic Information arXiv:2206.01326

arXiv:2108.02628 [pdf, ps, other]

A New State-of-the-Art Transformers-Based Load Forecaster on the Smart Grid Domain

Authors: Andre Luiz Farias Novaes, Rui Alexandre de Matos Araujo, Jose Figueiredo, Lucas Aguiar Pavanelli

Abstract: Meter-level load forecasting is crucial for efficient energy management and power system planning for Smart Grids (SGs), in tasks associated with regulation, dispatching, scheduling, and unit commitment of power grids. Although a variety of algorithms have been proposed and applied on the field, more accurate and robust models are still required: the overall utility cost of operations in SGs incre… ▽ More Meter-level load forecasting is crucial for efficient energy management and power system planning for Smart Grids (SGs), in tasks associated with regulation, dispatching, scheduling, and unit commitment of power grids. Although a variety of algorithms have been proposed and applied on the field, more accurate and robust models are still required: the overall utility cost of operations in SGs increases 10 million currency units if the load forecasting error increases 1%, and the mean absolute percentage error (MAPE) in forecasting is still much higher than 1%. Transformers have become the new state-of-the-art in a variety of tasks, including the ones in computer vision, natural language processing and time series forecasting, surpassing alternative neural models such as convolutional and recurrent neural networks. In this letter, we present a new state-of-the-art Transformer-based algorithm for the meter-level load forecasting task, which has surpassed the former state-of-the-art, LSTM, and the traditional benchmark, vanilla RNN, in all experiments by a margin of at least 13% in MAPE. △ Less

Submitted 5 August, 2021; originally announced August 2021.

arXiv:2108.02318 [pdf, other]

doi 10.1038/s41467-022-28571-7

Forecasting the outcome of spintronic experiments with Neural Ordinary Differential Equations

Authors: Xing Chen, Flavio Abreu Araujo, Mathieu Riou, Jacob Torrejon, Dafiné Ravelosona, Wang Kang, Weisheng Zhao, Julie Grollier, Damien Querlioz

Abstract: Deep learning has an increasing impact to assist research, allowing, for example, the discovery of novel materials. Until now, however, these artificial intelligence techniques have fallen short of discovering the full differential equation of an experimental physical system. Here we show that a dynamical neural network, trained on a minimal amount of data, can predict the behavior of spintronic d… ▽ More Deep learning has an increasing impact to assist research, allowing, for example, the discovery of novel materials. Until now, however, these artificial intelligence techniques have fallen short of discovering the full differential equation of an experimental physical system. Here we show that a dynamical neural network, trained on a minimal amount of data, can predict the behavior of spintronic devices with high accuracy and an extremely efficient simulation time, compared to the micromagnetic simulations that are usually employed to model them. For this purpose, we re-frame the formalism of Neural Ordinary Differential Equations (ODEs) to the constraints of spintronics: few measured outputs, multiple inputs and internal parameters. We demonstrate with Spin-Neural ODEs an acceleration factor over 200 compared to micromagnetic simulations for a complex problem -- the simulation of a reservoir computer made of magnetic skyrmions (20 minutes compared to three days). In a second realization, we show that we can predict the noisy response of experimental spintronic nano-oscillators to varying inputs after training Spin-Neural ODEs on five milliseconds of their measured response to different excitations. Spin-Neural ODE is a disruptive tool for develo** spintronic applications in complement to micromagnetic simulations, which are time-consuming and cannot fit experiments when noise or imperfections are present. Spin-Neural ODE can also be generalized to other electronic devices involving dynamics. △ Less

Submitted 23 July, 2021; originally announced August 2021.

Comments: 16 pages, 4 figures

arXiv:2104.05279 [pdf, other]

Class-Balanced Distillation for Long-Tailed Visual Recognition

Authors: Ahmet Iscen, André Araujo, Boqing Gong, Cordelia Schmid

Abstract: Real-world imagery is often characterized by a significant imbalance of the number of images per class, leading to long-tailed distributions. An effective and simple approach to long-tailed visual recognition is to learn feature representations and a classifier separately, with instance and class-balanced sampling, respectively. In this work, we introduce a new framework, by making the key observa… ▽ More Real-world imagery is often characterized by a significant imbalance of the number of images per class, leading to long-tailed distributions. An effective and simple approach to long-tailed visual recognition is to learn feature representations and a classifier separately, with instance and class-balanced sampling, respectively. In this work, we introduce a new framework, by making the key observation that a feature representation learned with instance sampling is far from optimal in a long-tailed setting. Our main contribution is a new training method, referred to as Class-Balanced Distillation (CBD), that leverages knowledge distillation to enhance feature representations. CBD allows the feature representation to evolve in the second training stage, guided by the teacher learned in the first stage. The second stage uses class-balanced sampling, in order to focus on under-represented classes. This framework can naturally accommodate the usage of multiple teachers, unlocking the information from an ensemble of models to enhance recognition capabilities. Our experiments show that the proposed technique consistently outperforms the state of the art on long-tailed recognition benchmarks such as ImageNet-LT, iNaturalist17 and iNaturalist18. △ Less

Submitted 12 January, 2022; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: The code is available at https://github.com/google-research/google-research/tree/master/class_balanced_distillation

arXiv:2012.02632 [pdf, other]

Advocating for Multiple Defense Strategies against Adversarial Examples

Authors: Alexandre Araujo, Laurent Meunier, Rafael Pinot, Benjamin Negrevergne

Abstract: It has been empirically observed that defense mechanisms designed to protect neural networks against $\ell_\infty$ adversarial examples offer poor performance against $\ell_2$ adversarial examples and vice versa. In this paper we conduct a geometrical analysis that validates this observation. Then, we provide a number of empirical insights to illustrate the effect of this phenomenon in practice. T… ▽ More It has been empirically observed that defense mechanisms designed to protect neural networks against $\ell_\infty$ adversarial examples offer poor performance against $\ell_2$ adversarial examples and vice versa. In this paper we conduct a geometrical analysis that validates this observation. Then, we provide a number of empirical insights to illustrate the effect of this phenomenon in practice. Then, we review some of the existing defense mechanism that attempts to defend against multiple attacks by mixing defense strategies. Thanks to our numerical experiments, we discuss the relevance of this method and state open questions for the adversarial examples community. △ Less

Submitted 4 December, 2020; originally announced December 2020.

Comments: Workshop on Machine Learning for CyberSecurity (MLCS@ECML-PKDD)

arXiv:2009.12197 [pdf, other]

End-to-End Prediction of Parcel Delivery Time with Deep Learning for Smart-City Applications

Authors: Arthur Cruz de Araujo, Ali Etemad

Abstract: The acquisition of massive data on parcel delivery motivates postal operators to foster the development of predictive systems to improve customer service. Predicting delivery times successive to being shipped out of the final depot, referred to as last-mile prediction, deals with complicating factors such as traffic, drivers' behaviors, and weather. This work studies the use of deep learning for s… ▽ More The acquisition of massive data on parcel delivery motivates postal operators to foster the development of predictive systems to improve customer service. Predicting delivery times successive to being shipped out of the final depot, referred to as last-mile prediction, deals with complicating factors such as traffic, drivers' behaviors, and weather. This work studies the use of deep learning for solving a real-world case of last-mile parcel delivery time prediction. We present our solution under the IoT paradigm and discuss its feasibility on a cloud-based architecture as a smart city application. We focus on a large-scale parcel dataset provided by Canada Post, covering the Greater Toronto Area (GTA). We utilize an origin-destination (OD) formulation, in which routes are not available, but only the start and end delivery points. We investigate three categories of convolutional-based neural networks and assess their performances on the task. We further demonstrate how our modeling outperforms several baselines, from classical machine learning models to referenced OD solutions. Specifically, we show that a ResNet architecture with 8 residual blocks displays the best trade-off between performance and complexity. We perform a thorough error analysis across the data and visualize the deep features learned to better understand the model behavior, making interesting remarks on data predictability. Our work provides an end-to-end neural pipeline that leverages parcel OD data as well as weather to accurately predict delivery durations. We believe that our system has the potential not only to improve user experience by better modeling their anticipation but also to aid last-mile postal logistics as a whole. △ Less

Submitted 28 April, 2021; v1 submitted 23 September, 2020; originally announced September 2020.

Comments: 14 pages, 21 figures, 9 tables

arXiv:2009.10181 [pdf, other]

doi 10.1109/ACCESS.2021.3077415

Towards Image-based Automatic Meter Reading in Unconstrained Scenarios: A Robust and Efficient Approach

Authors: Rayson Laroca, Alessandra B. Araujo, Luiz A. Zanlorensi, Eduardo C. de Almeida, David Menotti

Abstract: Existing approaches for image-based Automatic Meter Reading (AMR) have been evaluated on images captured in well-controlled scenarios. However, real-world meter reading presents unconstrained scenarios that are way more challenging due to dirt, various lighting conditions, scale variations, in-plane and out-of-plane rotations, among other factors. In this work, we present an end-to-end approach fo… ▽ More Existing approaches for image-based Automatic Meter Reading (AMR) have been evaluated on images captured in well-controlled scenarios. However, real-world meter reading presents unconstrained scenarios that are way more challenging due to dirt, various lighting conditions, scale variations, in-plane and out-of-plane rotations, among other factors. In this work, we present an end-to-end approach for AMR focusing on unconstrained scenarios. Our main contribution is the insertion of a new stage in the AMR pipeline, called corner detection and counter classification, which enables the counter region to be rectified -- as well as the rejection of illegible/faulty meters -- prior to the recognition stage. We also introduce a publicly available dataset, called Copel-AMR, that contains 12,500 meter images acquired in the field by the service company's employees themselves, including 2,500 images of faulty meters or cases where the reading is illegible due to occlusions. Experimental evaluation demonstrates that the proposed system, which has three networks operating in a cascaded mode, outperforms all baselines in terms of recognition rate while still being quite efficient. Moreover, as very few reading errors are tolerated in real-world applications, we show that our AMR system achieves impressive recognition rates (i.e., > 99%) when rejecting readings made with lower confidence values. △ Less

Submitted 12 May, 2021; v1 submitted 21 September, 2020; originally announced September 2020.

Journal ref: IEEE Access, vol. 9, pp. 67569-67584, 2021

arXiv:2006.15624 [pdf, other]

Application of Statistical Methods in Software Engineering: Theory and Practice

Authors: T. F. M. Sirqueira, M. A. Miguel, H. L. O. Dalpra, M. A. P. Araujo, J. M. N. David

Abstract: The experimental evaluation of the methods and concepts covered in software engineering has been increasingly valued. This value indicates the constant search for new forms of assessment and validation of the results obtained in Software Engineering research. Results are validated in studies through evaluations, which in turn become increasingly stringent. As an alternative to aid in the verificat… ▽ More The experimental evaluation of the methods and concepts covered in software engineering has been increasingly valued. This value indicates the constant search for new forms of assessment and validation of the results obtained in Software Engineering research. Results are validated in studies through evaluations, which in turn become increasingly stringent. As an alternative to aid in the verification of the results, that is, whether they are positive or negative, we suggest the use of statistical methods. This article presents some of the main statistical techniques available, as well as their use in carrying out the implementation of data analysis in experimental studies in Software Engineering. This paper presents a practical approach proving statistical techniques through a decision tree, which was created in order to facilitate the understanding of the appropriate statistical method for each data analysis situation. Actual data from the software projects were employed to demonstrate the use of these statistical methods. Although it is not the aim of this work, basic experimentation and statistics concepts will be presented, as well as a concrete indication of the applicability of these techniques. △ Less

Submitted 28 June, 2020; originally announced June 2020.

arXiv:2006.08391 [pdf, other]

On Lipschitz Regularization of Convolutional Layers using Toeplitz Matrix Theory

Authors: Alexandre Araujo, Benjamin Negrevergne, Yann Chevaleyre, Jamal Atif

Abstract: This paper tackles the problem of Lipschitz regularization of Convolutional Neural Networks. Lipschitz regularity is now established as a key property of modern deep learning with implications in training stability, generalization, robustness against adversarial examples, etc. However, computing the exact value of the Lipschitz constant of a neural network is known to be NP-hard. Recent attempts f… ▽ More This paper tackles the problem of Lipschitz regularization of Convolutional Neural Networks. Lipschitz regularity is now established as a key property of modern deep learning with implications in training stability, generalization, robustness against adversarial examples, etc. However, computing the exact value of the Lipschitz constant of a neural network is known to be NP-hard. Recent attempts from the literature introduce upper bounds to approximate this constant that are either efficient but loose or accurate but computationally expensive. In this work, by leveraging the theory of Toeplitz matrices, we introduce a new upper bound for convolutional layers that is both tight and easy to compute. Based on this result we devise an algorithm to train Lipschitz regularized Convolutional Neural Networks. △ Less

Submitted 7 November, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

Showing 1–50 of 87 results for author: Araujo, A