-
GCF: Graph Convolutional Networks for Facial Expression Recognition
Authors:
Hozaifa Kassab,
Mohamed Bahaa,
Ali Hamdi
Abstract:
Facial Expression Recognition (FER) is vital for understanding interpersonal communication. However, existing classification methods often face challenges such as vulnerability to noise, imbalanced datasets, overfitting, and generalization issues. In this paper, we propose GCF, a novel approach that utilizes Graph Convolutional Networks for FER. GCF integrates Convolutional Neural Networks (CNNs) for feature extraction, using either custom architectures or pretrained models. The extracted visual features are then represented on a graph, enhancing local CNN features with global features via a Graph Convolutional Neural Network layer. We evaluate GCF on benchmark datasets including CK+, JAFFE, and FERG. The results show that GCF significantly improves performance over state-of-the-art methods. For example, GCF enhances the accuracy of ResNet18 from 92% to 98% on CK+, from 66% to 89% on JAFFE, and from 94% to 100% on FERG. Similarly, GCF improves the accuracy of VGG16 from 89% to 97% on CK+, from 72% to 92% on JAFFE, and from 96% to 99.49% on FERG. We provide a comprehensive analysis of our approach, demonstrating its effectiveness in capturing nuanced facial expressions. By integrating graph convolutions with CNNs, GCF significantly advances FER, offering improved accuracy and robustness in real-world applications.
Submitted 2 July, 2024;
originally announced July 2024.
-
X-Diffusion: Generating Detailed 3D MRI Volumes From a Single Image Using Cross-Sectional Diffusion Models
Authors:
Emmanuelle Bourigault,
Abdullah Hamdi,
Amir Jamaludin
Abstract:
In this work, we present X-Diffusion, a cross-sectional diffusion model tailored for Magnetic Resonance Imaging (MRI) data. X-Diffusion is capable of generating the entire MRI volume from just a single MRI slice or optionally from a few slices, setting new benchmarks in the precision of synthesized MRIs from extremely sparse observations. The uniqueness lies in the novel view-conditional training and inference of X-Diffusion on MRI volumes, allowing for generalized MRI learning. Our evaluations span both brain tumour MRIs from the BRATS dataset and full-body MRIs from the UK Biobank dataset. Utilizing the paired pre-registered Dual-energy X-ray Absorptiometry (DXA) and MRI modalities in the UK Biobank dataset, X-Diffusion is able to generate a detailed 3D MRI volume from a single full-body DXA. Remarkably, the resultant MRIs not only stand out in precision on unseen examples (surpassing state-of-the-art results by large margins) but also flawlessly retain essential features of the original MRI, including tumour profiles, spine curvature, brain volume, and beyond. Furthermore, the X-Diffusion model trained on the MRI datasets attains out-of-domain generalization capacity (e.g. generating knee MRIs even though it is trained on brains). The code is available on the project website https://emmanuelleb985.github.io/XDiffusion/ .
Submitted 30 April, 2024;
originally announced April 2024.
-
Integrating SystemC-AMS Power Modeling with a RISC-V ISS for Virtual Prototyping of Battery-operated Embedded Devices
Authors:
Mohamed Amine Hamdi,
Giovanni Pollo,
Matteo Risso,
Germain Haugou,
Alessio Burrello,
Enrico Macii,
Massimo Poncino,
Sara Vinco,
Daniele Jahier Pagliari
Abstract:
RISC-V cores have gained a lot of popularity over the last few years. However, being quite a recent and novel technology, there is still a gap in the availability of comprehensive simulation frameworks for RISC-V that cover both the functional and extra-functional aspects. This gap hinders progress in the field, as fast yet accurate system-level simulation is crucial for Design Space Exploration (DSE).
This work presents an open-source framework designed to tackle this challenge, integrating functional RISC-V simulation (achieved with GVSoC) with SystemC-AMS (used to model extra-functional aspects, specifically power storage and distribution). Combining GVSoC and SystemC-AMS in a single simulation framework makes it possible to perform a DSE that depends on the mutual impact between functional and extra-functional aspects. In our experiments, we validate the framework's effectiveness by creating a virtual prototype of a compact, battery-powered embedded system.
Submitted 2 April, 2024;
originally announced April 2024.
-
ASEM: Enhancing Empathy in Chatbot through Attention-based Sentiment and Emotion Modeling
Authors:
Omama Hamad,
Ali Hamdi,
Khaled Shaban
Abstract:
Effective feature representations play a critical role in enhancing the performance of text generation models that rely on deep neural networks. However, current approaches suffer from several drawbacks, such as the inability to capture the deep semantics of language and sensitivity to minor input variations, resulting in significant changes in the generated text. In this paper, we present a novel solution to these challenges by employing a mixture of experts, multiple encoders, to offer distinct perspectives on the emotional state of the user's utterance while simultaneously enhancing performance. We propose an end-to-end model architecture called ASEM that performs emotion analysis on top of sentiment analysis for open-domain chatbots, enabling the generation of empathetic responses that are fluent and relevant. In contrast to traditional attention mechanisms, the proposed model employs a specialized attention strategy that uniquely zeroes in on sentiment and emotion nuances within the user's utterance. This ensures the generation of context-rich representations tailored to the underlying emotional tone and sentiment intricacies of the text. Our approach outperforms existing methods for generating empathetic embeddings, providing empathetic and diverse responses. The performance of our proposed model significantly exceeds that of existing models, enhancing emotion detection accuracy by 6.2% and lexical diversity by 1.4%.
Submitted 25 February, 2024;
originally announced February 2024.
-
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering
Authors:
Abdullah Hamdi,
Luke Melas-Kyriazi,
Jinjie Mai,
Guocheng Qian,
Ruoshi Liu,
Carl Vondrick,
Bernard Ghanem,
Andrea Vedaldi
Abstract:
Advancements in 3D Gaussian Splatting have significantly accelerated 3D reconstruction and generation. However, it may require a large number of Gaussians, which creates a substantial memory footprint. This paper introduces GES (Generalized Exponential Splatting), a novel representation that employs Generalized Exponential Function (GEF) to model 3D scenes, requiring far fewer particles to represent a scene and thus significantly outperforming Gaussian Splatting methods in efficiency with a plug-and-play replacement ability for Gaussian-based utilities. GES is validated theoretically and empirically in both principled 1D setup and realistic 3D scenes.
It is shown to represent signals with sharp edges more accurately, which are typically challenging for Gaussians due to their inherent low-pass characteristics. Our empirical analysis demonstrates that GEF outperforms Gaussians in fitting naturally occurring signals (e.g. squares, triangles, and parabolic signals), thereby reducing the need for extensive splitting operations that increase the memory footprint of Gaussian Splatting. With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%. The code is available on the project website https://abdullahamdi.com/ges .
Submitted 24 May, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
Authors:
Guocheng Qian,
Jinjie Mai,
Abdullah Hamdi,
Jian Ren,
Aliaksandr Siarohin,
Bing Li,
Hsin-Ying Lee,
Ivan Skorokhodov,
Peter Wonka,
Sergey Tulyakov,
Bernard Ghanem
Abstract:
We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D mesh generation from a single unposed image in the wild using both 2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference view supervision and novel views guided by a combination of 2D and 3D diffusion priors. We introduce a single trade-off parameter between the 2D and 3D priors to control exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion and monocular depth regularization to encourage consistent appearances across views and to prevent degenerate solutions, respectively. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images. Our code, models, and generated 3D assets are available at https://github.com/guochengqian/Magic123.
Submitted 23 July, 2023; v1 submitted 30 June, 2023;
originally announced June 2023.
-
Dipion and dikaon photoproduction in ultra-peripheral Pb-Pb collisions with ALICE
Authors:
Abdennacer Hamdi
Abstract:
High energy photons originating from the electromagnetic field of ultrarelativistic lead nuclei can interact with the other lead ion. These reactions are studied in the ultra-peripheral heavy ion collisions to probe the physics of strong interactions. The analysis of dipion and dikaon photoproduction was carried out using the ALICE 2015 Pb-Pb data at center-of-mass energy $\sqrt{s}~=~5.02$ TeV, and 2017 Xe-Xe data at $\sqrt{s}~=~5.44$ TeV. We present the $ρ^0$ meson and direct $π^{+}π^{-}$ measurements, as well as the prospects of studying exclusive $K^{+}K^{-}$ photoproduction, with sensitivity towards higher dikaon invariant mass above the $φ(1020)$ threshold.
Submitted 26 June, 2023;
originally announced June 2023.
-
Mindstorms in Natural Language-Based Societies of Mind
Authors:
Mingchen Zhuge,
Haozhe Liu,
Francesco Faccio,
Dylan R. Ashley,
Róbert Csordás,
Anand Gopalakrishnan,
Abdullah Hamdi,
Hasan Abed Al Kader Hammoud,
Vincent Herrmann,
Kazuki Irie,
Louis Kirsch,
Bing Li,
Guohao Li,
Shuming Liu,
Jinjie Mai,
Piotr Piękos,
Aditya Ramesh,
Imanol Schlag,
Weimin Shi,
Aleksandar Stanić,
Wenyi Wang,
Yuhui Wang,
Mengmeng Xu,
Deng-Ping Fan,
Bernard Ghanem
, et al. (1 additional authors not shown)
Abstract:
Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of minds consist of large language models (LLMs) and other NN-based experts communicating through a natural language interface. In doing so, they overcome the limitations of single LLMs, improving multimodal zero-shot reasoning. In these natural language-based societies of mind (NLSOMs), new agents -- all communicating through the same universal symbolic language -- are easily added in a modular fashion. To demonstrate the power of NLSOMs, we assemble and experiment with several of them (having up to 129 members), leveraging mindstorms in them to solve some practical AI tasks: visual question answering, image captioning, text-to-image synthesis, 3D generation, egocentric retrieval, embodied AI, and general language-based task solving. We view this as a starting point towards much larger NLSOMs with billions of agents -- some of which may be humans. And with this emergence of great societies of heterogeneous minds, many new research questions have suddenly become paramount to the future of artificial intelligence. What should be the social structure of an NLSOM? What would be the (dis)advantages of having a monarchical rather than a democratic structure? How can principles of NN economies be used to maximize the total reward of a reinforcement learning NLSOM? In this work, we identify, discuss, and try to answer some of these questions.
Submitted 26 May, 2023;
originally announced May 2023.
-
VARS: Video Assistant Referee System for Automated Soccer Decision Making from Multiple Views
Authors:
Jan Held,
Anthony Cioppa,
Silvio Giancola,
Abdullah Hamdi,
Bernard Ghanem,
Marc Van Droogenbroeck
Abstract:
The Video Assistant Referee (VAR) has revolutionized association football, enabling referees to review incidents on the pitch, make informed decisions, and ensure fairness. However, due to the lack of referees in many countries and the high cost of the VAR infrastructure, only professional leagues can benefit from it. In this paper, we propose a Video Assistant Referee System (VARS) that can automate soccer decision-making. VARS leverages the latest findings in multi-view video analysis, to provide real-time feedback to the referee, and help them make informed decisions that can impact the outcome of a game. To validate VARS, we introduce SoccerNet-MVFoul, a novel video dataset of soccer fouls from multiple camera views, annotated with extensive foul descriptions by a professional soccer referee, and we benchmark our VARS to automatically recognize the characteristics of these fouls. We believe that VARS has the potential to revolutionize soccer refereeing and take the game to new heights of fairness and accuracy across all levels of professional and amateur federations.
Submitted 10 April, 2023;
originally announced April 2023.
-
Yes but.. Can ChatGPT Identify Entities in Historical Documents?
Authors:
Carlos-Emiliano González-Gallardo,
Emanuela Boros,
Nancy Girdhar,
Ahmed Hamdi,
Jose G. Moreno,
Antoine Doucet
Abstract:
Large language models (LLMs) have been leveraged for several years now, obtaining state-of-the-art performance in recognizing entities from modern documents. For the last few months, the conversational agent ChatGPT has "prompted" a lot of interest in the scientific community and public due to its capacity of generating plausible-sounding answers. In this paper, we explore this ability by probing it in the named entity recognition and classification (NERC) task in primary sources (e.g., historical newspapers and classical commentaries) in a zero-shot manner and by comparing it with state-of-the-art LM-based systems. Our findings indicate several shortcomings in identifying entities in historical text that range from the consistency of entity annotation guidelines, entity complexity, and code-switching, to the specificity of prompting. Moreover, as expected, the inaccessibility of historical archives to the public (and thus on the Internet) also impacts its performance.
Submitted 30 March, 2023;
originally announced March 2023.
-
DocILE Benchmark for Document Information Localization and Extraction
Authors:
Štěpán Šimsa,
Milan Šulc,
Michal Uřičář,
Yash Patel,
Ahmed Hamdi,
Matěj Kocián,
Matyáš Skalický,
Jiří Matas,
Antoine Doucet,
Mickaël Coustaty,
Dimosthenis Karatzas
Abstract:
This paper introduces the DocILE benchmark with the largest dataset of business documents for the tasks of Key Information Localization and Extraction and Line Item Recognition. It contains 6.7k annotated business documents, 100k synthetically generated documents, and nearly 1M unlabeled documents for unsupervised pre-training. The dataset has been built with knowledge of domain- and task-specific aspects, resulting in the following key features: (i) annotations in 55 classes, which surpasses the granularity of previously published key information extraction datasets by a large margin; (ii) Line Item Recognition represents a highly practical information extraction task, where key information has to be assigned to items in a table; (iii) documents come from numerous layouts and the test set includes zero- and few-shot cases as well as layouts commonly seen in the training set. The benchmark comes with several baselines, including RoBERTa, LayoutLMv3 and DETR-based Table Transformer, applied to both tasks of the DocILE benchmark, with results shared in this paper, offering a quick starting point for future work. The dataset, baselines and supplementary material are available at https://github.com/rossumai/docile.
Submitted 3 May, 2023; v1 submitted 11 February, 2023;
originally announced February 2023.
-
DocILE 2023 Teaser: Document Information Localization and Extraction
Authors:
Štěpán Šimsa,
Milan Šulc,
Matyáš Skalický,
Yash Patel,
Ahmed Hamdi
Abstract:
The lack of data for information extraction (IE) from semi-structured business documents is a real problem for the IE community. Publications relying on large-scale datasets use only proprietary, unpublished data due to the sensitive nature of such documents. Publicly available datasets are mostly small and domain-specific. The absence of a large-scale public dataset or benchmark hinders the reproducibility and cross-evaluation of published methods. The DocILE 2023 competition, hosted as a lab at the CLEF 2023 conference and as an ICDAR 2023 competition, will run the first major benchmark for the tasks of Key Information Localization and Extraction (KILE) and Line Item Recognition (LIR) from business documents. With thousands of annotated real documents from open sources, a hundred thousand generated synthetic documents, and nearly a million unlabeled documents, the DocILE lab comes with the largest publicly available dataset for KILE and LIR. We are looking forward to contributions from the Computer Vision, Natural Language Processing, Information Retrieval, and other communities. The data, baselines, code and up-to-date information about the lab and competition are available at https://docile.rossum.ai/.
Submitted 29 January, 2023;
originally announced January 2023.
-
MVTN: Learning Multi-View Transformations for 3D Understanding
Authors:
Abdullah Hamdi,
Faisal AlZahrani,
Silvio Giancola,
Bernard Ghanem
Abstract:
Multi-view projection techniques have shown themselves to be highly effective in achieving top-performing results in the recognition of 3D shapes. These methods involve learning how to combine information from multiple view-points. However, the camera view-points from which these views are obtained are often fixed for all shapes. To overcome the static nature of current multi-view techniques, we propose learning these view-points. Specifically, we introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal view-points for 3D shape recognition. As a result, MVTN can be trained end-to-end with any multi-view network for 3D shape classification. We integrate MVTN into a novel adaptive multi-view pipeline that is capable of rendering both 3D meshes and point clouds. Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks (ModelNet40, ScanObjectNN, ShapeNet Core55). Further analysis indicates that our approach exhibits improved robustness to occlusion compared to other methods. We also investigate additional aspects of MVTN, such as 2D pretraining and its use for segmentation. To support further research in this area, we have released MVTorch, a PyTorch library for 3D understanding and generation using multi-view projections.
Submitted 6 June, 2024; v1 submitted 27 December, 2022;
originally announced December 2022.
-
SPARF: Large-Scale Learning of 3D Sparse Radiance Fields from Few Input Images
Authors:
Abdullah Hamdi,
Bernard Ghanem,
Matthias Nießner
Abstract:
Recent advances in Neural Radiance Fields (NeRFs) treat the problem of novel view synthesis as Sparse Radiance Field (SRF) optimization using sparse voxels for efficient and fast rendering (Plenoxels, InstantNGP). In order to leverage machine learning and adoption of SRFs as a 3D representation, we present SPARF, a large-scale ShapeNet-based synthetic dataset for novel view synthesis consisting of $\sim$17 million images rendered from nearly 40,000 shapes at high resolution ($400 \times 400$ pixels). The dataset is orders of magnitude larger than existing synthetic datasets for novel view synthesis and includes more than one million 3D-optimized radiance fields with multiple voxel resolutions. Furthermore, we propose a novel pipeline (SuRFNet) that learns to generate sparse voxel radiance fields from only a few views. This is done by using the densely collected SPARF dataset and 3D sparse convolutions. SuRFNet employs partial SRFs from one or a few images and a specialized SRF loss to learn to generate high-quality sparse voxel radiance fields that can be rendered from novel views. Our approach achieves state-of-the-art results in the task of unconstrained novel view synthesis based on few views on ShapeNet as compared to recent baselines. The SPARF dataset is made public with the code and models on the project website https://abdullahamdi.com/sparf/ .
Submitted 21 August, 2023; v1 submitted 18 December, 2022;
originally announced December 2022.
-
EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries
Authors:
Jinjie Mai,
Abdullah Hamdi,
Silvio Giancola,
Chen Zhao,
Bernard Ghanem
Abstract:
With the recent advances in video and 3D understanding, novel 4D spatio-temporal methods fusing both concepts have emerged. Towards this direction, the Ego4D Episodic Memory Benchmark proposed a task for Visual Queries with 3D Localization (VQ3D). Given an egocentric video clip and an image crop depicting a query object, the goal is to localize the 3D position of the center of that query object with respect to the camera pose of a query frame. Current methods tackle the problem of VQ3D by unprojecting the 2D localization results of the sibling task Visual Queries with 2D Localization (VQ2D) into 3D predictions. Yet, we point out that the low number of camera poses caused by camera re-localization from previous VQ3D methods severely hinders their overall success rate. In this work, we formalize a pipeline (which we dub EgoLoc) that better entangles 3D multiview geometry with 2D object retrieval from egocentric videos. Our approach involves estimating more robust camera poses and aggregating multi-view 3D displacements by leveraging the 2D detection confidence, which enhances the success rate of object queries and leads to a significant improvement in the VQ3D baseline performance. Specifically, our approach achieves an overall success rate of up to 87.12%, which sets a new state-of-the-art result in the VQ3D task. We provide a comprehensive empirical analysis of the VQ3D task and existing solutions, and highlight the remaining challenges in VQ3D. The code is available at https://github.com/Wayne-Mai/EgoLoc.
Submitted 28 August, 2023; v1 submitted 13 December, 2022;
originally announced December 2022.
-
Hot and Cold QCD White Paper from ALICE-USA: Input for 2023 U.S. Long Range Plan for Nuclear Science
Authors:
N. Alizadehvandchali,
N. Apadula,
M. Arslandok,
C. Beattie,
R. Bellwied,
J. T. Blair,
F. Bock,
H. Bossi,
A. Bylinkin,
H. Caines,
I. Chakaberia,
M. Cherney,
T. M. Cormier,
R. Cruz-Torres,
P. Dhankher,
D. U. Dixit,
R. J. Ehlers,
W. Fan,
M. Fasel,
F. Flor,
A. N. Flores,
D. R. Gangadharan,
E. Garcia-Solis,
A. Gautam,
E. Glimos
, et al. (58 additional authors not shown)
Abstract:
The ALICE-USA collaboration presents its plans for the 2023 U.S. Long Range Plan for Nuclear Science.
Submitted 1 December, 2022;
originally announced December 2022.
-
Estimating more camera poses for ego-centric videos is essential for VQ3D
Authors:
Jinjie Mai,
Chen Zhao,
Abdullah Hamdi,
Silvio Giancola,
Bernard Ghanem
Abstract:
Visual queries 3D localization (VQ3D) is a task in the Ego4D Episodic Memory Benchmark. Given an egocentric video, the goal is to answer queries of the form "Where did I last see object X?", where the query object X is specified as a static image, and the answer should be a 3D displacement vector pointing to object X. However, current techniques use naive ways to estimate the camera poses of video frames, resulting in a low query with pose (QwP) ratio, thus a poor overall success rate. We design a new pipeline for the challenging egocentric video camera pose estimation problem in our work. Moreover, we revisit the current VQ3D framework and optimize it in terms of performance and efficiency. As a result, we get the top-1 overall success rate of 25.8% on VQ3D leaderboard, which is two times better than the 8.7% reported by the baseline.
Submitted 18 November, 2022;
originally announced November 2022.
-
Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding
Authors:
Guocheng Qian,
Abdullah Hamdi,
Xingdi Zhang,
Bernard Ghanem
Abstract:
While Transformers have achieved impressive success in natural language processing and computer vision, their performance on 3D point clouds is relatively poor. This is mainly due to the limitation of Transformers: a demanding need for extensive training data. Unfortunately, in the realm of 3D point clouds, the availability of large datasets is a challenge, exacerbating the issue of training Transformers for 3D tasks. In this work, we solve the data issue of point cloud Transformers from two perspectives: (i) introducing more inductive bias to reduce the dependency of Transformers on data, and (ii) relying on cross-modality pretraining. More specifically, we first present Progressive Point Patch Embedding and present a new point cloud Transformer model namely PViT. PViT shares the same backbone as Transformer but is shown to be less hungry for data, enabling Transformer to achieve performance comparable to the state-of-the-art. Second, we formulate a simple yet effective pipeline dubbed "Pix4Point" that allows harnessing Transformers pretrained in the image domain to enhance downstream point cloud understanding. This is achieved through a modality-agnostic Transformer backbone with the help of a tokenizer and decoder specialized in the different domains. Pretrained on a large number of widely available images, significant gains of PViT are observed in the tasks of 3D point cloud classification, part segmentation, and semantic segmentation on ScanObjectNN, ShapeNetPart, and S3DIS, respectively. Our code and models are available at https://github.com/guochengqian/Pix4Point .
Submitted 2 February, 2024; v1 submitted 25 August, 2022;
originally announced August 2022.
-
Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding
Authors:
Abdullah Hamdi,
Silvio Giancola,
Bernard Ghanem
Abstract:
Multi-view projection methods have demonstrated promising performance on 3D understanding tasks like 3D classification and segmentation. However, it remains unclear how to combine such multi-view methods with the widely available 3D point clouds. Previous methods use unlearned heuristics to combine features at the point level. To this end, we introduce the concept of the multi-view point cloud (Voint cloud), representing each 3D point as a set of features extracted from several view-points. This novel 3D Voint cloud representation combines the compactness of 3D point cloud representation with the natural view-awareness of multi-view representation. Naturally, we can equip this new representation with convolutional and pooling operations. We deploy a Voint neural network (VointNet) to learn representations in the Voint space. Our novel representation achieves state-of-the-art performance on 3D classification, shape retrieval, and robust 3D part segmentation on standard benchmarks (ScanObjectNN, ShapeNet Core55, and ShapeNet Parts).
Submitted 25 January, 2023; v1 submitted 30 November, 2021;
originally announced November 2021.
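The Voint cloud abstract above describes each 3D point as a set of features extracted from several view-points, equipped with pooling operations. The following is only a minimal sketch of that idea, not the VointNet implementation: the function name `max_pool_views`, the nested-list layout, and the choice to give never-visible points a zero vector are all assumptions made for illustration.

```python
from typing import List

Feature = List[float]


def max_pool_views(voint_cloud: List[List[Feature]], feature_dim: int) -> List[Feature]:
    """Collapse the per-view features of every point into one vector.

    voint_cloud[p] holds one feature vector per view in which point p is
    visible (hypothetical layout). Points seen from no view receive a zero
    vector, which is an assumption of this sketch.
    """
    pooled: List[Feature] = []
    for views in voint_cloud:
        if not views:
            pooled.append([0.0] * feature_dim)
            continue
        # Element-wise maximum over all views that see this point.
        pooled.append([max(view[k] for view in views) for k in range(feature_dim)])
    return pooled


voint_cloud = [
    [[0.1, 0.9], [0.4, 0.2], [0.3, 0.5]],  # point seen from three views
    [[0.7, 0.0]],                          # point seen from one view
    [],                                    # point not visible from any view
]
pooled = max_pool_views(voint_cloud, feature_dim=2)
print(pooled)  # [[0.4, 0.9], [0.7, 0.0], [0.0, 0.0]]
```

Max pooling is used here because it is invariant to the order of the views; other reductions (mean, learned aggregation) would fit the same interface.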
-
GCCN: Global Context Convolutional Network
Authors:
Ali Hamdi,
Flora Salim,
Du Yong Kim
Abstract:
In this paper, we propose Global Context Convolutional Network (GCCN) for visual recognition. GCCN computes global features representing contextual information across image patches. These global contextual features are defined as local maxima pixels with high visual sharpness in each patch. These features are then concatenated and utilised to augment the convolutional features. The learnt feature vector is normalised with the global context features using the Frobenius norm. This straightforward approach achieves high accuracy in comparison to the state-of-the-art methods with 94.6% and 95.41% on CIFAR-10 and STL-10 datasets, respectively. To explore the potential impact of GCCN on other visual representation tasks, we implemented GCCN as a base model for few-shot image classification. We learn metric distances between the augmented feature vectors and their prototype representations, similar to Prototypical and Matching Networks. GCCN outperforms state-of-the-art few-shot learning methods achieving 99.9%, 84.8% and 80.74% on Omniglot, MiniImageNet and CUB-200, respectively. GCCN has significantly improved on the accuracy of state-of-the-art prototypical and matching networks by up to 30% in different few-shot learning scenarios.
Submitted 22 October, 2021;
originally announced October 2021.
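The GCCN abstract above describes taking local maxima in each image patch and concatenating them with the convolutional features. A minimal sketch of that step follows, under stated assumptions: the plain per-patch maximum stands in for the paper's sharpness criterion, the Frobenius-norm normalisation is omitted, and the names `patch_maxima` and `augment` are hypothetical.

```python
from typing import List


def patch_maxima(image: List[List[float]], patch: int) -> List[float]:
    """Return the maximum pixel of each non-overlapping patch, in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by the patch size")
    features: List[float] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            features.append(max(
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ))
    return features


def augment(conv_features: List[float], global_features: List[float]) -> List[float]:
    """Concatenate convolutional features with the patch-level global features."""
    return list(conv_features) + list(global_features)


image = [
    [1, 2, 0, 4],
    [3, 9, 5, 1],
    [7, 0, 2, 2],
    [1, 6, 8, 3],
]
global_features = patch_maxima(image, patch=2)
print(global_features)                       # [9, 5, 7, 8]
print(augment([0.5, 0.25], global_features)) # [0.5, 0.25, 9, 5, 7, 8]
```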
-
Signature-Graph Networks
Authors:
Ali Hamdi,
Flora Salim,
Du Yong Kim,
Xiaojun Chang
Abstract:
We propose a novel approach for visual representation learning called Signature-Graph Neural Networks (SGN). SGN learns latent global structures that augment the feature representation of Convolutional Neural Networks (CNN). SGN constructs unique undirected graphs for each image based on the CNN feature maps. The feature maps are partitioned into a set of equal and non-overlapping patches. The graph nodes are located on high-contrast sharp convolution features with the local maxima or minima in these patches. The node embeddings are aggregated through novel Signature-Graphs based on horizontal and vertical edge connections. The representation vectors are then computed based on the spectral Laplacian eigenvalues of the graphs. SGN outperforms existing methods of recent graph convolutional networks, generative adversarial networks, and auto-encoders with image classification accuracy of 99.65% on ASIRRA, 99.91% on MNIST, 98.55% on Fashion-MNIST, 96.18% on CIFAR-10, 84.71% on CIFAR-100, 94.36% on STL10, and 95.86% on SVHN datasets. We also introduce a novel implementation of the state-of-the-art multi-head attention (MHA) on top of the proposed SGN. Adding SGN to MHA improved the image classification accuracy from 86.92% to 94.36% on the STL10 dataset.
Submitted 21 October, 2021;
originally announced October 2021.
-
Search for photoproduction of axion-like particles at GlueX
Authors:
GlueX Collaboration,
S. Adhikari,
C. S. Akondi,
M. Albrecht,
A. Ali,
M. Amaryan,
A. Asaturyan,
A. Austregesilo,
Z. Baldwin,
F. Barbosa,
J. Barlow,
E. Barriga,
R. Barsotti,
T. D. Beattie,
V. V. Berdnikov,
T. Black,
W. Boeglin,
W. J. Briscoe,
T. Britton,
W. K. Brooks,
E. Chudakov,
S. Cole,
P. L. Cole,
O. Cortes,
V. Crede
, et al. (120 additional authors not shown)
Abstract:
We present a search for axion-like particles, $a$, produced in photon-proton collisions at a center-of-mass energy of approximately 4 GeV, focusing on the scenario where the $a$-gluon coupling is dominant. The search uses $a\toγγ$ and $a\toπ^+π^-π^0$ decays, and a data sample corresponding to an integrated luminosity of 168 pb$^{-1}$ collected with the GlueX detector. The search for $a\toγγ$ decays is performed in the mass range of $180 < m_a < 480$ MeV, while the search for $a\toπ^+π^-π^0$ decays explores the $600 < m_a < 720$ MeV region. No evidence for a signal is found, and 90% confidence-level exclusion limits are placed on the $a$-gluon coupling strength. These constraints are the most stringent to date over much of the mass ranges considered.
Submitted 24 March, 2022; v1 submitted 27 September, 2021;
originally announced September 2021.
-
Named Entity Recognition and Classification on Historical Documents: A Survey
Authors:
Maud Ehrmann,
Ahmed Hamdi,
Elvys Linhares Pontes,
Matteo Romanello,
Antoine Doucet
Abstract:
After decades of massive digitisation, an unprecedented amount of historical documents is available in digital format, along with their machine-readable texts. While this represents a major step forward with respect to preservation and accessibility, it also opens up new opportunities in terms of content mining and the next fundamental challenge is to develop appropriate technologies to efficiently search, retrieve and explore information from this 'big data of the past'. Among semantic indexing opportunities, the recognition and classification of named entities are in great demand among humanities scholars. Yet, named entity recognition (NER) systems are heavily challenged with diverse, historical and noisy inputs. In this survey, we present the array of challenges posed by historical documents to NER, inventory existing resources, describe the main approaches deployed so far, and identify key priorities for future developments.
Submitted 23 September, 2021;
originally announced September 2021.
-
Measurement of Spin Density Matrix Elements in $Λ(1520)$ Photoproduction at 8.2-8.8 GeV
Authors:
GlueX Collaboration,
S. Adhikari,
C. S. Akondi,
M. Albrecht,
A. Ali,
M. Amaryan,
A. Asaturyan,
A. Austregesilo,
Z. Baldwin,
F. Barbosa,
J. Barlow,
E. Barriga,
R. Barsotti,
T. D. Beattie,
V. V. Berdnikov,
T. Black,
W. Boeglin,
W. J. Briscoe,
T. Britton,
W. K. Brooks,
E. Chudakov,
S. Cole,
P. L. Cole,
O. Cortes,
V. Crede
, et al. (121 additional authors not shown)
Abstract:
We report on the measurement of spin density matrix elements of the $Λ(1520)$ in the photoproduction reaction $γp\rightarrow Λ(1520)K^+$, via its subsequent decay to $K^{-}p$. The measurement was performed as part of the GlueX experimental program in Hall D at Jefferson Lab using a linearly polarized photon beam with $E_γ=$ 8.2-8.8 GeV. These are the first such measurements in this photon energy range. Results are presented in bins of momentum transfer squared, $-(t-t_\text{0})$. We compare the results with a Reggeon exchange model and determine that natural exchange amplitudes are dominant in $Λ(1520)$ photoproduction.
Submitted 3 March, 2022; v1 submitted 26 July, 2021;
originally announced July 2021.
-
MARL: Multimodal Attentional Representation Learning for Disease Prediction
Authors:
Ali Hamdi,
Amr Aboeleneen,
Khaled Shaban
Abstract:
Existing learning models often utilise CT-scan images to predict lung diseases. These models are challenged by high uncertainties that affect lung segmentation and visual feature learning. We introduce MARL, a novel Multimodal Attentional Representation Learning model architecture that learns useful features from multimodal data under uncertainty. We feed the proposed model with both the lung CT-scan images and the patients' respective historical biological records collected over time. Such rich data allows analysing both spatial and temporal aspects of the disease. MARL employs Fuzzy-based image spatial segmentation to overcome uncertainties in CT-scan images. We then utilise a pre-trained Convolutional Neural Network (CNN) to learn visual representation vectors from images. We augment patients' data with statistical features from the segmented images. We develop a Long Short-Term Memory (LSTM) network to represent the augmented data and learn sequential patterns of disease progression. Finally, we inject both CNN and LSTM feature vectors into an attention layer to help focus on the best learning features. We evaluated MARL on regression of lung disease progression and status classification. MARL outperforms state-of-the-art CNN architectures, such as EfficientNet and DenseNet, and baseline prediction models. It achieves a 91% R^2 score, which is higher than the other models by a range of 8% to 27%. Also, MARL achieves 97% and 92% accuracy for binary and multi-class classification, respectively. MARL improves the accuracy of state-of-the-art CNN models by a range of 19% to 57%. The results show that combining spatial and sequential temporal features produces better discriminative features.
Submitted 1 May, 2021;
originally announced May 2021.
-
Spatiotemporal Data Mining: A Survey on Challenges and Open Problems
Authors:
Ali Hamdi,
Khaled Shaban,
Abdelkarim Erradi,
Amr Mohamed,
Shakila Khan Rumi,
Flora Salim
Abstract:
Spatiotemporal data mining (STDM) discovers useful patterns from the dynamic interplay between space and time. Several available surveys capture STDM advances and report a wealth of important progress in this field. However, STDM challenges and problems are not thoroughly discussed and presented in articles of their own. We attempt to fill this gap by providing a comprehensive literature survey on state-of-the-art advances in STDM. We describe the challenging issues and their causes and open gaps of multiple STDM directions and aspects. Specifically, we investigate the challenging issues in regards to spatiotemporal relationships, interdisciplinarity, discretisation, and data characteristics. Moreover, we discuss the limitations in the literature and open research problems related to spatiotemporal data representations, modelling and visualisation, and comprehensiveness of approaches. We explain issues related to STDM tasks of classification, clustering, hotspot detection, association and pattern mining, outlier detection, visualisation, visual analytics, and computer vision tasks. We also highlight STDM issues related to multiple applications including crime and public safety, traffic and transportation, earth and environment monitoring, epidemiology, social media, and Internet of Things.
Submitted 31 March, 2021;
originally announced March 2021.
-
Drone-as-a-Service Composition Under Uncertainty
Authors:
Ali Hamdi,
Flora D. Salim,
Du Yong Kim,
Azadeh Ghari Neiat,
Athman Bouguettaya
Abstract:
We propose an uncertainty-aware service approach to provide drone-based delivery services called Drone-as-a-Service (DaaS) effectively. Specifically, we propose a service model of DaaS based on the dynamic spatiotemporal features of drones and their in-flight contexts. The proposed DaaS service approach consists of three components: scheduling, route-planning, and composition. First, we develop a DaaS scheduling model to generate DaaS itineraries through a Skyway network. Second, we propose an uncertainty-aware DaaS route-planning algorithm that selects the optimal Skyways under weather uncertainties. Third, we develop two DaaS composition techniques to select an optimal DaaS composition at each station of the planned route. A spatiotemporal DaaS composer first selects the optimal DaaSs based on their spatiotemporal availability and drone capabilities. A predictive DaaS composer then utilises the outcome of the first composer to enable fast and accurate DaaS composition using several Machine Learning classification methods. We train the classifiers using a new set of spatiotemporal features in addition to other DaaS QoS properties. Our experimental results show the effectiveness and efficiency of the proposed approach.
Submitted 11 March, 2021;
originally announced March 2021.
-
PANDA Phase One
Authors:
G. Barucca,
F. Davì,
G. Lancioni,
P. Mengucci,
L. Montalto,
P. P. Natali,
N. Paone,
D. Rinaldi,
L. Scalise,
B. Krusche,
M. Steinacher,
Z. Liu,
C. Liu,
B. Liu,
X. Shen,
S. Sun,
G. Zhao,
J. Zhao,
M. Albrecht,
W. Alkakhi,
S. Bökelmann,
S. Coen,
F. Feldbauer,
M. Fink,
J. Frech
, et al. (399 additional authors not shown)
Abstract:
The Facility for Antiproton and Ion Research (FAIR) in Darmstadt, Germany, provides unique possibilities for a new generation of hadron-, nuclear- and atomic physics experiments. The future antiProton ANnihilations at DArmstadt (PANDA or $\overline{\rm P}$ANDA) experiment at FAIR will offer a broad physics programme, covering different aspects of the strong interaction. Understanding the latter in the non-perturbative regime remains one of the greatest challenges in contemporary physics. The antiproton-nucleon interaction studied with PANDA provides crucial tests in this area. Furthermore, the high-intensity, low-energy domain of PANDA allows for searches for physics beyond the Standard Model, e.g. through high precision symmetry tests. This paper takes into account a staged approach for the detector setup and for the delivered luminosity from the accelerator. The available detector setup at the time of the delivery of the first antiproton beams in the HESR storage ring is referred to as the Phase One setup. The physics programme that is achievable during Phase One is outlined in this paper.
Submitted 9 June, 2021; v1 submitted 28 January, 2021;
originally announced January 2021.
-
MVTN: Multi-View Transformation Network for 3D Shape Recognition
Authors:
Abdullah Hamdi,
Silvio Giancola,
Bernard Ghanem
Abstract:
Multi-view projection methods have demonstrated their ability to reach state-of-the-art performance on 3D shape recognition. Those methods learn different ways to aggregate information from multiple views. However, the camera view-points for those views tend to be heuristically set and fixed for all shapes. To circumvent the lack of dynamism of current multi-view methods, we propose to learn those view-points. In particular, we introduce the Multi-View Transformation Network (MVTN) that regresses optimal view-points for 3D shape recognition, building upon advances in differentiable rendering. As a result, MVTN can be trained end-to-end along with any multi-view network for 3D shape classification. We integrate MVTN in a novel adaptive multi-view pipeline that can render either 3D meshes or point clouds. MVTN exhibits clear performance gains in the tasks of 3D shape classification and 3D shape retrieval without the need for extra training supervision. In these tasks, MVTN achieves state-of-the-art performance on ModelNet40, ShapeNet Core55, and the most recent and realistic ScanObjectNN dataset (up to 6% improvement). Interestingly, we also show that MVTN can provide network robustness against rotation and occlusion in the 3D domain. The code is available at https://github.com/ajhamdi/MVTN .
Submitted 17 August, 2021; v1 submitted 26 November, 2020;
originally announced November 2020.
-
Measurement of beam asymmetry for $π^-Δ^{++}$ photoproduction on the proton at $E_γ$=8.5 GeV
Authors:
GlueX Collaboration,
S. Adhikari,
C. S. Akondi,
A. Ali,
M. Amaryan,
A. Asaturyan,
A. Austregesilo,
Z. Baldwin,
F. Barbosa,
J. Barlow,
E. Barriga,
R. Barsotti,
T. D. Beattie,
V. V. Berdnikov,
T. Black,
W. Boeglin,
W. J. Briscoe,
T. Britton,
W. K. Brooks,
B. E. Cannon,
E. Chudakov,
S. Cole,
O. Cortes,
V. Crede,
M. M. Dalton
, et al. (112 additional authors not shown)
Abstract:
We report a measurement of the $π^-$ photoproduction beam asymmetry for the reaction $\vecγ p \rightarrow π^- Δ^{++}$ using data from the GlueX experiment in the photon beam energy range 8.2--8.8 GeV. The asymmetry $Σ$ is measured as a function of four-momentum transfer $t$ to the $Δ^{++}$ and compared to phenomenological models. We find that $Σ$ varies as a function of $t$: negative at smaller values and positive at higher values of $|t|$. The reaction can be described theoretically by $t$-channel particle exchange requiring pseudoscalar, vector, and tensor intermediaries. In particular, this reaction requires charge exchange, allowing us to probe pion exchange and the significance of higher-order corrections to one-pion exchange at low momentum transfer. Constraining production mechanisms of conventional mesons may aid in the search for and study of unconventional mesons. This is the first measurement of the process at this energy.
Submitted 8 January, 2021; v1 submitted 15 September, 2020;
originally announced September 2020.
-
flexgrid2vec: Learning Efficient Visual Representations Vectors
Authors:
Ali Hamdi,
Du Yong Kim,
Flora D. Salim
Abstract:
We propose flexgrid2vec, a novel approach for image representation learning. Existing visual representation methods suffer from several issues, including the need for highly intensive computation, the risk of losing in-depth structural information and the specificity of the method to certain shapes or objects. flexgrid2vec converts an image to a low-dimensional feature vector. We represent each image with a graph of flexible, unique node locations and edge distances. flexgrid2vec is a multi-channel GCN that learns features of the most representative image patches. We have investigated both spectral and non-spectral implementations of the GCN node-embedding. Specifically, we have implemented flexgrid2vec based on different node-aggregation methods, such as vector summation, concatenation and normalisation with eigenvector centrality. We compare the performance of flexgrid2vec with a set of state-of-the-art visual representation learning models on binary and multi-class image classification tasks. Although we utilise imbalanced, low-size and low-resolution datasets, flexgrid2vec shows stable and outstanding results against well-known base classifiers. flexgrid2vec achieves 96.23% on CIFAR-10, 83.05% on CIFAR-100, 94.50% on STL-10, 98.8% on ASIRRA and 89.69% on the COCO dataset.
Submitted 29 September, 2021; v1 submitted 30 July, 2020;
originally announced July 2020.
-
The GlueX Beamline and Detector
Authors:
S. Adhikari,
C. S. Akondi,
H. Al Ghoul,
A. Ali,
M. Amaryan,
E. G. Anassontzis,
A. Austregesilo,
F. Barbosa,
J. Barlow,
A. Barnes,
E. Barriga,
R. Barsotti,
T. D. Beattie,
J. Benesch,
V. V. Berdnikov,
G. Biallas,
T. Black,
W. Boeglin,
P. Brindza,
W. J. Briscoe,
T. Britton,
J. Brock,
W. K. Brooks,
B. E. Cannon,
C. Carlin
, et al. (165 additional authors not shown)
Abstract:
The GlueX experiment at Jefferson Lab has been designed to study photoproduction reactions with a 9-GeV linearly polarized photon beam. The energy and arrival time of beam photons are tagged using a scintillator hodoscope and a scintillating fiber array. The photon flux is determined using a pair spectrometer, while the linear polarization of the photon beam is determined using a polarimeter based on triplet photoproduction. Charged-particle tracks from interactions in the central target are analyzed in a solenoidal field using a central straw-tube drift chamber and six packages of planar chambers with cathode strips and drift wires. Electromagnetic showers are reconstructed in a cylindrical scintillating fiber calorimeter inside the magnet and a lead-glass array downstream. Charged particle identification is achieved by measuring energy loss in the wire chambers and using the flight time of particles between the target and detectors outside the magnet. The signals from all detectors are recorded with flash ADCs and/or pipeline TDCs into memories allowing trigger decisions with a latency of 3.3 $μ$s. The detector operates routinely at trigger rates of 40 kHz and data rates of 600 megabytes per second. We describe the photon beam, the GlueX detector components, electronics, data-acquisition and monitoring systems, and the performance of the experiment during the first three years of operation.
Submitted 26 October, 2020; v1 submitted 28 May, 2020;
originally announced May 2020.
-
DroTrack: High-speed Drone-based Object Tracking Under Uncertainty
Authors:
Ali Hamdi,
Flora Salim,
Du Yong Kim
Abstract:
We present DroTrack, a high-speed visual single-object tracking framework for drone-captured video sequences. Most of the existing object tracking methods are designed to tackle well-known challenges, such as occlusion and cluttered backgrounds. The complex motion of drones, i.e., multiple degrees of freedom in three-dimensional space, causes high uncertainty. The uncertainty problem leads to inaccurate location predictions and fuzziness in scale estimations. DroTrack solves such issues by discovering the dependency between object representation and motion geometry. We implement an effective object segmentation based on Fuzzy C Means (FCM). We incorporate the spatial information into the membership function to cluster the most discriminative segments. We then enhance the object segmentation by using a pre-trained Convolutional Neural Network (CNN) model. DroTrack also leverages the geometrical angular motion to estimate a reliable object scale. We discuss the experimental results and performance evaluation using two datasets of 51,462 drone-captured frames. The combination of the FCM segmentation and the angular scaling increased DroTrack precision by up to 9% and decreased the centre location error by 162 pixels on average. DroTrack outperforms all the high-speed trackers and achieves comparable results in comparison to deep learning trackers. DroTrack offers high frame rates of up to 1000 frames per second (fps) with the best location precision, outperforming a set of state-of-the-art real-time trackers.
Submitted 2 May, 2020;
originally announced May 2020.
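The DroTrack abstract above builds its segmentation on Fuzzy C Means (FCM). For readers unfamiliar with FCM, the sketch below shows one iteration of the textbook algorithm on 1-D data: the membership update followed by the centre update. It does not include the spatial term that DroTrack adds to the membership function, and the function names and the toy data are assumptions for illustration only.

```python
from typing import List


def fcm_memberships(points: List[float], centres: List[float], m: float = 2.0) -> List[List[float]]:
    """Textbook FCM membership update; returns u[j][i] for point j and cluster i."""
    exponent = 2.0 / (m - 1.0)
    memberships: List[List[float]] = []
    for x in points:
        dists = [abs(x - c) for c in centres]
        if any(d == 0.0 for d in dists):
            # A point lying exactly on a centre belongs fully to that centre
            # (shared equally if several centres coincide with it).
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        memberships.append([
            1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
            for d_i in dists
        ])
    return memberships


def fcm_centres(points: List[float], memberships: List[List[float]], m: float = 2.0) -> List[float]:
    """Textbook FCM centre update: mean of the points weighted by u**m."""
    centres: List[float] = []
    for i in range(len(memberships[0])):
        weights = [u[i] ** m for u in memberships]
        centres.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
    return centres


points = [0.0, 1.0, 9.0, 10.0]   # toy 1-D intensities (hypothetical data)
u = fcm_memberships(points, centres=[0.5, 9.5])
new_centres = fcm_centres(points, u)
print([round(sum(row), 6) for row in u])  # every row of memberships sums to 1
print(new_centres)
```

With fuzzifier `m = 2`, each membership is proportional to the inverse squared distance to the centre, so points near a centre receive a membership close to 1 for that cluster while still contributing slightly to the others.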
-
Measurement of the Photon Beam Asymmetry in $\vecγ p\to K^+Σ^0$ at $E_γ = 8.5$ GeV
Authors:
The GlueX Collaboration,
S. Adhikari,
A. Ali,
M. Amaryan,
A. Austregesilo,
F. Barbosa,
J. Barlow,
E. Barriga,
R. Barsotti,
T. D. Beattie,
V. V. Berdnikov,
T. Black,
W. Boeglin,
W. J. Briscoe,
T. Britton,
W. K. Brooks,
B. E. Cannon,
N. Cao,
E. Chudakov,
S. Cole,
O. Cortes,
V. Crede,
M. M. Dalton,
T. Daniels,
A. Deur
, et al. (102 additional authors not shown)
Abstract:
We report measurements of the photon beam asymmetry $Σ$ for the reaction $\vecγ p\to K^+Σ^0$(1193) using the GlueX spectrometer in Hall D at Jefferson Lab. Data were collected using a linearly polarized photon beam in the energy range of 8.2-8.8 GeV incident on a liquid hydrogen target. The beam asymmetry $Σ$ was measured as a function of the Mandelstam variable $t$, and a single value of $Σ$ was extracted for events produced in the $u$-channel. These are the first exclusive measurements of the photon beam asymmetry $Σ$ for the reaction in this energy range. For the $t$-channel, the measured beam asymmetry is close to unity over the $t$-range studied, $-t=(0.1-1.4)~$(GeV/$c$)$^{2}$, with an average value of $Σ= 1.00\pm 0.05$. This agrees with theoretical models that describe the reaction via the natural-parity exchange of the $K^{*}$(892) Regge trajectory. A value of $Σ= 0.41 \pm 0.09$ is obtained for the $u$-channel integrated up to $-u=2.0$~(GeV/$c$)$^{2}$.
Submitted 12 May, 2020; v1 submitted 18 March, 2020;
originally announced March 2020.
-
Technical Design Report for the PANDA Endcap Disc DIRC
Authors:
Panda Collaboration,
F. Davi,
W. Erni,
B. Krusche,
M. Steinacher,
N. Walford,
H. Liu,
Z. Liu,
B. Liu,
X. Shen,
C. Wang,
J. Zhao,
M. Albrecht,
T. Erlen,
F. Feldbauer,
M. Fink,
V. Freudenreich,
M. Fritsch,
F. H. Heinsius,
T. Held,
T. Holtmann,
I. Keshk,
H. Koch,
B. Kopf,
M. Kuhlmann,
et al. (441 additional authors not shown)
Abstract:
PANDA (anti-Proton ANnihilation at DArmstadt) is planned to be one of the four main experiments at the future international accelerator complex FAIR (Facility for Antiproton and Ion Research) in Darmstadt, Germany. It will address fundamental questions of hadron physics and quantum chromodynamics using cooled antiproton beams with high intensity and momenta between 1.5 and 15 GeV/c. PANDA is designed to reach a maximum luminosity of $2\times10^{32}$ cm$^{-2}$ s$^{-1}$. Most of the physics programs require excellent particle identification (PID). The PID of hadronic states at the forward endcap of the target spectrometer will be done by a fast and compact Cherenkov detector that uses the detection of internally reflected Cherenkov light (DIRC) principle. It is designed to cover the polar angle range from 5° to 22° and to separate charged pions and kaons with a separation power of up to 3 standard deviations (s.d.) for particle momenta up to 4 GeV/c in order to cover the important particle phase space. This document describes the technical design and the expected performance of the novel PANDA Disc DIRC detector, a type of detector that has not been used in any other high energy physics (HEP) experiment before. The performance has been studied with Monte-Carlo simulations and various beam tests at DESY and CERN. The final design meets all PANDA requirements and guarantees sufficient safety margins.
Submitted 29 December, 2019;
originally announced December 2019.
-
Width-k Eulerian polynomials of type A and B and its Gamma-positivity
Authors:
Marwa Ben Abdelmaksoud,
Adel Hamdi
Abstract:
We define some generalizations of the classical descent and inversion statistics on signed permutations. These statistics arise from the work of Sack and Ulfarsson [20] and were later called width-k descents and width-k inversions of type A in Davis's work [8]. Using these new statistics, we derive some new generalizations of Eulerian polynomials of type A, B and D. We also establish the Gamma-positivity of the "width-k" Eulerian polynomials, and we give a combinatorial interpretation of finite sequences associated to these new polynomials using quasisymmetric functions and P-partitions, following Petersen's work [18].
Submitted 10 May, 2022; v1 submitted 18 December, 2019;
originally announced December 2019.
-
AdvPC: Transferable Adversarial Perturbations on 3D Point Clouds
Authors:
Abdullah Hamdi,
Sara Rojas,
Ali Thabet,
Bernard Ghanem
Abstract:
Deep neural networks are vulnerable to adversarial attacks, in which imperceptible perturbations to their input lead to erroneous network predictions. This phenomenon has been extensively studied in the image domain, and has only recently been extended to 3D point clouds. In this work, we present novel data-driven adversarial attacks against 3D point cloud networks. We aim to address the following problems in current 3D point cloud adversarial attacks: they do not transfer well between different networks, and they are easy to defend against via simple statistical methods. To this end, we develop a new point cloud attack (dubbed AdvPC) that exploits the input data distribution by adding an adversarial loss, after Auto-Encoder reconstruction, to the objective it optimizes. AdvPC leads to perturbations that are resilient against current defenses, while remaining highly transferable compared to state-of-the-art attacks. We test AdvPC using four popular point cloud networks: PointNet, PointNet++ (MSG and SSG), and DGCNN. Our proposed attack increases the attack success rate by up to 40% for attacks transferred to unseen networks (transferability), while maintaining a high success rate on the attacked network. AdvPC also increases the ability to break defenses by up to 38% as compared to other baselines on the ModelNet40 dataset.
Submitted 16 July, 2020; v1 submitted 1 December, 2019;
originally announced December 2019.
-
Search for exotic states in photoproduction at GlueX
Authors:
Abdennacer Hamdi
Abstract:
Quantum Chromodynamics (QCD) is the theory that describes how hadrons are built from quarks and gluons via the strong interaction. Many predictions have been experimentally confirmed, but others remain under experimental investigation. Of particular interest is how gluonic excitations give rise to states with constituent glue. One class of such states is hybrid mesons, which are predicted by theoretical models and Lattice QCD calculations. Searching for and understanding the nature of these states is a primary physics goal of the GlueX experiment at the CEBAF accelerator at Jefferson Lab in the US. We will give an overview of the experiment, and present the status of the search for a hybrid meson candidate, Y(2175). This work is supported by HGS-HIRe.
Submitted 30 August, 2019;
originally announced August 2019.
-
A New Formula of q-Fubini Numbers via Goncharov polynomials
Authors:
Adel Hamdi
Abstract:
By connecting the generalized Goncharov polynomials associated to a pair ($\partial,\mathcal{Z}$) of a delta operator $\partial$ and an interpolation grid $\mathcal{Z}$, introduced by Lorentz, Tringali and Yan in [7], with the theory of binomial enumeration and order statistics, we give a new $q$-deformation of these polynomials that allows us to derive a new combinatorial formula for $q$-Fubini numbers. A combinatorial proof and some nice algebraic and analytic properties are extended to the $q$-deformed version.
Submitted 19 August, 2019;
originally announced August 2019.
-
Beam Asymmetry $\mathbf{Σ}$ for the Photoproduction of $\mathbf{η}$ and $\mathbf{η^{\prime}}$ Mesons at $\mathbf{E_γ=8.8}$ GeV
Authors:
The GlueX Collaboration,
S. Adhikari,
A. Ali,
M. Amaryan,
A. Austregesilo,
F. Barbosa,
J. Barlow,
A. Barnes,
E. Barriga,
R. Barsotti,
T. D. Beattie,
V. V. Berdnikov,
T. Black,
W. Boeglin,
M. Boer,
W. J. Briscoe,
T. Britton,
W. K. Brooks,
B. E. Cannon,
N. Cao,
E. Chudakov,
S. Cole,
O. Cortes,
V. Crede,
M. M. Dalton,
et al. (109 additional authors not shown)
Abstract:
We report on the measurement of the beam asymmetry $Σ$ for the reactions $\vec{γ}p\rightarrow pη$ and $\vec{γ}p \rightarrow pη^{\prime}$ from the GlueX experiment, using an 8.2-8.8 GeV linearly polarized tagged photon beam incident on a liquid hydrogen target in Hall D at Jefferson Lab. These measurements are made as a function of momentum transfer $-t$, with significantly higher statistical precision than our earlier $η$ measurements, and are the first measurements of $η^{\prime}$ in this energy range. We compare the results to theoretical predictions based on $t$-channel quasi-particle exchange. We also compare the ratio of $Σ_η$ to $Σ_{η^{\prime}}$ to these models, as this ratio is predicted to be sensitive to the amount of $s\bar{s}$ exchange in the production. We find that photoproduction of both $η$ and $η^{\prime}$ is dominated by natural parity exchange with little dependence on $-t$.
Submitted 24 November, 2019; v1 submitted 15 August, 2019;
originally announced August 2019.
-
Expected Tight Bounds for Robust Training
Authors:
Salman Alsubaihi,
Adel Bibi,
Modar Alfadly,
Abdullah Hamdi,
Bernard Ghanem
Abstract:
Training Deep Neural Networks that are robust to norm-bounded adversarial attacks remains an elusive problem. While exact and inexact verification-based methods are generally too expensive to train large networks, it was demonstrated that bounded input intervals can be inexpensively propagated from one layer to another through deep networks. This interval bound propagation approach (IBP) not only has improved both robustness and certified accuracy but was the first to be employed on large/deep networks. However, due to the very loose nature of the IBP bounds, the required training procedure is complex and involved. In this paper, we closely examine the bounds of a block of layers composed in the form of Affine-ReLU-Affine. To this end, we propose expected tight bounds (true bounds in expectation), referred to as ETB, which are provably tighter than IBP bounds in expectation. We then extend this result to deeper networks through blockwise propagation and show that we can achieve orders of magnitude tighter bounds compared to IBP. Furthermore, using a simple standard training procedure, we can achieve an impressive robustness-accuracy trade-off on both MNIST and CIFAR10.
Submitted 12 June, 2021; v1 submitted 28 May, 2019;
originally announced May 2019.
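To make the baseline in the entry above concrete, the following is a minimal sketch of plain interval bound propagation (IBP) through an Affine-ReLU-Affine block. It illustrates only the IBP baseline the abstract compares against, not the proposed ETB bounds; the function names and the toy weights are assumptions chosen for illustration.

```python
import numpy as np


def affine_bounds(lower, upper, W, b):
    """Propagate an axis-aligned box [lower, upper] through x -> W @ x + b."""
    center = (upper + lower) / 2.0
    radius = (upper - lower) / 2.0
    new_center = W @ center + b
    # |W| maps the box radius to the worst-case output radius.
    new_radius = np.abs(W) @ radius
    return new_center - new_radius, new_center + new_radius


def relu_bounds(lower, upper):
    """ReLU is monotone, so it can be applied to both bounds directly."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)


def ibp_affine_relu_affine(lower, upper, W1, b1, W2, b2):
    """IBP output bounds for the block Affine -> ReLU -> Affine."""
    l, u = affine_bounds(lower, upper, W1, b1)
    l, u = relu_bounds(l, u)
    return affine_bounds(l, u, W2, b2)


if __name__ == "__main__":
    # Toy weights (assumed for illustration only).
    W1 = np.array([[1.0, -1.0], [1.0, 1.0]])
    b1 = np.zeros(2)
    W2 = np.array([[1.0, -1.0]])
    b2 = np.zeros(1)
    lo, hi = ibp_affine_relu_affine(
        np.array([-1.0, -1.0]), np.array([1.0, 1.0]), W1, b1, W2, b2
    )
    print(lo, hi)
```

The bounds returned this way are always sound (every reachable output lies inside them) but can be loose, because each layer treats its input coordinates as independent; this looseness is the issue the abstract describes.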
-
First measurement of near-threshold J/$ψ$ exclusive photoproduction off the proton
Authors:
The GlueX Collaboration,
A. Ali,
M. Amaryan,
E. G. Anassontzis,
A. Austregesilo,
M. Baalouch,
F. Barbosa,
J. Barlow,
A. Barnes,
E. Barriga,
T. D. Beattie,
V. V. Berdnikov,
T. Black,
W. Boeglin,
M. Boer,
W. J. Briscoe,
T. Britton,
W. K. Brooks,
B. E. Cannon,
N. Cao,
E. Chudakov,
S. Cole,
O. Cortes,
V. Crede,
M. M. Dalton,
et al. (110 additional authors not shown)
Abstract:
We report on the measurement of the $γp \rightarrow J/ψp$ cross section from $E_γ= 11.8$ GeV down to the threshold at $8.2$ GeV using a tagged photon beam with the GlueX experiment. We find the total cross section falls toward the threshold less steeply than expected from two-gluon exchange models. The differential cross section $dσ/dt$ has an exponential slope of $1.67 \pm 0.39$ GeV$^{-2}$ at $10.7$ GeV average energy. The LHCb pentaquark candidates $P_c^+$ can be produced in the $s$-channel of this reaction. We see no evidence for them and set model-dependent upper limits on their branching fractions $\mathcal{B}(P_c^+ \rightarrow J/ψp)$ and cross sections $σ(γp \to P_c^+)\times\mathcal{B}(P_c^+ \to J/ψp) $.
Submitted 10 September, 2019; v1 submitted 26 May, 2019;
originally announced May 2019.
-
IAN: Combining Generative Adversarial Networks for Imaginative Face Generation
Authors:
Abdullah Hamdi,
Bernard Ghanem
Abstract:
Generative Adversarial Networks (GANs) have gained momentum for their ability to model image distributions. They learn to emulate the training set, which enables sampling from that domain and using the learned knowledge for useful applications. Several methods have been proposed to enhance GANs, including regularizing the loss with some feature matching. We seek to push GANs beyond the training data and explore unseen territory in the image manifold. We first propose a new regularizer for GANs based on K-nearest neighbor (K-NN) selective feature matching to a target set Y in high-level feature space, applied during the adversarial training of the GAN on the base set X, and we call this novel model K-GAN. We show that minimizing the added term follows from cross-entropy minimization between the distributions of the GAN and the set Y. Then, we introduce a cascaded framework for GANs that addresses the task of imagining a new distribution that combines the base set X and target set Y by cascading sampling GANs with translation GANs, and we dub the cascade of such GANs the Imaginative Adversarial Network (IAN). We conduct an objective and subjective evaluation of different IAN setups on this task and show some useful applications for these IANs, like manifold traversing and creative face generation for character design in movies or video games.
Submitted 16 April, 2019;
originally announced April 2019.
-
Towards Analyzing Semantic Robustness of Deep Neural Networks
Authors:
Abdullah Hamdi,
Bernard Ghanem
Abstract:
Despite the impressive performance of Deep Neural Networks (DNNs) on various vision tasks, they still exhibit erroneous high sensitivity toward semantic primitives (e.g. object pose). We propose a theoretically grounded analysis for DNN robustness in the semantic space. We qualitatively analyze different DNNs' semantic robustness by visualizing the DNN global behavior as semantic maps and observe interesting behavior of some DNNs. Since generating these semantic maps does not scale well with the dimensionality of the semantic space, we develop a bottom-up approach to detect robust regions of DNNs. To achieve this, we formalize the problem of finding robust semantic regions of the network as optimizing integral bounds and we develop expressions for update directions of the region bounds. We use our developed formulations to quantitatively evaluate the semantic robustness of different popular network architectures. We show through extensive experimentation that several networks, while trained on the same dataset and enjoying comparable accuracy, do not necessarily perform similarly in semantic robustness. For example, InceptionV3 is more accurate despite being less semantically robust than ResNet50. We hope that this tool will serve as a milestone towards understanding the semantic robustness of DNNs.
Submitted 8 September, 2020; v1 submitted 9 April, 2019;
originally announced April 2019.
-
SADA: Semantic Adversarial Diagnostic Attacks for Autonomous Applications
Authors:
Abdullah Hamdi,
Matthias Müller,
Bernard Ghanem
Abstract:
One major factor impeding more widespread adoption of deep neural networks (DNNs) is their lack of robustness, which is essential for safety-critical applications such as autonomous driving. This has motivated much recent work on adversarial attacks for DNNs, which mostly focus on pixel-level perturbations void of semantic meaning. In contrast, we present a general framework for adversarial attacks on trained agents, which covers semantic perturbations to the environment of the agent performing the task as well as pixel-level attacks. To do this, we re-frame the adversarial attack problem as learning a distribution of parameters that always fools the agent. In the semantic case, our proposed adversary (denoted as BBGAN) is trained to sample parameters that describe the environment with which the black-box agent interacts, such that the agent performs its dedicated task poorly in this environment. We apply BBGAN on three different tasks, primarily targeting aspects of autonomous navigation: object detection, self-driving, and autonomous UAV racing. On these tasks, BBGAN can generate failure cases that consistently fool a trained agent.
Submitted 2 December, 2019; v1 submitted 5 December, 2018;
originally announced December 2018.
-
Inverse source problem in a forced network
Authors:
J. G. Caputo,
A. Hamdi,
A. Knippel
Abstract:
We address the nonlinear inverse source problem of identifying a time-dependent source occurring in one node of a network governed by a wave equation. We prove that time records of the associated state taken at a strategic set of two nodes yield uniqueness of the two unknown elements: the source position and the emitted signal. We develop a non-iterative identification method that localizes the source node by solving a set of well-posed linear systems. Once the source node is localized, we identify the emitted signal using a deconvolution problem or a Fourier expansion. Numerical experiments on a $5$-node graph confirm the effectiveness of the approach.
Submitted 17 September, 2018; v1 submitted 13 March, 2018;
originally announced March 2018.
-
Learning Rotation for Kernel Correlation Filter
Authors:
Abdullah Hamdi,
Bernard Ghanem
Abstract:
Kernel Correlation Filters (KCF) have proven to be a very promising scheme for visual tracking in terms of speed and accuracy on several benchmarks. However, KCF suffers from problems that affect its performance, such as occlusion, rotation and scale change. This paper tackles the problem of rotation by reformulating the optimization problem for learning the correlation filter. This modification (RKCF) includes learning a rotation filter that utilizes the circulant structure of HOG features to estimate rotation from one frame to another and enhance the detection of KCF. As a result, it gains a boost in overall accuracy on many of the OTB50 dataset videos with minimal additional computation.
Submitted 11 August, 2017;
originally announced August 2017.
-
Self-Interference in Full-Duplex Multi-User MIMO Channels
Authors:
Arman Shojaeifard,
Kai-Kit Wong,
Marco Di Renzo,
Gan Zheng,
Khairi Ashour Hamdi,
Jie Tang
Abstract:
We consider a multi-user multiple-input multiple-output (MIMO) setup where full-duplex (FD) multi-antenna nodes apply linear beamformers to simultaneously transmit and receive multiple streams over Rician fading channels. The exact first and second positive moments of the residual self-interference (SI), involving the squared norm of a sum of non-identically distributed random variables, are derived in closed form. The method of moments is then invoked to provide a Gamma approximation for the residual SI distribution. The proposed theorem holds under arbitrary linear precoder/decoder design, number of antennas and streams, and SI cancellation capability.
Submitted 1 January, 2017;
originally announced January 2017.
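The method-of-moments step named in the entry above can be sketched generically: given the first two raw moments of a positive random variable, a Gamma distribution is fitted by matching its mean and variance. The helper below is a generic illustration under that assumption; it does not reproduce the paper's closed-form moment expressions, and the name `gamma_from_moments` is hypothetical.

```python
def gamma_from_moments(m1, m2):
    """Fit Gamma(shape, scale) by matching the first two raw moments.

    m1: first moment E[X]
    m2: second raw moment E[X^2]
    A Gamma(k, theta) variable has mean k*theta and variance k*theta^2,
    so k = m1^2 / var and theta = var / m1, where var = m2 - m1^2.
    """
    var = m2 - m1 ** 2
    if m1 <= 0 or var <= 0:
        raise ValueError("need m1 > 0 and m2 > m1**2")
    shape = m1 ** 2 / var
    scale = var / m1
    return shape, scale


if __name__ == "__main__":
    # Moments of a Gamma with shape 2 and scale 3: mean 6, variance 18,
    # hence second raw moment 18 + 36 = 54.
    print(gamma_from_moments(6.0, 54.0))
```

Feeding in the moments of a true Gamma variable recovers its parameters exactly; for other distributions the fit matches only the mean and variance.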
-
Massive MIMO-Enabled Full-Duplex Cellular Networks
Authors:
Arman Shojaeifard,
Kai-Kit Wong,
Marco Di Renzo,
Gan Zheng,
Khairi Ashour Hamdi,
Jie Tang
Abstract:
In this paper, we provide a theoretical framework for the study of massive multiple-input multiple-output (MIMO)-enabled full-duplex (FD) cellular networks in which the self-interference (SI) channels follow the Rician distribution and other channels are Rayleigh distributed. To facilitate bi-directional wireless functionality, we adopt (i) a downlink (DL) linear zero-forcing with self-interference-nulling (ZF-SIN) precoding scheme at the FD base stations (BSs), and (ii) an uplink (UL) self-interference-aware (SIA) fractional power control mechanism at the FD user equipments (UEs). Linear ZF receivers are further utilized for signal detection in the UL. The results indicate that the UL rate bottleneck in the baseline FD single-antenna system can be overcome via exploiting massive MIMO. On the other hand, the findings may be viewed as a reality-check, since we show that, under state-of-the-art system parameters, the spectral efficiency (SE) gain of FD massive MIMO over its half-duplex (HD) counterpart largely depends on the SI cancellation capability of the UEs. In addition, the anticipated two-fold increase in SE is shown to be only achievable with an infinitely large number of antennas.
Submitted 4 March, 2017; v1 submitted 11 November, 2016;
originally announced November 2016.
-
Energy-Efficient Heterogeneous Cellular Networks with Spectrum Underlay and Overlay Access
Authors:
Jie Tang,
Daniel K. C. So,
Emad Alsusa,
Khairi Ashour Hamdi,
Arman Shojaeifard,
Kai-Kit Wong
Abstract:
In this paper, we provide joint subcarrier assignment and power allocation schemes for quality-of-service (QoS)-constrained energy-efficiency (EE) optimization in the downlink of an orthogonal frequency division multiple access (OFDMA)-based two-tier heterogeneous cellular network (HCN). Considering underlay transmission, where spectrum-efficiency (SE) is fully exploited, the EE solution involves tackling a complex mixed-combinatorial and non-convex optimization problem. With appropriate decomposition of the original problem and leveraging on the quasi-concavity of the EE function, we propose a dual-layer resource allocation approach and provide a complete solution using difference-of-two-concave-functions approximation, successive convex approximation, and gradient-search methods. On the other hand, the inherent inter-tier interference from spectrum underlay access may degrade EE particularly under dense small-cell deployment and large bandwidth utilization. We therefore develop a novel resource allocation approach based on the concepts of spectrum overlay access and resource efficiency (RE) (normalized EE-SE trade-off). Specifically, the optimization procedure is separated in this case such that the macro-cell optimal RE and corresponding bandwidth is first determined, then the EE of small-cells utilizing the remaining spectrum is maximized. Simulation results confirm the theoretical findings and demonstrate that the proposed resource allocation schemes can approach the optimal EE with each strategy being superior under certain system settings.
Submitted 30 October, 2016;
originally announced October 2016.