Search | arXiv e-print repository

Language-Guided Instance-Aware Domain-Adaptive Panoptic Segmentation

Authors: Elham Amin Mansour, Ozan Unal, Suman Saha, Benjamin Bejar, Luc Van Gool

Abstract: The increasing relevance of panoptic segmentation is tied to the advancements in autonomous driving and AR/VR applications. However, the deployment of such models has been limited due to the expensive nature of dense data annotation, giving rise to unsupervised domain adaptation (UDA). A key challenge in panoptic UDA is reducing the domain gap between a labeled source and an unlabeled target domain while harmonizing the subtasks of semantic and instance segmentation to limit catastrophic interference. While considerable progress has been achieved, existing approaches mainly focus on the adaptation of semantic segmentation. In this work, we focus on incorporating instance-level adaptation via a novel instance-aware cross-domain mixing strategy IMix. IMix significantly enhances the panoptic quality by improving instance segmentation performance. Specifically, we propose inserting high-confidence predicted instances from the target domain onto source images, retaining the exhaustiveness of the resulting pseudo-labels while reducing the injected confirmation bias. Nevertheless, such an enhancement comes at the cost of degraded semantic performance, attributed to catastrophic forgetting. To mitigate this issue, we regularize our semantic branch by employing CLIP-based domain alignment (CDA), exploiting the domain-robustness of natural language prompts. Finally, we present an end-to-end model incorporating these two mechanisms called LIDAPS, achieving state-of-the-art results on all popular panoptic UDA benchmarks.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.05752 [pdf, other]

Task-Oriented GNNs Training on Large Knowledge Graphs for Accurate and Efficient Modeling

Authors: Hussein Abdallah, Waleed Afandi, Panos Kalnis, Essam Mansour

Abstract: A Knowledge Graph (KG) is a heterogeneous graph encompassing a diverse range of node and edge types. Heterogeneous Graph Neural Networks (HGNNs) are popular for training machine learning tasks like node classification and link prediction on KGs. However, HGNN methods exhibit excessive complexity influenced by the KG's size, density, and the number of node and edge types. AI practitioners handcraft a subgraph of a KG G relevant to a specific task. We refer to this subgraph as a task-oriented subgraph (TOSG), which contains a subset of task-related node and edge types in G. Training the task using TOSG instead of G alleviates the excessive computation required for a large KG. Crafting the TOSG demands a deep understanding of the KG's structure and the task's objectives. Hence, it is challenging and time-consuming. This paper proposes KG-TOSA, an approach to automate the TOSG extraction for task-oriented HGNN training on a large KG. In KG-TOSA, we define a generic graph pattern that captures the KG's local and global structure relevant to a specific task. We explore different techniques to extract subgraphs matching our graph pattern: namely (i) two techniques sampling around targeted nodes using biased random walk or influence scores, and (ii) a SPARQL-based extraction method leveraging RDF engines' built-in indices. Hence, it achieves negligible preprocessing overhead compared to the sampling techniques. We develop a benchmark of real KGs of large sizes and various tasks for node classification and link prediction. Our experiments show that KG-TOSA helps state-of-the-art HGNN methods reduce training time and memory usage by up to 70% while improving the model performance, e.g., accuracy and inference time.

Submitted 22 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: 12 pages, 9 figures, 3 tables, ICDE 2024

arXiv:2401.05957 [pdf]

doi 10.1039/D3CP03640E

Surface doping of rubrene single crystals by molecular electron donors and acceptors

Authors: Christos Gatsios, Andreas Opitz, Dominique Lungwitz, Ahmed E. Mansour, Thorsten Schultz, Dongguen Shin, Sebastian Hammer, Jens Pflaum, Yadong Zhang, Stephen Barlow, Seth R. Marder, Norbert Koch

Abstract: The surface molecular doping of organic semiconductors can play an important role in the development of organic electronic or optoelectronic devices. Single-crystal rubrene remains a leading molecular candidate for applications in electronics due to its high hole mobility. In parallel, intensive research into the fabrication of flexible organic electronics requires the careful design of functional interfaces to enable optimal device characteristics. To this end, the present work seeks to understand the effect of surface molecular doping on the electronic band structure of rubrene single crystals. Our angle-resolved photoemission measurements reveal that the Fermi level moves in the band gap of rubrene depending on the direction of surface electron-transfer reactions with the molecular dopants, yet the valence band dispersion remains essentially unperturbed. This indicates that surface electron-transfer doping of a molecular single crystal can effectively modify the near-surface charge density, while retaining good charge-carrier mobility.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 28 pages, 11 figures

Journal ref: Physical Chemistry Chemical Physics, 25, 2023, 29718-29726

arXiv:2311.08871 [pdf, other]

A short note on super-hedging an arbitrary number of European options with integer-valued strategies

Authors: Dorsaf Cherif, Meriam El Mansour, Emmanuel Lepinette

Abstract: The usual theory of asset pricing in finance assumes that the financial strategies, i.e. the quantity of risky assets to invest, are real-valued so that they are not integer-valued in general, see the Black and Scholes model for instance. This is clearly contrary to what it is possible to do in the real world. Surprisingly, it seems that there is no many contributions in that direction in the literature, except for a finite number of states. In this paper, for arbitrary Ω, we show that, in discrete-time, it is possible to evaluate the minimal super-hedging price when we restrict ourselves to integer-valued strategies. To do so, we only consider terminal claims that are continuous piecewise affine functions of the underlying asset. We formulate a dynamic programming principle that can be directly implemented on an historical data and which also provides the optimal integer-valued strategy. The problem with general payoffs remains open but should be solved with the same approach.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.08847 [pdf, other]

Robust discrete-time super-hedging strategies under AIP condition and under price uncertainty

Authors: Meriam El Mansour, Emmanuel Lepinette

Abstract: We solve the problem of super-hedging European or Asian options for discrete-time financial market models where executable prices are uncertain. The risky asset prices are not described by single-valued processes but measurable selections of random sets that allows to consider a large variety of models including bid-ask models with order books, but also models with a delay in the execution of the orders. We provide a numerical procedure to compute the infimum price under a weak no-arbitrage condition, the so-called AIP condition, under which the prices of the non negative European options are non negative. This condition is weaker than the existence of a risk-neutral martingale measure but it is sufficient to numerically solve the super-hedging problem. We illustrate our method by a numerical example.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.02749 [pdf, other]

Fast Point Cloud to Mesh Reconstruction for Deformable Object Tracking

Authors: Elham Amin Mansour, Hehui Zheng, Robert K. Katzschmann

Abstract: The world around us is full of soft objects we perceive and deform with dexterous hand movements. For a robotic hand to control soft objects, it has to acquire online state feedback of the deforming object. While RGB-D cameras can collect occluded point clouds at a rate of 30Hz, this does not represent a continuously trackable object surface. Hence, in this work, we developed a method that takes as input a template mesh which is the mesh of an object in its non-deformed state and a deformed point cloud of the same object, and then shapes the template mesh such that it matches the deformed point cloud. The reconstruction of meshes from point clouds has long been studied in the field of Computer graphics under 3D reconstruction and 4D reconstruction, however, both lack the speed and generalizability needed for robotics applications. Our model is designed using a point cloud auto-encoder and a Real-NVP architecture. Our trained model can perform mesh reconstruction and tracking at a rate of 58Hz on a template mesh of 3000 vertices and a deformed point cloud of 5000 points and is generalizable to the deformations of six different object categories which are assumed to be made of soft material in our experiments (scissors, hammer, foam brick, cleanser bottle, orange, and dice). The object meshes are taken from the YCB benchmark dataset. An instance of a downstream application can be the control algorithm for a robotic hand that requires online feedback from the state of the manipulated object which would allow online grasp adaptation in a closed-loop manner. Furthermore, the tracking capacity of our method can help in the system identification of deforming objects in a marker-free approach. In future work, we will extend our trained model to generalize beyond six object categories and additionally to real-world deforming point clouds.

Submitted 26 March, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

Comments: 8 pages with appendix, 16 figures

arXiv:2303.09199 [pdf, other]

A Generative Model for Digital Camera Noise Synthesis

Authors: Mingyang Song, Yang Zhang, Tunç O. Aydın, Elham Amin Mansour, Christopher Schroers

Abstract: Noise synthesis is a challenging low-level vision task aiming to generate realistic noise given a clean image along with the camera settings. To this end, we propose an effective generative model which utilizes clean features as guidance followed by noise injections into the network. Specifically, our generator follows a UNet-like structure with skip connections but without downsampling and upsampling layers. Firstly, we extract deep features from a clean image as the guidance and concatenate a Gaussian noise map to the transition point between the encoder and decoder as the noise source. Secondly, we propose noise synthesis blocks in the decoder in each of which we inject Gaussian noise to model the noise characteristics. Thirdly, we propose to utilize an additional Style Loss and demonstrate that this allows better noise characteristics supervision in the generator. Through a number of new experiments, we evaluate the temporal variance and the spatial correlation of the generated noise which we hope can provide meaningful insights for future works. Finally, we show that our proposed approach outperforms existing methods for synthesizing camera noise.

Submitted 13 June, 2024; v1 submitted 16 March, 2023; originally announced March 2023.

arXiv:2303.02204 [pdf, other]

KGLiDS: A Platform for Semantic Abstraction, Linking, and Automation of Data Science

Authors: Mossad Helali, Niki Monjazeb, Shubham Vashisth, Philippe Carrier, Ahmed Helal, Antonio Cavalcante, Khaled Ammar, Katja Hose, Essam Mansour

Abstract: In recent years, we have witnessed the growing interest from academia and industry in applying data science technologies to analyze large amounts of data. In this process, a myriad of artifacts (datasets, pipeline scripts, etc.) are created. However, there has been no systematic attempt to holistically collect and exploit all the knowledge and experiences that are implicitly contained in those artifacts. Instead, data scientists recover information and expertise from colleagues or learn via trial and error. Hence, this paper presents a scalable platform, KGLiDS, that employs machine learning and knowledge graph technologies to abstract and capture the semantics of data science artifacts and their connections. Based on this information, KGLiDS enables various downstream applications, such as data discovery and pipeline automation. Our comprehensive evaluation covers use cases in data discovery, data cleaning, transformation, and AutoML. It shows that KGLiDS is significantly faster with a lower memory footprint than the state-of-the-art systems while achieving comparable or better accuracy.

Submitted 12 June, 2024; v1 submitted 3 March, 2023; originally announced March 2023.

Comments: 15 pages, 9 figures

arXiv:2303.02166 [pdf, other]

Towards a GML-Enabled Knowledge Graph Platform

Authors: Hussein Abdallah, Essam Mansour

Abstract: This vision paper proposes KGNet, an on-demand graph machine learning (GML) as a service on top of RDF engines to support GML-enabled SPARQL queries. KGNet automates the training of GML models on a KG by identifying a task-specific subgraph. This helps reduce the task-irrelevant KG structure and properties for better scalability and accuracy. While training a GML model on KG, KGNet collects metadata of trained models in the form of an RDF graph called KGMeta, which is interlinked with the relevant subgraphs in KG. Finally, all trained models are accessible via a SPARQL-like query. We call it a GML-enabled query and refer to it as SPARQLML. KGNet supports SPARQLML on top of existing RDF engines as an interface for querying and inferencing over KGs using GML models. The development of KGNet poses research opportunities in several areas, including meta-sampling for identifying task-specific subgraphs, GML pipeline automation with computational constraints, such as limited time and memory budget, and SPARQLML query optimization. KGNet supports different GML tasks, such as node classification, link prediction, and semantic entity matching. We evaluated KGNet using two real KGs of different application domains. Compared to training on the entire KG, KGNet significantly reduced training time and memory usage while maintaining comparable or improved accuracy. The KGNet source-code is available for further study.

Submitted 3 March, 2023; originally announced March 2023.

Comments: 9 pages, 15 figures, accepted at ICDE 2023

Journal ref: https://icde2023.ics.uci.edu/research-sessions/#special-session-2

arXiv:2303.00595 [pdf, other]

A Universal Question-Answering Platform for Knowledge Graphs

Authors: Reham Omar, Ishika Dhall, Panos Kalnis, Essam Mansour

Abstract: Knowledge from diverse application domains is organized as knowledge graphs (KGs) that are stored in RDF engines accessible in the web via SPARQL endpoints. Expressing a well-formed SPARQL query requires information about the graph structure and the exact URIs of its components, which is impractical for the average user. Question answering (QA) systems assist by translating natural language questions to SPARQL. Existing QA systems are typically based on application-specific human-curated rules, or require prior information, expensive pre-processing and model adaptation for each targeted KG. Therefore, they are hard to generalize to a broad set of applications and KGs. In this paper, we propose KGQAn, a universal QA system that does not need to be tailored to each target KG. Instead of curated rules, KGQAn introduces a novel formalization of question understanding as a text generation problem to convert a question into an intermediate abstract representation via a neural sequence-to-sequence model. We also develop a just-in-time linker that maps at query time the abstract representation to a SPARQL query for a specific KG, using only the publicly accessible APIs and the existing indices of the RDF store, without requiring any pre-processing. Our experiments with several real KGs demonstrate that KGQAn is easily deployed and outperforms by a large margin the state-of-the-art in terms of quality of answers and processing time, especially for arbitrary KGs, unseen during the training.

Submitted 8 August, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: The paper is accepted to SIGMOD 2023

arXiv:2302.06466 [pdf, ps, other]

ChatGPT versus Traditional Question Answering for Knowledge Graphs: Current Status and Future Directions Towards Knowledge Graph Chatbots

Authors: Reham Omar, Omij Mangukiya, Panos Kalnis, Essam Mansour

Abstract: Conversational AI and Question-Answering systems (QASs) for knowledge graphs (KGs) are both emerging research areas: they empower users with natural language interfaces for extracting information easily and effectively. Conversational AI simulates conversations with humans; however, it is limited by the data captured in the training datasets. In contrast, QASs retrieve the most recent information from a KG by understanding and translating the natural language question into a formal query supported by the database engine. In this paper, we present a comprehensive study of the characteristics of the existing alternatives towards combining both worlds into novel KG chatbots. Our framework compares two representative conversational models, ChatGPT and Galactica, against KGQAN, the current state-of-the-art QAS. We conduct a thorough evaluation using four real KGs across various application domains to identify the current limitations of each category of systems. Based on our findings, we propose open research opportunities to empower QASs with chatbot capabilities for KGs. All benchmarks and all raw results are available for further analysis.

Submitted 8 February, 2023; originally announced February 2023.

Comments: 9 pages

arXiv:2301.05108 [pdf, other]

Serenity: Library Based Python Code Analysis for Code Completion and Automated Machine Learning

Authors: Wenting Zhao, Ibrahim Abdelaziz, Julian Dolby, Kavitha Srinivas, Mossad Helali, Essam Mansour

Abstract: Dynamically typed languages such as Python have become very popular. Among other strengths, Python's dynamic nature and its straightforward linking to native code have made it the de-facto language for many research areas such as Artificial Intelligence. This flexibility, however, makes static analysis very hard. While creating a sound, or a soundy, analysis for Python remains an open problem, we present in this work Serenity, a framework for static analysis of Python that turns out to be sufficient for some tasks. The Serenity framework exploits two basic mechanisms: (a) reliance on dynamic dispatch at the core of language translation, and (b) extreme abstraction of libraries, to generate an abstraction of the code. We demonstrate the efficiency and usefulness of Serenity's analysis in two applications: code completion and automated machine learning. In these two applications, we demonstrate that such analysis has a strong signal, and can be leveraged to establish state-of-the-art performance, comparable to neural models and dynamic analysis respectively.

Submitted 4 January, 2023; originally announced January 2023.

arXiv:2207.11427 [pdf, other]

doi 10.1088/1748-0221/19/05/P05066

The FASER Detector

Authors: FASER Collaboration, Henso Abreu, Elham Amin Mansour, Claire Antel, Akitaka Ariga, Tomoko Ariga, Florian Bernlochner, Tobias Boeckh, Jamie Boyd, Lydia Brenner, Franck Cadoux, David W. Casper, Charlotte Cavanagh, Xin Chen, Andrea Coccaro, Olivier Crespo-Lopez, Stephane Debieux, Monica D'Onofrio, Liam Dougherty, Candan Dozen, Abdallah Ezzat, Yannick Favre, Deion Fellers, Jonathan L. Feng, Didier Ferrere , et al. (72 additional authors not shown)

Abstract: FASER, the ForwArd Search ExpeRiment, is an experiment dedicated to searching for light, extremely weakly-interacting particles at CERN's Large Hadron Collider (LHC). Such particles may be produced in the very forward direction of the LHC's high-energy collisions and then decay to visible particles inside the FASER detector, which is placed 480 m downstream of the ATLAS interaction point, aligned with the beam collisions axis. FASER also includes a sub-detector, FASER$ν$, designed to detect neutrinos produced in the LHC collisions and to study their properties. In this paper, each component of the FASER detector is described in detail, as well as the installation of the experiment system and its commissioning using cosmic-rays collected in September 2021 and during the LHC pilot beam test carried out in October 2021. FASER will start taking LHC collision data in 2022, and will run throughout LHC Run 3.

Submitted 23 July, 2022; originally announced July 2022.

Comments: 92 pages, 72 Figures

Report number: CERN-FASER-2022-001

Journal ref: JINST 19 (2024) P05066

arXiv:2112.14311 [pdf, ps, other]

Finite and High-temperature series expansion via many-body perturbation theory

Authors: Mohamed Amine Tag, Abid Boudiar, Mohamed El-Hadi Mansour, Abdelkader Hafdallah, Chafia Bendjeroudib

Abstract: We present a new algorithm to evaluate the grand potential at finite and high-temperature series expansion via many-body perturbation theory. This algorithm allows us to formulate each order as a divided difference. Further, we apply this algorithm to the Heisenberg spin-1/2 XXZ chain. We obtain all coefficients of the high-temperature expansion of the free energy and susceptibility per site of this model up to sixth order.

Submitted 28 December, 2021; originally announced December 2021.

Comments: 8 pages, 6 figures

arXiv:2111.13186 [pdf, other]

Federated Data Science to Break Down Silos [Vision]

Authors: Essam Mansour, Kavitha Srinivas, Katja Hose

Abstract: Similar to Open Data initiatives, data science as a community has launched initiatives for sharing not only data but entire pipelines, derivatives, artifacts, etc. (Open Data Science). However, the few efforts that exist focus on the technical part on how to facilitate sharing, conversion, etc. This vision paper goes a step further and proposes KEK, an open federated data science platform that does not only allow for sharing data science pipelines and their (meta)data but also provides methods for efficient search and, in the ideal case, even allows for combining and defining pipelines across platforms in a federated manner. In doing so, KEK addresses the so far neglected challenge of actually finding artifacts that are semantically related and that can be combined to achieve a certain goal.

Submitted 25 November, 2021; originally announced November 2021.

Comments: Accepted at SIGMOD Record

arXiv:2111.00083 [pdf, other]

A Scalable AutoML Approach Based on Graph Neural Networks

Authors: Mossad Helali, Essam Mansour, Ibrahim Abdelaziz, Julian Dolby, Kavitha Srinivas

Abstract: AutoML systems build machine learning models automatically by performing a search over valid data transformations and learners, along with hyper-parameter optimization for each learner. Many AutoML systems use meta-learning to guide search for optimal pipelines. In this work, we present a novel meta-learning system called KGpip which, (1) builds a database of datasets and corresponding pipelines by mining thousands of scripts with program analysis, (2) uses dataset embeddings to find similar datasets in the database based on its content instead of metadata-based features, (3) models AutoML pipeline creation as a graph generation problem, to succinctly characterize the diverse pipelines seen for a single dataset. KGpip's meta-learning is a sub-component for AutoML systems. We demonstrate this by integrating KGpip with two AutoML systems. Our comprehensive evaluation using 126 datasets, including those used by the state-of-the-art systems, shows that KGpip significantly outperforms these systems.

Submitted 14 July, 2022; v1 submitted 29 October, 2021; originally announced November 2021.

Comments: 14 pages, 9 figures. Accepted in VLDB22

arXiv:2110.15186 [pdf, other]

doi 10.1088/1748-0221/16/12/P12028

The trigger and data acquisition system of the FASER experiment

Authors: FASER Collaboration, Henso Abreu, Elham Amin Mansour, Claire Antel, Akitaka Ariga, Tomoko Ariga, Florian Bernlochner, Tobias Boeckh, Jamie Boyd, Lydia Brenner, Franck Cadoux, David Casper, Charlotte Cavanagh, Xin Chen, Andrea Coccaro, Stephane Debieux, Sergey Dmitrievsky, Monica D'Onofrio, Candan Dozen, Yannick Favre, Deion Fellers, Jonathan L. Feng, Didier Ferrere, Enrico Gamberini, Edward Karl Galantay , et al. (59 additional authors not shown)

Abstract: The FASER experiment is a new small and inexpensive experiment that is placed 480 meters downstream of the ATLAS experiment at the CERN LHC. FASER is designed to capture decays of new long-lived particles, produced outside of the ATLAS detector acceptance. These rare particles can decay in the FASER detector together with about 500-1000 Hz of other particles originating from the ATLAS interaction point. A very high efficiency trigger and data acquisition system is required to ensure that the physics events of interest will be recorded. This paper describes the trigger and data acquisition system of the FASER experiment and presents performance results of the system acquired during initial commissioning.

Submitted 10 January, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

Journal ref: 2021_JINST_16_P12028

arXiv:1109.6884 [pdf, other]

ERA: Efficient Serial and Parallel Suffix Tree Construction for Very Long Strings

Authors: Essam Mansour, Amin Allam, Spiros Skiadopoulos, Panos Kalnis

Abstract: The suffix tree is a data structure for indexing strings. It is used in a variety of applications such as bioinformatics, time series analysis, clustering, text editing and data compression. However, when the string and the resulting suffix tree are too large to fit into the main memory, most existing construction algorithms become very inefficient. This paper presents a disk-based suffix tree construction method, called Elastic Range (ERa), which works efficiently with very long strings that are much larger than the available memory. ERa partitions the tree construction process horizontally and vertically and minimizes I/Os by dynamically adjusting the horizontal partitions independently for each vertical partition, based on the evolving shape of the tree and the available memory. Where appropriate, ERa also groups vertical partitions together to amortize the I/O cost. We developed a serial version; a parallel version for shared-memory and shared-disk multi-core systems; and a parallel version for shared-nothing architectures. ERa indexes the entire human genome in 19 minutes on an ordinary desktop computer. For comparison, the fastest existing method needs 15 minutes using 1024 CPUs on an IBM BlueGene supercomputer.

Submitted 30 September, 2011; originally announced September 2011.

Comments: VLDB2012

Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 1, pp. 49-60 (2011)

Showing 1–18 of 18 results for author: Mansour, E