-
Artificial Intelligence for the Electron Ion Collider (AI4EIC)
Authors:
C. Allaire,
R. Ammendola,
E. -C. Aschenauer,
M. Balandat,
M. Battaglieri,
J. Bernauer,
M. Bondì,
N. Branson,
T. Britton,
A. Butter,
I. Chahrour,
P. Chatagnon,
E. Cisbani,
E. W. Cline,
S. Dash,
C. Dean,
W. Deconinck,
A. Deshpande,
M. Diefenthaler,
R. Ent,
C. Fanelli,
M. Finger,
M. Finger, Jr.,
E. Fol,
S. Furletov
, et al. (70 additional authors not shown)
Abstract:
The Electron-Ion Collider (EIC), a state-of-the-art facility for studying the strong force, is expected to begin commissioning its first experiments in 2028. This is an opportune time for artificial intelligence (AI) to be included from the start at this facility and in all phases that lead up to the experiments. The second annual workshop organized by the AI4EIC working group, which recently took…
▽ More
The Electron-Ion Collider (EIC), a state-of-the-art facility for studying the strong force, is expected to begin commissioning its first experiments in 2028. This is an opportune time for artificial intelligence (AI) to be included from the start at this facility and in all phases that lead up to the experiments. The second annual workshop organized by the AI4EIC working group, which recently took place, centered on exploring all current and prospective application areas of AI for the EIC. This workshop is not only beneficial for the EIC, but also provides valuable insights for the newly established ePIC collaboration at EIC. This paper summarizes the different activities and R&D projects covered across the sessions of the workshop and provides an overview of the goals, approaches and strategies regarding AI/ML in the EIC community, as well as cutting-edge techniques currently studied in other experiments.
△ Less
Submitted 17 July, 2023;
originally announced July 2023.
-
Domain Specific Sub-network for Multi-Domain Neural Machine Translation
Authors:
Amr Hendy,
Mohamed Abdelghaffar,
Mohamed Afify,
Ahmed Y. Tawfik
Abstract:
This paper presents Domain-Specific Sub-network (DoSS). It uses a set of masks obtained through pruning to define a sub-network for each domain and finetunes the sub-network parameters on domain data. This performs very closely and drastically reduces the number of parameters compared to finetuning the whole network on each domain. Also a method to make masks unique per domain is proposed and show…
▽ More
This paper presents Domain-Specific Sub-network (DoSS). It uses a set of masks obtained through pruning to define a sub-network for each domain and finetunes the sub-network parameters on domain data. This performs very closely and drastically reduces the number of parameters compared to finetuning the whole network on each domain. Also a method to make masks unique per domain is proposed and shown to greatly improve the generalization to unseen domains. In our experiments on German to English machine translation the proposed method outperforms the strong baseline of continue training on multi-domain (medical, tech and religion) data by 1.47 BLEU points. Also continue training DoSS on new domain (legal) outperforms the multi-domain (medical, tech, religion, legal) baseline by 1.52 BLEU points.
△ Less
Submitted 18 October, 2022;
originally announced October 2022.
-
Naturally-meaningful and efficient descriptors: machine learning of material properties based on robust one-shot ab initio descriptors
Authors:
Sherif Abdulkader Tawfik,
Salvy P. Russo
Abstract:
Establishing a data-driven pipeline for the discovery of novel materials requires the engineering of material features that can be feasibly calculated and can be applied to predict a material's target properties. Here we propose a new class of descriptors for describing crystal structures, which we term Robust One-Shot Ab initio (ROSA) descriptors. ROSA is computationally cheap and is shown to acc…
▽ More
Establishing a data-driven pipeline for the discovery of novel materials requires the engineering of material features that can be feasibly calculated and can be applied to predict a material's target properties. Here we propose a new class of descriptors for describing crystal structures, which we term Robust One-Shot Ab initio (ROSA) descriptors. ROSA is computationally cheap and is shown to accurately predict a range of material properties. These simple and intuitive class of descriptors are generated from the energetics of a material at a low level of theory using an incomplete ab initio calculation. We demonstrate how the incorporation of ROSA descriptors in ML-based property prediction leads to accurate predictions over a wide range of crystals, amorphized crystals, metal-organic frameworks and molecules. We believe that the low computational cost and ease of use of these descriptors will significantly improve ML-based predictions.
△ Less
Submitted 2 October, 2022; v1 submitted 2 March, 2022;
originally announced March 2022.
-
Ensembling of Distilled Models from Multi-task Teachers for Constrained Resource Language Pairs
Authors:
Amr Hendy,
Esraa A. Gad,
Mohamed Abdelghaffar,
Jailan S. ElMosalami,
Mohamed Afify,
Ahmed Y. Tawfik,
Hany Hassan Awadalla
Abstract:
This paper describes our submission to the constrained track of WMT21 shared news translation task. We focus on the three relatively low resource language pairs Bengali to and from Hindi, English to and from Hausa, and Xhosa to and from Zulu. To overcome the limitation of relatively low parallel data we train a multilingual model using a multitask objective employing both parallel and monolingual…
▽ More
This paper describes our submission to the constrained track of WMT21 shared news translation task. We focus on the three relatively low resource language pairs Bengali to and from Hindi, English to and from Hausa, and Xhosa to and from Zulu. To overcome the limitation of relatively low parallel data we train a multilingual model using a multitask objective employing both parallel and monolingual data. In addition, we augment the data using back translation. We also train a bilingual model incorporating back translation and knowledge distillation then combine the two models using sequence-to-sequence map**. We see around 70% relative gain in BLEU point for English to and from Hausa, and around 25% relative improvements for both Bengali to and from Hindi, and Xhosa to and from Zulu compared to bilingual baselines.
△ Less
Submitted 25 November, 2021;
originally announced November 2021.
-
Score Combination for Improved Parallel Corpus Filtering for Low Resource Conditions
Authors:
Muhammad N. ElNokrashy,
Amr Hendy,
Mohamed Abdelghaffar,
Mohamed Afify,
Ahmed Tawfik,
Hany Hassan Awadalla
Abstract:
This paper describes our submission to the WMT20 sentence filtering task. We combine scores from (1) a custom LASER built for each source language, (2) a classifier built to distinguish positive and negative pairs by semantic alignment, and (3) the original scores included in the task devkit. For the mBART finetuning setup, provided by the organizers, our method shows 7% and 5% relative improvemen…
▽ More
This paper describes our submission to the WMT20 sentence filtering task. We combine scores from (1) a custom LASER built for each source language, (2) a classifier built to distinguish positive and negative pairs by semantic alignment, and (3) the original scores included in the task devkit. For the mBART finetuning setup, provided by the organizers, our method shows 7% and 5% relative improvement over baseline, in sacreBLEU score on the test set for Pashto and Khmer respectively.
△ Less
Submitted 16 November, 2020;
originally announced November 2020.
-
Gender Aware Spoken Language Translation Applied to English-Arabic
Authors:
Mostafa Elaraby,
Ahmed Y. Tawfik,
Mahmoud Khaled,
Hany Hassan,
Aly Osama
Abstract:
Spoken Language Translation (SLT) is becoming more widely used and becoming a communication tool that helps in crossing language barriers. One of the challenges of SLT is the translation from a language without gender agreement to a language with gender agreement such as English to Arabic. In this paper, we introduce an approach to tackle such limitation by enabling a Neural Machine Translation sy…
▽ More
Spoken Language Translation (SLT) is becoming more widely used and becoming a communication tool that helps in crossing language barriers. One of the challenges of SLT is the translation from a language without gender agreement to a language with gender agreement such as English to Arabic. In this paper, we introduce an approach to tackle such limitation by enabling a Neural Machine Translation system to produce gender-aware translation. We show that NMT system can model the speaker/listener gender information to produce gender-aware translation. We propose a method to generate data used in adapting a NMT system to produce gender-aware. The proposed approach can achieve significant improvement of the translation quality by 2 BLEU points.
△ Less
Submitted 26 February, 2018;
originally announced February 2018.
-
Synthetic Data for Neural Machine Translation of Spoken-Dialects
Authors:
Hany Hassan,
Mostafa Elaraby,
Ahmed Tawfik
Abstract:
In this paper, we introduce a novel approach to generate synthetic data for training Neural Machine Translation systems. The proposed approach transforms a given parallel corpus between a written language and a target language to a parallel corpus between a spoken dialect variant and the target language. Our approach is language independent and can be used to generate data for any variant of the s…
▽ More
In this paper, we introduce a novel approach to generate synthetic data for training Neural Machine Translation systems. The proposed approach transforms a given parallel corpus between a written language and a target language to a parallel corpus between a spoken dialect variant and the target language. Our approach is language independent and can be used to generate data for any variant of the source language such as slang or spoken dialect or even for a different language that is closely related to the source language.
The proposed approach is based on local embedding projection of distributed representations which utilizes monolingual embeddings to transform parallel data across language variants. We report experimental results on Levantine to English translation using Neural Machine Translation. We show that the generated data can improve a very large scale system by more than 2.8 Bleu points using synthetic spoken data which shows that it can be used to provide a reliable translation system for a spoken dialect that does not have sufficient parallel data.
△ Less
Submitted 28 November, 2017; v1 submitted 30 June, 2017;
originally announced July 2017.
-
Calibrated Fair Measures of Measure: Indices to Quantify an Individual's Scientific Research Output
Authors:
A. Tawfik
Abstract:
Are existing ways of measuring scientific quality reflecting disadvantages of not being part of giant collaborations? How could possible discrimination be avoided? We propose indices defined for each discipline (subfield) and which count the plausible contributions added up by collaborators maintaining the spirit of interdependency. Based on the growing debate about defining potential biases and d…
▽ More
Are existing ways of measuring scientific quality reflecting disadvantages of not being part of giant collaborations? How could possible discrimination be avoided? We propose indices defined for each discipline (subfield) and which count the plausible contributions added up by collaborators maintaining the spirit of interdependency. Based on the growing debate about defining potential biases and detecting unethical behavior, a standardized method to measure contributions of the astronomical number of coauthors is introduced.
△ Less
Submitted 31 May, 2013;
originally announced June 2013.