Improvements & Evaluations on the MLCommons CloudMask Benchmark

Authors: Varshitha Chennamsetti, Laiba Mehnaz, Dan Zhao, Banani Ghosh, Sergey V. Samsonau

Abstract: In this paper, we report the performance benchmarking results of deep learning models on MLCommons' Science cloud-masking benchmark using a high-performance computing cluster at New York University (NYU): NYU Greene. MLCommons is a consortium that develops and maintains several scientific benchmarks that can benefit from developments in AI. We provide a description of the cloud-masking benchmark task, updated code, and the best model for this benchmark when using our selected hyperparameter settings. Our benchmarking results include the highest accuracy achieved on the NYU system as well as the average time taken for both training and inference on the benchmark across several runs/seeds. Our code can be found on GitHub. MLCommons team has been kept informed about our progress and may use the developed code for their future work.

Submitted 7 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2401.08636

arXiv:2401.08636

MLCommons Cloud Masking Benchmark with Early Stopping

Authors: Varshitha Chennamsetti, Gregor von Laszewski, Ruochen Gu, Laiba Mehnaz, Juri Papay, Samuel Jackson, Jeyan Thiyagalingam, Sergey V. Samsonau, Geoffrey C. Fox

Abstract: In this paper, we report on work performed for the MLCommons Science Working Group on the cloud masking benchmark. MLCommons is a consortium that develops and maintains several scientific benchmarks that aim to benefit developments in AI. The benchmarks are conducted on the High Performance Computing (HPC) Clusters of New York University and University of Virginia, as well as a commodity desktop. We provide a description of the cloud masking benchmark, as well as a summary of our submission to MLCommons on the benchmark experiment we conducted. It includes a modification to the reference implementation of the cloud masking benchmark enabling early stopping. This benchmark is executed on the NYU HPC through a custom batch script that runs the various experiments through the batch queuing system while allowing for variation on the number of epochs trained. Our submission includes the modified code, a custom batch script to modify epochs, documentation, and the benchmark results. We report the highest accuracy (scientific metric) and the average time taken (performance metric) for training and inference that was achieved on NYU HPC Greene. We also provide a comparison of the compute capabilities between different systems by running the benchmark for one epoch. Our submission can be found in a Globus repository that is accessible to MLCommons Science Working Group.

Submitted 30 May, 2024; v1 submitted 11 December, 2023; originally announced January 2024.

Comments: NYU did not approve the publication of the paper

arXiv:2210.08966 [pdf, other]

Scalable authentic research education framework

Authors: Sergey V Samsonau, Aziza Kurbonova, Lu Jiang, Hazem Lashen, Jiamu Bai, Theresa Merchant, Ruoxi Wang, Laiba Mehnaz, Zecheng Wang, Ishita Patil

Abstract: We report a framework that enables the broad adoption of authentic research educational methodology at various schools. We list and address common barriers that appear in many existing authentic research education programs. In our program, teams of students with complementary skills develop useful artificial intelligence (AI) solutions for researchers in natural sciences. To accomplish this, we work with research laboratories that reveal/specify their needs, and then our student teams work on the discovery, design, and development of an AI solution for unique problems using a consulting-like arrangement. To date, our group has been operating at New York University (NYU) for six consecutive semesters, has engaged more than eighty students, ranging from first-year college students to master's candidates, and has worked with around twenty projects and collaborators. While creating education benefits for students, our approach also directly benefits scientists, who get an opportunity to evaluate the usefulness of machine learning for their specific needs.

Submitted 29 December, 2023; v1 submitted 19 September, 2022; originally announced October 2022.

arXiv:2104.08578 [pdf, other]

GupShup: An Annotated Corpus for Abstractive Summarization of Open-Domain Code-Switched Conversations

Authors: Laiba Mehnaz, Debanjan Mahata, Rakesh Gosangi, Uma Sushmitha Gunturi, Riya Jain, Gauri Gupta, Amardeep Kumar, Isabelle Lee, Anish Acharya, Rajiv Ratn Shah

Abstract: Code-switching is the communication phenomenon where speakers switch between different languages during a conversation. With the widespread adoption of conversational agents and chat platforms, code-switching has become an integral part of written conversations in many multi-lingual communities worldwide. This makes it essential to develop techniques for summarizing and understanding these conversations. Towards this objective, we introduce abstractive summarization of Hindi-English code-switched conversations and develop the first code-switched conversation summarization dataset - GupShup, which contains over 6,831 conversations in Hindi-English and their corresponding human-annotated summaries in English and Hindi-English. We present a detailed account of the entire data collection and annotation processes. We analyze the dataset using various code-switching statistics. We train state-of-the-art abstractive summarization models and report their performances using both automated metrics and human evaluation. Our results show that multi-lingual mBART and multi-view seq2seq models obtain the best performances on the new dataset.

Submitted 17 April, 2021; originally announced April 2021.

arXiv:1904.09076 [pdf, other]

Suggestion Mining from Online Reviews using ULMFiT

Authors: Sarthak Anand, Debanjan Mahata, Kartik Aggarwal, Laiba Mehnaz, Simra Shahid, Haimin Zhang, Yaman Kumar, Rajiv Ratn Shah, Karan Uppal

Abstract: In this paper we present our approach and the system description for Sub Task A of SemEval 2019 Task 9: Suggestion Mining from Online Reviews and Forums. Given a sentence, the task asks to predict whether the sentence consists of a suggestion or not. Our model is based on Universal Language Model Fine-tuning for Text Classification. We apply various pre-processing techniques before training the language and the classification model. We further provide detailed analysis of the results obtained using the trained model. Our team ranked 10th out of 34 participants, achieving an F1 score of 0.7011. We publicly share our implementation at https://github.com/isarth/SemEval9_MIDAS

Submitted 19 April, 2019; originally announced April 2019.

arXiv:1904.09072 [pdf, other]

Identifying Offensive Posts and Targeted Offense from Twitter

Authors: Haimin Zhang, Debanjan Mahata, Simra Shahid, Laiba Mehnaz, Sarthak Anand, Yaman Singla, Rajiv Ratn Shah, Karan Uppal

Abstract: In this paper we present our approach and the system description for Sub-task A and Sub-task B of SemEval 2019 Task 6: Identifying and Categorizing Offensive Language in Social Media. Sub-task A involves identifying if a given tweet is offensive or not, and Sub-task B involves detecting if an offensive tweet is targeted towards someone (group or an individual). Our model for Sub-task A is based on an ensemble of Convolutional Neural Network, Bidirectional LSTM with attention, and Bidirectional LSTM + Bidirectional GRU, whereas for Sub-task B, we rely on a set of heuristics derived from the training data and manual observation. We provide detailed analysis of the results obtained using the trained models. Our team ranked 5th out of 103 participants in Sub-task A, achieving a macro F1 score of 0.807, and ranked 8th out of 75 participants in Sub-task B, achieving a macro F1 of 0.695.

Submitted 19 April, 2019; originally announced April 2019.

Showing 1–6 of 6 results for author: Mehnaz, L