
Puddles: Application-Independent Recovery and Location-Independent Data for Persistent Memory

Authors: Suyash Mahar, Mingyao Shen, TJ Smith, Joseph Izraelevitz, Steven Swanson

Abstract: In this paper, we argue that current work has failed to provide a comprehensive and maintainable in-memory representation for persistent memory (PM). PM data should be easily mappable into a process address space, shareable across processes, shippable between machines, consistent after a crash, and accessible to legacy code with fast, efficient pointers as first-class abstractions. While existing systems have provided niceties like mmap()-based load/store access, they have not been able to support all these necessary properties due to conflicting requirements. We propose Puddles, a new persistent memory abstraction, to solve these problems. Puddles provide application-independent recovery after a power outage; they make recovery from a system failure a system-level property of the stored data rather than the responsibility of the programs that access it. Puddles use native pointers, so they are compatible with existing code. Finally, Puddles implement support for sharing and shipping of PM data between processes and systems without expensive serialization and deserialization. Compared to existing systems, Puddles are at least as fast as and up to 1.34$\times$ faster than PMDK while being competitive with other PM libraries across YCSB workloads. Moreover, to demonstrate Puddles' ability to relocate data, we showcase a sensor network data-aggregation workload that results in a 4.7$\times$ speedup over PMDK.

Submitted 3 October, 2023; originally announced October 2023.

Comments: To appear in EuroSys 2024
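
The abstract states that Puddles use native pointers yet can relocate data, but it does not say how pointers are fixed up when a puddle is mapped at a different address. A minimal sketch, assuming a hypothetical per-region pointer map that records the byte offset of every pointer field, is to shift each intra-region pointer by the difference between the new and old base addresses. The names `relocate` and `pointer_offsets` are illustrative only and are not the Puddles API.

```python
import struct
from typing import List

WORD = struct.calcsize("<Q")  # assumption: 8-byte little-endian pointer fields


def relocate(region: bytearray, pointer_offsets: List[int], old_base: int, new_base: int) -> int:
    """Rewrite native pointers in `region` after it moves from old_base to new_base.

    `pointer_offsets` is a hypothetical pointer map: the byte offset of every
    pointer-sized field in the region. Only pointers that target the region
    itself are shifted; NULL and external pointers are left untouched.
    Returns the number of pointers rewritten.
    """
    size = len(region)
    delta = new_base - old_base
    rewritten = 0
    for off in pointer_offsets:
        (ptr,) = struct.unpack_from("<Q", region, off)
        if old_base <= ptr < old_base + size:  # intra-region pointer
            struct.pack_into("<Q", region, off, ptr + delta)
            rewritten += 1
    return rewritten


if __name__ == "__main__":
    old, new = 0x10000, 0x7F0000
    region = bytearray(4 * WORD)
    # Two list nodes laid out as (next, value): node A at offset 0 points to node B at 16.
    struct.pack_into("<QQ", region, 0, old + 2 * WORD, 41)
    struct.pack_into("<QQ", region, 2 * WORD, 0, 42)  # node B: next = NULL
    print(relocate(region, [0, 2 * WORD], old, new))  # 1
    print(hex(struct.unpack_from("<Q", region, 0)[0]))  # 0x7f0010
```

Only node A's pointer lies inside the old address range, so it is the only one rewritten; node B's NULL survives the move unchanged, and non-pointer fields such as the stored values are never touched because they are absent from the pointer map.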

arXiv:2310.00836

Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models

Authors: Man Luo, Shrinidhi Kumbhar, Ming Shen, Mihir Parmar, Neeraj Varshney, Pratyay Banerjee, Somak Aditya, Chitta Baral

Abstract: Logical reasoning is fundamental for humans yet presents a substantial challenge in the domain of Artificial Intelligence. Initially, researchers used Knowledge Representation and Reasoning (KR) systems that did not scale and required non-trivial manual effort. Recently, the emergence of large language models (LLMs) has demonstrated the ability to overcome various limitations of formal KR systems. Consequently, there is a growing interest in using LLMs for logical reasoning via natural language. This work strives to understand the proficiency of LLMs in logical reasoning by offering a brief review of the latest progress in this area, with a focus on the logical reasoning datasets, tasks, and the methods adopted to utilize LLMs for reasoning. To offer a thorough analysis, we have compiled a benchmark titled LogiGLUE. This includes 24 varied datasets encompassing deductive, abductive, and inductive reasoning. Utilizing LogiGLUE as a foundation, we have trained an instruction fine-tuned language model, resulting in LogiT5. We study single-task training, multi-task training, and a "chain-of-thought" knowledge distillation fine-tuning technique to assess the performance of the model across the different logical reasoning categories. We also assess various LLMs using LogiGLUE, and the findings indicate that LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning. We aim to shed light on the capabilities and potential pathways for enhancing logical reasoning proficiency in LLMs, paving the way for more advanced and nuanced developments in this critical field.

Submitted 30 March, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

Comments: Work in progress

arXiv:2310.00205

An Empirical Study on the Use of Static Analysis Tools in Open Source Embedded Software

Authors: Mingjie Shen, Akul Pillai, Brian A. Yuan, James C. Davis, Aravind Machiry

Abstract: This paper performs the first study to understand the prevalence, challenges, and effectiveness of using Static Application Security Testing (SAST) tools on Open-Source Embedded Software (EMBOSS) repositories. We collect a corpus of 258 of the most popular EMBOSS projects, representing 13 distinct categories such as real-time operating systems, network stacks, and applications. To understand the current use of SAST tools on EMBOSS, we measured this corpus and surveyed developers. To understand the challenges and effectiveness of using SAST tools on EMBOSS projects, we applied these tools to the projects in our corpus. We report that almost none of these projects (just 3%) use SAST tools beyond those baked into the compiler, and developers give rationales such as ineffectiveness and false positives. In applying SAST tools ourselves, we show that minimal engineering effort and project expertise are needed to apply many tools to a given EMBOSS project. GitHub's CodeQL was the most effective SAST tool -- using its built-in security checks we found a total of 540 defects (with a false positive rate of 23%) across the 258 projects, with 399 (74%) likely security vulnerabilities, including in projects maintained by Microsoft, Amazon, and the Apache Foundation. EMBOSS engineers have confirmed 273 (51%) of these defects, mainly by accepting our pull requests. Two CVEs were issued. In summary, we urge EMBOSS engineers to adopt the current generation of SAST tools, which offer low false positive rates and are effective at finding security-relevant defects.

Submitted 29 September, 2023; originally announced October 2023.

arXiv:2309.11001

doi 10.1145/3613424.3614279

GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption

Authors: Kaustubh Shivdikar, Yuhui Bao, Rashmi Agrawal, Michael Shen, Gilbert Jonatan, Evelio Mora, Alexander Ingare, Neal Livesay, José L. Abellán, John Kim, Ajay Joshi, David Kaeli

Abstract: Fully Homomorphic Encryption (FHE) enables the processing of encrypted data without decrypting it. FHE has garnered significant attention over the past decade as it supports secure outsourcing of data processing to remote cloud services. Despite its promise of strong data privacy and security guarantees, FHE introduces a slowdown of up to five orders of magnitude as compared to the same computation using plaintext data. This overhead is presently a major barrier to the commercial adoption of FHE. In this work, we leverage GPUs to accelerate FHE, capitalizing on a well-established GPU ecosystem available in the cloud. We propose GME, which combines three key microarchitectural extensions along with a compile-time optimization to the current AMD CDNA GPU architecture. First, GME integrates a lightweight on-chip compute unit (CU)-side hierarchical interconnect to retain ciphertext in cache across FHE kernels, thus eliminating redundant memory transactions. Second, to tackle compute bottlenecks, GME introduces special MOD-units that provide native custom hardware support for modular reduction operations, one of the most commonly executed sets of operations in FHE. Third, by integrating the MOD-unit with our novel pipelined $64$-bit integer arithmetic cores (WMAC-units), GME further accelerates FHE workloads by $19\%$. Finally, we propose a Locality-Aware Block Scheduler (LABS) that exploits the temporal locality available in FHE primitive blocks. Incorporating these microarchitectural features and compiler optimizations, we create a synergistic approach achieving average speedups of $796\times$, $14.2\times$, and $2.3\times$ over Intel Xeon CPU, NVIDIA V100 GPU, and Xilinx FPGA implementations, respectively.

Submitted 19 September, 2023; originally announced September 2023.
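
The MOD-units target modular reduction, described above as one of the most commonly executed sets of operations in FHE. The abstract does not name the reduction algorithm the hardware uses; as an assumed stand-in, the sketch below uses Barrett reduction, a standard technique that replaces division by a precomputed multiply and shift. It illustrates the arithmetic such a unit accelerates and is not GME's hardware design; the 61-bit modulus is likewise only an example.

```python
import random
from typing import Tuple


def barrett_setup(q: int) -> Tuple[int, int]:
    """Precompute k = bit length of q and mu = floor(2^(2k) / q)."""
    k = q.bit_length()
    return k, (1 << (2 * k)) // q


def barrett_reduce(x: int, q: int, k: int, mu: int) -> int:
    """Compute x mod q for 0 <= x < q*q using only multiplies, shifts and subtracts."""
    if not 0 <= x < q * q:
        raise ValueError("this Barrett variant expects 0 <= x < q^2")
    t = (x * mu) >> (2 * k)  # estimate of floor(x / q); never too large
    r = x - t * q
    while r >= q:  # the estimate is at most 2 too small, so this runs at most twice
        r -= q
    return r


def mod_mul(a: int, b: int, q: int, k: int, mu: int) -> int:
    """Modular multiplication, a core operation in NTT-based FHE kernels."""
    return barrett_reduce(a * b, q, k, mu)


if __name__ == "__main__":
    q = (1 << 61) - 1  # a 61-bit prime, chosen here only as an example modulus
    k, mu = barrett_setup(q)
    rng = random.Random(0)
    for _ in range(10_000):
        a, b = rng.randrange(q), rng.randrange(q)
        assert mod_mul(a, b, q, k, mu) == (a * b) % q
    print("ok")
```

The precomputation depends only on the modulus, so in hardware or in a GPU kernel it can be done once per modulus, leaving each reduction as two multiplications, a shift and at most two conditional subtractions.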

arXiv:2309.09972

Artificial Intelligence for Web 3.0: A Comprehensive Survey

Authors: Meng Shen, Zhehui Tan, Dusit Niyato, Yuzhi Liu, Jiawen Kang, Zehui Xiong, Liehuang Zhu, Wei Wang, Xuemin Shen

Abstract: Web 3.0 is the new generation of the Internet that is reconstructed with distributed technology, which focuses on data ownership and value expression. It also operates under the principle that data and digital assets should be owned and controlled by users rather than large corporations. In this survey, we explore the current development state of Web 3.0 and the application of AI technology in Web 3.0. Through investigating the existing applications and components of Web 3.0, we propose an architectural framework for Web 3.0 from the perspective of ecological application scenarios. We divide the ecology of Web 3.0 into four layers, whose main functions are data management, value circulation, ecological governance, and application scenarios, respectively. Our investigation delves into the major challenges and issues present in each of these layers. In this context, AI has shown its strong potential to solve existing problems of Web 3.0. We illustrate the crucial role of AI in the foundation and growth of Web 3.0. We begin by providing an overview of AI, including machine learning algorithms and deep learning techniques. Then, we thoroughly analyze the current state of AI technology applications in the four layers of Web 3.0 and offer some insights into its potential future development direction.

Submitted 17 August, 2023; originally announced September 2023.

arXiv:2309.03284

Photonic link from single flux quantum circuits to room temperature

Authors: Mohan Shen, Jiacheng Xie, Yuntao Xu, Sihao Wang, Risheng Cheng, Wei Fu, Yiyu Zhou, Hong X. Tang

Abstract: Broadband, energy-efficient signal transfer between cryogenic and room-temperature environments has been a major bottleneck for superconducting quantum and classical logic circuits. Photonic links promise to overcome this challenge by offering simultaneous high bandwidth and low thermal load. However, the development of cryogenic electro-optic modulators -- a key component for photonic readout of electrical signals -- has been stifled by the stringent requirements of superconducting circuits. Rapid single flux quantum circuits (RSFQ), for example, operate with a tiny signal amplitude of only a few millivolts (mV), far below the volt-level signals used in conventional circuits. Here, we demonstrate the first direct optical readout of an RSFQ circuit without additional electrical amplification, enabled by a novel superconducting electro-optic modulator (SEOM) featuring a record-low half-wave voltage Vπ of 42 mV on a 1 m-long SEOM. Leveraging the low ohmic loss of superconductors, we break the fundamental Vπ-bandwidth trade-off and demonstrate electro-optic bandwidth up to 17 GHz on a 0.2 m-long SEOM at cryogenic temperatures. Our work presents a viable solution toward high-bandwidth signal transfer between future large-scale superconducting circuits and room-temperature electronics.

Submitted 25 November, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

arXiv:2309.01961

NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

Authors: Taehoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-** Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh, et al. (17 additional authors not shown)

Abstract: In this report, we introduce the NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of the 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state of the art in terms of both accuracy and fairness. Through the challenge, the image captioning models were tested using a new evaluation dataset that includes a large variety of visual concepts from many domains. There was no specific training data provided for the challenge, and therefore the challenge entries were required to adapt to new types of image descriptions that had not been seen during training. This report includes information on the newly proposed NICE dataset, evaluation methods, challenge results, and technical details of top-ranking entries. We expect that the outcomes of the challenge will contribute to the improvement of AI models on various vision-language tasks.

Submitted 10 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: Tech report, project page https://nice.lgresearch.ai/

arXiv:2308.16785

Agent Teaming Situation Awareness (ATSA): A Situation Awareness Framework for Human-AI Teaming

Authors: Qi Gao, Wei Xu, Mowei Shen, Zaifeng Gao

Abstract: Rapid advancements in artificial intelligence (AI) have led to a growing trend of human-AI teaming (HAT) in various fields. As machines continue to evolve from mere automation to a state of autonomy, they are increasingly exhibiting unexpected behaviors and human-like cognitive/intelligent capabilities, including situation awareness (SA). This shift has the potential to enhance the performance of mixed human-AI teams over all-human teams, underscoring the need for a better understanding of the dynamic SA interactions between humans and machines. To this end, we provide a review of leading SA theoretical models and a new framework for SA in the HAT context based on the key features and processes of HAT. The Agent Teaming Situation Awareness (ATSA) framework unifies human and AI behavior and involves bidirectional and dynamic interaction. The framework is based on the individual and team SA models and elaborates on the cognitive mechanisms for modeling HAT. Similar perceptual cycles, tailored to the unique requirements of the HAT context, are adopted for the individual (including both human and AI) and the whole team. ATSA emphasizes cohesive and effective HAT through structures and components, including teaming understanding, teaming control, and the world, as well as an adhesive transactive part. We further propose several future research directions to expand on the distinctive contributions of ATSA and address the specific and pressing next steps.

Submitted 4 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

Comments: 52 pages, 5 figures, 1 table

arXiv:2308.10728

doi 10.1093/rasti/rzad034

Synergies between interstellar dust and heliospheric science with an Interstellar Probe

Authors: Veerle J. Sterken, Silvan Hunziker, Kostas Dialynas, Jan Leitner, Maximilian Sommer, Ralf Srama, Lennart R. Baalmann, Aigen Li, Konstantin Herbst, André Galli, Pontus Brandt, My Riebe, Jack Baggaley, Michel Blanc, Andrej Czechowski, Frederic Effenberger, Brian Fields, Priscilla Frisch, Mihaly Horanyi, Hsiang-Wen Hsu, Nozair Khawaja, Harald Krüger, Bill S. Kurth, Niels F. W. Ligterink, Jeffrey L. Linsky , et al. (18 additional authors not shown)

Abstract: We discuss the synergies between heliospheric and dust science, the open science questions, and the technological endeavors and programmatic aspects that are important to maintain or develop in the decade to come. In particular, we illustrate how we can use interstellar dust in the solar system as a tracer for the (dynamic) heliosphere properties, and emphasize the fairly unexplored but potentially important science question of the role of cosmic dust in heliospheric and astrospheric physics. We show that an Interstellar Probe mission with a dedicated dust suite would bring unprecedented advances to interstellar dust research and could also contribute, through measuring dust, to heliospheric science. This can, in particular, be done well if we work in synergy with other missions inside the solar system, thereby using multiple vantage points in space to measure the dust as it 'rolls' into the heliosphere. Such synergies between missions inside the solar system and far out are crucial for disentangling the spatially and temporally varying dust flow. Finally, we highlight the relevant instrumentation and its suitability for contributing to answering the research questions.

Submitted 21 August, 2023; originally announced August 2023.

Comments: 18 pages, 7 Figures, 5 Tables. Originally submitted as white paper for the National Academies Decadal Survey for Solar and Space Physics 2024-2033

Journal ref: RAS Techniques and Instruments, rzad034 (2023)

arXiv:2308.08025

Potential Energy Advantage of Quantum Economy

Authors: Junyu Liu, Hansheng Jiang, Zuo-Jun Max Shen

Abstract: Energy cost is increasingly crucial in the modern computing industry with the wide deployment of large-scale machine learning models and language models. For the firms that provide computing services, low energy consumption is important both for their own market growth and for compliance with government regulations. In this paper, we study the energy benefits of quantum computing vis-a-vis classical computing. Deviating from the conventional notion of quantum advantage based solely on computational complexity, we redefine advantage in an energy efficiency context. Through a Cournot competition model constrained by energy usage, we demonstrate that quantum computing firms can outperform classical counterparts in both profitability and energy efficiency at Nash equilibrium. Therefore, quantum computing may represent a more sustainable pathway for the computing industry. Moreover, we discover that the energy benefits of quantum computing economies are contingent on large-scale computation. Based on real physical parameters, we further illustrate the scale of operation necessary for realizing this energy efficiency advantage.

Submitted 15 August, 2023; originally announced August 2023.

Comments: 23 pages, many figures
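
The abstract builds on Cournot competition. As a sketch of that baseline only, the code below solves a textbook two-firm Cournot game with linear inverse demand P = a - b(q1 + q2) and constant marginal costs c1, c2. Treating c_i as a per-unit energy cost is an assumption made here; the paper's energy-usage constraint and its quantum and classical cost structures are not modeled. The closed-form equilibrium q_i = (a - 2c_i + c_j)/(3b) is checked against repeated best responses.

```python
from typing import Tuple


def cournot_equilibrium(a: float, b: float, c1: float, c2: float) -> Tuple[float, float]:
    """Closed-form Nash quantities for inverse demand P = a - b*(q1 + q2).

    Valid when both quantities are positive, i.e. a - 2*c_i + c_j > 0.
    """
    q1 = (a - 2 * c1 + c2) / (3 * b)
    q2 = (a - 2 * c2 + c1) / (3 * b)
    return q1, q2


def best_response(a: float, b: float, c_own: float, q_other: float) -> float:
    """Profit-maximizing quantity given the rival's output, clipped at zero."""
    return max(0.0, (a - c_own - b * q_other) / (2 * b))


def iterate_best_responses(a: float, b: float, c1: float, c2: float,
                           rounds: int = 200) -> Tuple[float, float]:
    """Simultaneous best-response dynamics from zero output; converges for linear demand."""
    q1 = q2 = 0.0
    for _ in range(rounds):
        q1, q2 = best_response(a, b, c1, q2), best_response(a, b, c2, q1)
    return q1, q2


def profits(a: float, b: float, c1: float, c2: float, q1: float, q2: float) -> Tuple[float, float]:
    price = a - b * (q1 + q2)
    return (price - c1) * q1, (price - c2) * q2


if __name__ == "__main__":
    q = cournot_equilibrium(100, 1, 10, 20)
    print(q, iterate_best_responses(100, 1, 10, 20))
    print(profits(100, 1, 10, 20, *q))
```

With a = 100, b = 1, c1 = 10 and c2 = 20, the lower-cost firm produces more (about 33.3 versus 23.3 units) and earns more, which mirrors, in this simplified setting, the qualitative pattern the abstract reports for the more energy-efficient firm.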

arXiv:2308.06717

Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards

Authors: Ilgin Dogan, Zuo-Jun Max Shen, Anil Aswani

Abstract: In practice, incentive providers (i.e., principals) often cannot observe the reward realizations of incentivized agents, which is in contrast to many principal-agent models that have been previously studied. This information asymmetry challenges the principal to consistently estimate the agent's unknown rewards by observing only the agent's decisions, which becomes even more challenging when the agent has to learn its own rewards. This complex setting is observed in various real-life scenarios ranging from renewable energy storage contracts to personalized healthcare incentives. Hence, it offers not only interesting theoretical questions but also wide practical relevance. This paper explores a repeated adverse selection game between a self-interested learning agent and a learning principal. The agent tackles a multi-armed bandit (MAB) problem to maximize their expected reward plus incentive. On top of the agent's learning, the principal trains a parallel algorithm and faces a trade-off between consistently estimating the agent's unknown rewards and maximizing their own utility by offering adaptive incentives to lead the agent. For a non-parametric model, we introduce an estimator whose only input is the history of the principal's incentives and the agent's choices. We unite this estimator with a proposed data-driven incentive policy within a MAB framework. Without restricting the type of the agent's algorithm, we prove finite-sample consistency of the estimator and a rigorous regret bound for the principal by considering the sequential externality imposed by the agent. Lastly, our theoretical results are reinforced by simulations justifying the applicability of our framework to green energy aggregator contracts.

Submitted 13 August, 2023; originally announced August 2023.

Comments: 72 pages, 6 figures. arXiv admin note: text overlap with arXiv:2304.07407
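
To make the agent's side of this game concrete, the sketch below assumes the agent runs UCB1 on its estimated reward plus the incentive the principal offers for each arm. The paper explicitly does not restrict the agent's algorithm, so UCB1, the class name `IncentivizedUCBAgent`, the Bernoulli reward model and the fixed incentive vector are all illustrative assumptions; the principal's estimator and adaptive incentive policy are not reproduced.

```python
import math
import random
from typing import List


class IncentivizedUCBAgent:
    """Hypothetical learning agent: UCB1 over (estimated own reward + offered incentive)."""

    def __init__(self, n_arms: int) -> None:
        self.counts: List[int] = [0] * n_arms
        self.means: List[float] = [0.0] * n_arms  # agent's private reward estimates
        self.t = 0

    def choose(self, incentives: List[float]) -> int:
        self.t += 1
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm  # try every arm once before using confidence bounds

        def score(arm: int) -> float:
            bonus = math.sqrt(2 * math.log(self.t) / self.counts[arm])
            return self.means[arm] + bonus + incentives[arm]

        return max(range(len(self.counts)), key=score)

    def update(self, arm: int, reward: float) -> None:
        # Incremental mean of realized rewards, which the principal never observes.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(true_means: List[float], incentives: List[float],
             rounds: int = 2000, seed: int = 0) -> List[int]:
    """Run the agent against Bernoulli arms under a fixed incentive vector; return pull counts."""
    rng = random.Random(seed)
    agent = IncentivizedUCBAgent(len(true_means))
    for _ in range(rounds):
        arm = agent.choose(incentives)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        agent.update(arm, reward)
    return agent.counts


if __name__ == "__main__":
    print(simulate([0.3, 0.7], [0.0, 0.0]))
    print(simulate([0.3, 0.7], [1.0, 0.0]))
```

With true means (0.3, 0.7), the agent mostly plays arm 1 when no incentive is offered, while an incentive of 1.0 on arm 0 steers it mostly to arm 0. The pull counts are the only signal a principal would see, which is the information asymmetry the abstract describes.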

arXiv:2307.13339

Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions

Authors: Skyler Wu, Eric Meng Shen, Charumathi Badrinath, Jiaqi Ma, Himabindu Lakkaraju

Abstract: Chain-of-thought (CoT) prompting has been shown to empirically improve the accuracy of large language models (LLMs) on various question answering tasks. While understanding why CoT prompting is effective is crucial to ensuring that this phenomenon is a consequence of desired model behavior, little work has addressed this; nonetheless, such an understanding is a critical prerequisite for responsible model deployment. We address this question by leveraging gradient-based feature attribution methods which produce saliency scores that capture the influence of input tokens on model output. Specifically, we probe several open-source LLMs to investigate whether CoT prompting affects the relative importances they assign to particular input tokens. Our results indicate that while CoT prompting does not increase the magnitude of saliency scores attributed to semantically relevant tokens in the prompt compared to standard few-shot prompting, it increases the robustness of saliency scores to question perturbations and variations in model output.

Submitted 25 July, 2023; originally announced July 2023.

Comments: Accepted to Workshop on Challenges in Deployable Generative AI at ICML 2023
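
Gradient-based attribution assigns each input token a score derived from the gradient of the model output with respect to that token's embedding. A minimal sketch of one such method, gradient times input, is shown below. It estimates gradients with central finite differences so that it runs without an autodiff library, and the toy scorer is hypothetical; the specific attribution methods and LLMs used in the paper are not reproduced.

```python
import math
from typing import Callable, List

Embeddings = List[List[float]]


def grad_times_input(f: Callable[[Embeddings], float], emb: Embeddings,
                     eps: float = 1e-5) -> List[float]:
    """Per-token saliency |grad_i . x_i|, with each gradient entry from a central difference.

    `f` maps a list of token embedding vectors to a scalar model output
    (for example, the logit of the answer token).
    """
    scores: List[float] = []
    for i, vec in enumerate(emb):
        total = 0.0
        for j, value in enumerate(vec):
            plus = [row[:] for row in emb]
            minus = [row[:] for row in emb]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad_ij = (f(plus) - f(minus)) / (2 * eps)
            total += grad_ij * value
        scores.append(abs(total))
    return scores


if __name__ == "__main__":
    weights = [0.5, -1.0]

    def toy_model(emb: Embeddings) -> float:
        # Hypothetical scorer: tanh of a summed linear score over all tokens.
        return math.tanh(sum(weights[0] * v[0] + weights[1] * v[1] for v in emb))

    tokens = [[1.0, 0.2], [0.1, 0.9], [2.0, 0.0]]
    print(grad_times_input(toy_model, tokens))
```

Comparing such per-token scores for the same question with and without a CoT prompt, or under small perturbations of the question, is the kind of analysis the abstract describes; a real study would compute exact gradients with the model's own autodiff rather than finite differences.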

arXiv:2307.09143

doi 10.23919/MVA57639.2023.10215935

MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results

Authors: Yuki Kondo, Norimichi Ukita, Takayuki Yamaguchi, Hao-Yu Hou, Mu-Yi Shen, Chia-Chi Hsu, En-Ming Huang, Yu-Chen Huang, Yu-Cheng Xia, Chien-Yao Wang, Chun-Yi Lee, Da Huo, Marc A. Kastner, Tingwei Liu, Yasutomo Kawanishi, Takatsugu Hirayama, Takahiro Komamizu, Ichiro Ide, Yosuke Shinya, Xinyao Liu, Guang Liang, Syusuke Yasui

Abstract: Small Object Detection (SOD) is an important machine vision topic because (i) a variety of real-world applications require object detection for distant objects and (ii) SOD is a challenging task due to the noisy, blurred, and less-informative image appearances of small objects. This paper proposes a new SOD dataset, called the Small Object Detection for Spotting Birds (SOD4SB) dataset, consisting of 39,070 images that include 137,121 bird instances. The details of the challenge held with the SOD4SB dataset are also introduced in this paper. In total, 223 participants joined this challenge. This paper briefly introduces the award-winning methods. The dataset, the baseline code, and the website for evaluation on the public test set are publicly available.

Submitted 18 July, 2023; originally announced July 2023.

Comments: This paper is included in the proceedings of the 18th International Conference on Machine Vision Applications (MVA2023). It will be officially published at a later date. Project page: https://www.mva-org.jp/mva2023/challenge

Journal ref: 2023 18th International Conference on Machine Vision and Applications (MVA)

arXiv:2307.09002

CBSeq: A Channel-level Behavior Sequence For Encrypted Malware Traffic Detection

Authors: Susu Cui, Cong Dong, Meng Shen, Yuling Liu, Bo Jiang, Zhigang Lu

Abstract: Machine learning and neural networks have become increasingly popular solutions for encrypted malware traffic detection. They mine and learn complex traffic patterns, enabling detection by fitting boundaries between malware traffic and benign traffic. Compared with signature-based methods, they have higher scalability and flexibility. However, because malware is frequently updated and spawns many variants, current methods suffer from a high false positive rate and do not work well for unknown malware traffic detection. Achieving effective malware traffic detection remains a critical task. In this paper, we introduce CBSeq to address the above problems. CBSeq is a method that constructs a stable traffic representation, the behavior sequence, to characterize attacking intent and achieve malware traffic detection. We propose treating channels with similar behavior as the detection object and extracting side-channel content to construct behavior sequences. Unlike benign activities, the behavior sequences of traffic from malware and its variants exhibit strong internal correlations. Moreover, we design MSFormer, a powerful Transformer-based multi-sequence fusion classifier. It captures the internal similarity of behavior sequences, thereby distinguishing malware traffic from benign traffic. Our evaluations demonstrate that CBSeq effectively detects various known malware traffic and exhibits superior performance in detecting unknown malware traffic, outperforming state-of-the-art methods.

Submitted 18 July, 2023; originally announced July 2023.

Comments: Submitted to IEEE TIFS

arXiv:2307.06486

doi 10.1038/s41563-024-01797-0

Absence of $3a_0$ Charge Density Wave Order in the Infinite Layer Nickelates

Authors: C. T. Parzyck, N. K. Gupta, Y. Wu, V. Anil, L. Bhatt, M. Bouliane, R. Gong, B. Z. Gregory, A. Luo, R. Sutarto, F. He, Y. -D. Chuang, T. Zhou, G. Herranz, L. F. Kourkoutis, A. Singer, D. G. Schlom, D. G. Hawthorn, K. M. Shen

Abstract: A hallmark of many unconventional superconductors is the presence of many-body interactions which give rise to broken symmetry states intertwined with superconductivity. Recent resonant soft x-ray scattering experiments report commensurate $3a_0$ charge density wave order in the infinite layer nickelates, which has important implications regarding the universal interplay between charge order and superconductivity in both the cuprates and nickelates. Here, we present x-ray scattering and spectroscopy measurements on a series of NdNiO$_{2+x}$ samples which reveal that the signatures of charge density wave order are absent in fully reduced, single-phase NdNiO$_2$. The $3a_0$ superlattice peak instead originates from a partially reduced impurity phase where excess apical oxygens form ordered rows with 3 unit cell periodicity. The absence of any observable charge density wave order in NdNiO$_2$ highlights a crucial difference between the phase diagrams of the cuprate and nickelate superconductors.

Submitted 12 July, 2023; originally announced July 2023.

Comments: Main Text: 8 pages, 4 figures. Supplemental: 12 pages, 12 figures

arXiv:2307.00340

doi 10.1016/j.aop.2023.169405

On relation between renormalized frequency and heat capacity for particles in an anharmonic potential

Authors: Y. T. Liu, Y. H. Zhao, Y. Zhong, J. M. Shen, J. H. Zhang, Q. H. Liu

Abstract: For free particles in a simple harmonic potential plus a weak anharmonicity, characterized by a set of anharmonic parameters, Newtonian mechanics predicts a renormalization of the natural frequency of the periodic motion, while statistical mechanics predicts that the anharmonicity causes a correction to the heat capacity of an ideal gas in the anharmonic potential. The orbital motion and the thermal motion depend on the same anharmonic parameters, but in different combinations. These two combinations are fundamentally different, demonstrating that the statistical law cannot emerge from the many-body limit of the deterministic one-body law.

Submitted 1 July, 2023; originally announced July 2023.

Comments: 9 pages, no figure. arXiv admin note: text overlap with arXiv:2210.00906
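As a standard textbook illustration of the frequency renormalization mentioned above (not necessarily the parametrization used in this paper), take an oscillator with quadratic and cubic anharmonic terms of strengths $\alpha$ and $\beta$ and amplitude $a$. Lindstedt-Poincaré perturbation theory gives, to second order in $a$, a shifted frequency in which $\alpha$ enters squared and $\beta$ linearly, i.e. one particular combination of the anharmonic parameters:

```latex
\ddot{x} + \omega_0^2 x = -\alpha x^2 - \beta x^3,
\qquad
\omega \approx \omega_0 + \left( \frac{3\beta}{8\omega_0} - \frac{5\alpha^2}{12\omega_0^3} \right) a^2
```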

arXiv:2306.17020

Classifying Crime Types using Judgment Documents from Social Media

Authors: Haoxuan Xu, Zeyu He, Mengfan Shen, Songning Lai, Ziqiang Han, Yifan Peng

Abstract: Determining crime types from the facts of criminal behavior has become an important and meaningful task in social science. However, the field faces the problem that the data samples themselves are unevenly distributed because of the nature of crime itself. At the same time, few judicial datasets are publicly available, and it is not practical to produce large datasets for direct training. This article proposes a new training model that addresses this problem with NLP methods. We first propose a Crime Fact Data Preprocessing Module (CFDPM), which compensates for the uneven dataset distribution by generating new samples. We then use a large open-source dataset (CAIL-big) for pretraining and a small self-collected dataset for fine-tuning, which gives the model good generalization to unfamiliar small datasets. In addition, we use an improved BERT model with dynamic masking. Experiments show that the proposed method achieves state-of-the-art results on the present dataset, and further experiments confirm the effectiveness of the CFDPM module. This article provides a valuable methodological contribution to classifying social science texts such as descriptions of criminal behavior. Extensive experiments on public benchmarks show that the proposed method achieves new state-of-the-art results.

Submitted 21 October, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: The paper has no errors; it just needs to be supplemented to become a new article

arXiv:2306.15530

Fast and Automatic 3D Modeling of Antenna Structure Using CNN-LSTM Network for Efficient Data Generation

Authors: Zhaohui Wei, Zhao Zhou, Peng Wang, Jian Ren, Yingzeng Yin, Gert Frølund Pedersen, Ming Shen

Abstract: Deep learning-assisted antenna design methods such as surrogate models have gained significant popularity in recent years because they can greatly increase design efficiency by replacing time-consuming full-wave electromagnetic (EM) simulations. However, these methods require a large amount of training data with sufficiently diverse and representative samples (antenna structure parameters, scattering properties, etc.) to ensure good performance. Traditional antenna modeling methods that rely on manual model construction and modification are time-consuming and cannot meet the requirement of efficient training data acquisition. In this study, we propose a deep learning-assisted, image-based intelligent modeling approach for accelerating the data acquisition of antenna samples with different physical structures. Specifically, our method needs only an image of the antenna structure, usually available in scientific publications, as the input, while the corresponding modeling code (in VBA) is generated automatically. The proposed model consists mainly of two parts: a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network. The former captures features of antenna structure images, and the latter generates the modeling code. Through training, the proposed model achieves fast and automatic data acquisition of antenna physical structures based on antenna images. Experimental results show that the proposed method is significantly faster than the manual modeling approach. This approach lays the foundation for the efficient data acquisition needed to build robust surrogate models in the future.

Submitted 27 June, 2023; originally announced June 2023.

arXiv:2306.14306

Adaptive Sharpness-Aware Pruning for Robust Sparse Networks

Authors: Anna Bair, Hongxu Yin, Maying Shen, Pavlo Molchanov, Jose Alvarez

Abstract: Robustness and compactness are two essential attributes of deep learning models that are deployed in the real world. The goals of robustness and compactness may seem to be at odds, since robustness requires generalization across domains, while the process of compression exploits specificity in one domain. We introduce Adaptive Sharpness-Aware Pruning (AdaSAP), which unifies these goals through the lens of network sharpness. The AdaSAP method produces sparse networks that are robust to input variations which are unseen at training time. We achieve this by strategically incorporating weight perturbations in order to optimize the loss landscape. This allows the model to be both primed for pruning and regularized for improved robustness. AdaSAP improves the robust accuracy of pruned models on image classification by up to +6% on ImageNet-C and +4% on ImageNet V2, and on object detection by +4% on a corrupted Pascal VOC dataset, over a wide range of compression ratios, pruning criteria, and network architectures, outperforming recent pruning art by large margins.

Submitted 13 March, 2024; v1 submitted 25 June, 2023; originally announced June 2023.
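AdaSAP's adaptive perturbations and its pruning procedure are not described in enough detail here to reproduce. For orientation only, the sketch below shows the generic sharpness-aware minimization (SAM) update that weight-perturbation methods of this kind build on: perturb the weights toward the locally worst direction within a ball of radius `rho`, then descend using the gradient taken at that perturbed point. It is not AdaSAP itself; the toy quadratic loss, `lr`, and `rho` are hypothetical choices.

```python
import math

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One generic SAM update (not AdaSAP).

    1. Perturb the weights by eps = rho * g / ||g||, the first-order worst case in an L2 ball.
    2. Update w using the gradient evaluated at the perturbed point w + eps.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0  # avoid dividing by zero at a stationary point
    eps = [rho * gi / norm for gi in g]
    g_adv = grad_fn([wi + ei for wi, ei in zip(w, eps)])
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Hypothetical toy loss: anisotropic quadratic L(w) = 0.5 * sum(a_i * w_i^2).
A = [1.0, 10.0]

def loss(w):
    return 0.5 * sum(a * wi * wi for a, wi in zip(A, w))

def grad(w):
    return [a * wi for a, wi in zip(A, w)]

w = [3.0, -2.0]
for _ in range(100):
    w = sam_step(w, grad)
```

Because the perturbation always has norm `rho`, plain SAM keeps the iterate jittering in a small neighbourhood of the minimum rather than converging exactly; the adaptive variant in the paper presumably changes how that perturbation is scaled.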

arXiv:2306.08306

doi 10.1145/3581783.3612463

Towards Balanced Active Learning for Multimodal Classification

Authors: Meng Shen, Yizheng Huang, Jianxiong Yin, Heqing Zou, Deepu Rajan, Simon See

Abstract: Training multimodal networks requires a vast amount of data due to their larger parameter space compared to unimodal networks. Active learning is a widely used technique for reducing data annotation costs by selecting only those samples that could contribute to improving model performance. However, current active learning strategies are mostly designed for unimodal tasks, and when applied to multimodal data, they often result in biased sample selection from the dominant modality. This unfairness hinders balanced multimodal learning, which is crucial for achieving optimal performance. To address this issue, we propose three guidelines for designing a more balanced multimodal active learning strategy. Following these guidelines, we propose a novel approach that achieves fairer data selection by modulating the gradient embedding with the dominance degree among modalities. Our studies demonstrate that the proposed method achieves more balanced multimodal learning by avoiding greedy sample selection from the dominant modality. Our approach outperforms existing active learning strategies on a variety of multimodal classification tasks. Overall, our work highlights the importance of balancing sample selection in multimodal active learning and provides a practical solution for achieving more balanced active learning for multimodal classification.

Submitted 21 August, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: 12 pages, accepted by ACMMM 2023

arXiv:2306.05734

DP-HyPO: An Adaptive Private Hyperparameter Optimization Framework

Authors: Hua Wang, Sheng Gao, Huanyu Zhang, Weijie J. Su, Milan Shen

Abstract: Hyperparameter optimization, also known as hyperparameter tuning, is a widely recognized technique for improving model performance. Regrettably, when training private ML models, practitioners often overlook the privacy risks associated with hyperparameter optimization, which could expose sensitive information about the underlying dataset. Currently, the only existing approach to privacy-preserving hyperparameter optimization is to select hyperparameters uniformly at random for a number of runs and then report the best-performing hyperparameter. In contrast, in non-private settings, practitioners commonly use "adaptive" hyperparameter optimization methods such as Gaussian process-based optimization, which select the next candidate based on information gathered from previous outputs. This substantial contrast between private and non-private hyperparameter optimization underscores a critical concern. In our paper, we introduce DP-HyPO, a pioneering framework for "adaptive" private hyperparameter optimization, aiming to bridge the gap between private and non-private hyperparameter optimization. To accomplish this, we provide a comprehensive differential privacy analysis of our framework. Furthermore, we empirically demonstrate the effectiveness of DP-HyPO on a diverse set of real-world datasets.

Submitted 26 November, 2023; v1 submitted 9 June, 2023; originally announced June 2023.
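The non-adaptive baseline described above (draw hyperparameters uniformly at random for a number of runs and report the best) can be sketched as follows. This is only the selection loop: by itself it gives no privacy guarantee. In the private setting each run must itself be differentially private, and the number of runs is typically randomized; both are omitted here. `train_and_score` is a hypothetical callback standing in for a training run.

```python
import random

def random_search(space, train_and_score, num_runs, rng):
    """Non-adaptive baseline: sample each hyperparameter uniformly, keep the best run.

    space: dict mapping a hyperparameter name to a list of candidate values.
    train_and_score: hypothetical callback that trains a model and returns a score
        (higher is better). No privacy accounting is done here.
    """
    best_params, best_score = None, float("-inf")
    for _ in range(num_runs):
        # Each candidate is drawn independently of earlier results (no adaptivity).
        params = {name: rng.choice(values) for name, values in space.items()}
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

rng = random.Random(0)
space = {"lr": [0.01, 0.1, 1.0], "batch_size": [32, 64]}
best, score = random_search(space, lambda p: -abs(p["lr"] - 0.1), num_runs=50, rng=rng)
```

An adaptive method such as DP-HyPO would instead let the sampling distribution for the next run depend on previous scores, which is exactly what requires the additional privacy analysis.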

arXiv:2306.05275

Federated Linear Contextual Bandits with User-level Differential Privacy

Authors: Ruiquan Huang, Huanyu Zhang, Luca Melis, Milan Shen, Meisam Hajzinia, **g Yang

Abstract: This paper studies federated linear contextual bandits under the notion of user-level differential privacy (DP). We first introduce a unified federated bandits framework that can accommodate various definitions of DP in the sequential decision-making setting. We then formally introduce user-level central DP (CDP) and local DP (LDP) in the federated bandits framework, and investigate the fundamental trade-offs between the learning regrets and the corresponding DP guarantees in a federated linear contextual bandits model. For CDP, we propose a federated algorithm termed $\texttt{ROBIN}$ and show that it is near-optimal in terms of the number of clients $M$ and the privacy budget $\varepsilon$ by deriving nearly-matching upper and lower regret bounds when user-level DP is satisfied. For LDP, we obtain several lower bounds, indicating that learning under user-level $(\varepsilon,\delta)$-LDP must suffer a regret blow-up factor of at least $\min\{1/\varepsilon,M\}$ or $\min\{1/\sqrt{\varepsilon},\sqrt{M}\}$ under different conditions.

Submitted 9 June, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: Accepted by ICML 2023

arXiv:2306.05108

Hybrid Graph: A Unified Graph Representation with Datasets and Benchmarks for Complex Graphs

Authors: Zehui Li, Xiangyu Zhao, Mingzhu Shen, Guy-Bart Stan, Pietro Liò, Yiren Zhao

Abstract: Graphs are widely used to encapsulate a variety of data formats, but real-world networks often involve complex node relations beyond only being pairwise. While hypergraphs and hierarchical graphs have been developed and employed to account for the complex node relations, they cannot fully represent these complexities in practice. Additionally, though many Graph Neural Networks (GNNs) have been proposed for representation learning on higher-order graphs, they are usually only evaluated on simple graph datasets. Therefore, there is a need for a unified modelling of higher-order graphs, and a collection of comprehensive datasets with an accessible evaluation framework to fully understand the performance of these algorithms on complex graphs. In this paper, we introduce the concept of hybrid graphs, a unified definition for higher-order graphs, and present the Hybrid Graph Benchmark (HGB). HGB contains 23 real-world hybrid graph datasets across various domains such as biology, social media, and e-commerce. Furthermore, we provide an extensible evaluation framework and a supporting codebase to facilitate the training and evaluation of GNNs on HGB. Our empirical study of existing GNNs on HGB reveals various research opportunities and gaps, including (1) evaluating the actual performance improvement of hypergraph GNNs over simple graph GNNs; (2) comparing the impact of different sampling strategies on hybrid graph learning methods; and (3) exploring ways to integrate simple graph and hypergraph information. We make our source code and full datasets publicly available at https://zehui127.github.io/hybrid-graph-benchmark/.

Submitted 20 February, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 16 pages, 5 figures, 11 tables

arXiv:2306.04997

Blockage Prediction in Directional mmWave Links Using Liquid Time Constant Network

Authors: Martin H. Nielsen, Chia-Yi Yeh, Ming Shen, Muriel Médard

Abstract: We propose to use a liquid time constant (LTC) network to predict the future blockage status of a millimeter wave (mmWave) link using only the received signal power as the input to the system. The LTC network is based on an ordinary differential equation (ODE) system inspired by biology and specialized for near-future prediction with time-sequence observations as input. Using an experimental dataset at 60 GHz, we show that our proposed use of LTC can reliably predict the occurrence of blockage and the length of the blockage without the need for scenario-specific data. The results show that the proposed LTC can predict with upwards of 97.85% accuracy without prior knowledge of the outdoor scenario or retraining/tuning. These results highlight the promising gains of using LTC networks to predict time series-dependent signals, which can lead to more reliable and low-latency communication.

Submitted 8 June, 2023; originally announced June 2023.

Comments: 2 pages, pre-print for IRMMW 2023 conference
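To make the ODE-based model above concrete, the sketch integrates a single liquid time-constant neuron with explicit Euler, using the LTC dynamics as commonly written in the LTC literature: $dx/dt = -(1/\tau + f)\,x + f A$ with a bounded gating nonlinearity $f$. The scalar parameters `tau`, `A`, `w`, `u`, `b`, the sigmoid choice for `f`, and the Euler solver are assumptions for illustration; the paper's network size, input preprocessing, and solver are not reproduced.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ltc_euler(inputs, x0=0.0, tau=1.0, A=1.0, w=1.0, u=0.0, b=0.0, dt=0.01):
    """Integrate one liquid time-constant (LTC) neuron with explicit Euler steps.

    dx/dt = -(1/tau + f) * x + f * A,  with f = sigmoid(w * I + u * x + b).
    The effective time constant 1 / (1/tau + f) depends on the input I, hence "liquid".
    All parameter names and values here are hypothetical.
    """
    x = x0
    trace = []
    for I in inputs:  # one input sample (e.g. received power) per time step
        f = sigmoid(w * I + u * x + b)
        x = x + dt * (-(1.0 / tau + f) * x + f * A)
        trace.append(x)
    return trace

trace = ltc_euler([0.5] * 20000)
```

With a constant input and `u = 0`, the gate `f` is constant and the state settles at `f * A / (1/tau + f)`, which stays between 0 and `A`; this boundedness is one reason LTC dynamics are considered well behaved for time-series prediction.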

arXiv:2306.04360

doi 10.1109/TAP.2022.3179898

Robust and Efficient Fault Diagnosis of mm-Wave Active Phased Arrays using Baseband Signal

Authors: Martin H. Nielsen, Yufeng Zhang, Changbin Xue, Jian Ren, Yingzeng Yin, Ming Shen, Gert F. Pedersen

Abstract: One key communication block in 5G and 6G radios is the active phased array (APA). To ensure reliable operation, efficient and timely on-site fault diagnosis of APAs is crucial. To date, fault diagnosis has relied on measuring frequency-domain radiation patterns using costly equipment and multiple strictly controlled measurement probes, which is time-consuming, complex, and therefore infeasible for on-site deployment. This paper proposes a novel method exploiting a Deep Neural Network (DNN) tailored to extract the features hidden in the baseband in-phase and quadrature signals for classifying the different faults. It requires only a single probe at one measurement point for fast and accurate diagnosis of the faulty elements and components in APAs. The proposed method is validated using a commercial 28 GHz APA. Accuracies of 99% and 80% have been demonstrated for single- and multi-element failure detection, respectively. Three different test scenarios are investigated: on-off antenna elements, phase variations, and magnitude attenuation variations. At a low signal-to-noise ratio of 4 dB, stable fault detection accuracy above 90% is maintained. This is all achieved with a detection time of milliseconds (e.g., 6 ms), showing a high potential for on-site deployment.

Submitted 7 June, 2023; originally announced June 2023.

Comments: 10 pages

Journal ref: in IEEE Transactions on Antennas and Propagation, vol. 70, no. 7, pp. 5044-5053, July 2022

arXiv:2305.19553

doi 10.1063/5.0166334

Atomically smooth films of CsSb: a chemically robust visible light photocathode

Authors: C. T. Parzyck, C. A. Pennington, W. J. I. DeBenedetti, J. Balajka, E. Echeverria, H. Paik, L. Moreschini, B. D. Faeth, C. Hu, J. K. Nangoi, V. Anil, T. A. Arias, M. A. Hines, D. G. Schlom, A. Galdi, K. M. Shen, J. M. Maxson

Abstract: Alkali antimonide semiconductor photocathodes provide a promising platform for the generation of high-brightness electron beams, which are necessary for the development of cutting-edge probes including x-ray free electron lasers and ultrafast electron diffraction. However, to harness the intrinsic brightness limits in these compounds, extrinsic degrading factors, including surface roughness and contamination, must be overcome. By exploring the growth of Cs$_x$Sb thin films monitored by in situ electron diffraction, the conditions to reproducibly synthesize atomically smooth films of CsSb on 3C-SiC (100) and graphene-coated TiO$_2$ (110) substrates are identified, and detailed structural, morphological, and electronic characterization is presented. These films combine high quantum efficiency in the visible (up to 1.2% at 400 nm), an easily accessible photoemission threshold of 550 nm, low surface roughness (down to 600 pm on a 1 μm scale), and a robustness against oxidation up to 15 times greater than Cs$_3$Sb. These properties suggest that CsSb has the potential to operate as an alternative to Cs$_3$Sb in electron source applications where the demands of the vacuum environment might otherwise preclude the use of traditional alkali antimonides.

Submitted 31 May, 2023; originally announced May 2023.

Comments: 11 pages, 6 figures, 1 table

Journal ref: APL Mater. 11, 101125 (2023)

arXiv:2305.17567

No-Regret Learning in Dynamic Competition with Reference Effects Under Logit Demand

Authors: Mengzi Amy Guo, Donghao Ying, Javad Lavaei, Zuo-Jun Max Shen

Abstract: This work is dedicated to the algorithm design in a competitive framework, with the primary goal of learning a stable equilibrium. We consider the dynamic price competition between two firms operating within an opaque marketplace, where each firm lacks information about its competitor. The demand follows the multinomial logit (MNL) choice model, which depends on the consumers' observed price and their reference price, and consecutive periods in the repeated games are connected by reference price updates. We use the notion of stationary Nash equilibrium (SNE), defined as the fixed point of the equilibrium pricing policy for the single-period game, to simultaneously capture the long-run market equilibrium and stability. We propose the online projected gradient ascent algorithm (OPGA), where the firms adjust prices using the first-order derivatives of their log-revenues that can be obtained from the market feedback mechanism. Despite the absence of typical properties required for the convergence of online games, such as strong monotonicity and variational stability, we demonstrate that under diminishing step-sizes, the price and reference price paths generated by OPGA converge to the unique SNE, thereby achieving the no-regret learning and a stable market. Moreover, with appropriate step-sizes, we prove that this convergence exhibits a rate of $\mathcal{O}(1/t)$.

Submitted 27 May, 2023; originally announced May 2023.
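The core update of OPGA described above (each firm moves its price along the derivative of its own log-revenue, with a diminishing step size, then projects back onto a feasible price range) can be sketched for a simplified MNL market. This sketch drops the reference-price effect and its updates entirely, so it is not the paper's full algorithm; the attractiveness parameters `a`, price sensitivities `b`, the price box, and the step schedule are hypothetical. Under the MNL demand $d_i = e^{a_i - b_i p_i} / (1 + \sum_j e^{a_j - b_j p_j})$, the log-revenue derivative is $\partial_{p_i} \log(p_i d_i) = 1/p_i - b_i(1 - d_i)$.

```python
import math

def mnl_demand(prices, a, b):
    """MNL market shares with an outside (no-purchase) option."""
    weights = [math.exp(ai - bi * pi) for ai, bi, pi in zip(a, b, prices)]
    denom = 1.0 + sum(weights)
    return [wt / denom for wt in weights]

def log_revenue_grad(prices, a, b):
    """Derivative of log(p_i * d_i) with respect to p_i: 1/p_i - b_i * (1 - d_i)."""
    d = mnl_demand(prices, a, b)
    return [1.0 / pi - bi * (1.0 - di) for pi, bi, di in zip(prices, b, d)]

def projected_gradient_ascent(prices, a, b, p_min=0.1, p_max=10.0, iters=2000):
    """Simultaneous per-firm ascent on log-revenue, step 1/sqrt(t+1), projected onto [p_min, p_max].

    Simplification: no reference prices, so this is a static MNL game, not the paper's setting.
    """
    for t in range(iters):
        eta = 1.0 / math.sqrt(t + 1)  # diminishing step size
        g = log_revenue_grad(prices, a, b)  # each firm uses only its own derivative
        prices = [min(max(pi + eta * gi, p_min), p_max) for pi, gi in zip(prices, g)]
    return prices

prices = projected_gradient_ascent([1.0, 1.0], a=[1.0, 1.0], b=[1.0, 1.0])
```

At the limit point each firm's own log-revenue derivative vanishes, which is the first-order condition for a Nash equilibrium of this simplified single-period game; the paper's analysis additionally handles how reference prices evolve between periods.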

arXiv:2305.10376

Universal fragility of spin-glass ground-states under single bond changes

Authors: Mutian Shen, Gerardo Ortiz, Yang-Yu Liu, Martin Weigel, Zohar Nussinov

Abstract: We consider the effect of perturbing a single bond on ground-states of nearest-neighbor Ising spin-glasses, with a Gaussian distribution of the coupling constants, across various two and three-dimensional lattices and regular random graphs. Our results reveal that the ground-states are strikingly susceptible to such changes. Altering the strength of only a single bond beyond a critical threshold value leads to a new ground-state that differs from the original one by a droplet of flipped spins whose boundary and volume diverge with the system size -- an effect that is reminiscent of the more familiar phenomenon of disorder chaos. These elementary fractal-boundary zero-energy droplets and their composites feature robust characteristics and provide the lowest-energy macroscopic spin-glass excitations. Remarkably, within numerical accuracy, the size of such droplets conforms to a nearly universal power-law distribution with exponents dependent on the spatial dimension of the system. Furthermore, the critical coupling strengths adhere to a stretched Gaussian distribution that is predominantly determined by the local coordination number.

Submitted 14 March, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: (14 pages, 8 figures)
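The experiment described above (find a ground state, change one bond, find the new ground state, and read off the droplet of flipped spins) can be illustrated on a toy scale. The sketch below uses exhaustive search on a 3 x 3 periodic square lattice with Gaussian couplings; real studies use far larger systems and dedicated exact ground-state solvers, so this shows only the definitions. Function names, the lattice size, and the seed are hypothetical. Spin 0 is fixed to +1 to remove the global flip symmetry, so a droplet and its complement describe the same excitation.

```python
import itertools
import random

def square_lattice_bonds(L):
    """Nearest-neighbour bonds (i, j) on an L x L lattice with periodic boundaries (L >= 3)."""
    bonds = []
    for x in range(L):
        for y in range(L):
            i = x * L + y
            bonds.append((i, ((x + 1) % L) * L + y))  # neighbour along x
            bonds.append((i, x * L + (y + 1) % L))    # neighbour along y
    return bonds

def energy(spins, bonds, J):
    """Ising energy E = -sum_k J_k * s_i * s_j over bonds k = (i, j)."""
    return -sum(Jk * spins[i] * spins[j] for Jk, (i, j) in zip(J, bonds))

def ground_state(n, bonds, J):
    """Exhaustive search over 2^(n-1) states with spin 0 fixed to +1."""
    best, best_e = None, float("inf")
    for rest in itertools.product((-1, 1), repeat=n - 1):
        spins = (1,) + rest
        e = energy(spins, bonds, J)
        if e < best_e:
            best, best_e = spins, e
    return best, best_e

def droplet_after_bond_change(L, k, delta, seed=0):
    """Draw Gaussian couplings, shift bond k by delta, and return the set of flipped spins."""
    rng = random.Random(seed)
    bonds = square_lattice_bonds(L)
    J = [rng.gauss(0.0, 1.0) for _ in bonds]
    s_old, _ = ground_state(L * L, bonds, J)
    J_new = list(J)
    J_new[k] += delta
    s_new, _ = ground_state(L * L, bonds, J_new)
    return {i for i in range(L * L) if s_old[i] != s_new[i]}

droplet = droplet_after_bond_change(3, k=0, delta=-2.0)
```

Scanning `delta` for a fixed bond shows the threshold behaviour described in the abstract: the returned set stays empty until the change crosses a critical value, then a droplet appears all at once.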

arXiv:2305.09299

UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning

Authors: Heqing Zou, Meng Shen, Chen Chen, Yuchen Hu, Deepu Rajan, Eng Siong Chng

Abstract: Multimodal learning aims to imitate the way humans acquire complementary information from multiple modalities for various downstream tasks. However, traditional aggregation-based multimodal fusion methods ignore the inter-modality relationship, treat each modality equally, and suffer from sensor noise, which reduces multimodal learning performance. In this work, we propose a novel multimodal contrastive method to explore more reliable multimodal representations under the weak supervision of unimodal prediction. Specifically, we first capture task-related unimodal representations and the unimodal predictions from the introduced unimodal prediction task. Then the unimodal representations are aligned with the more effective one by the designed multimodal contrastive method under the supervision of the unimodal predictions. Experimental results with fused features on two image-text classification benchmarks, UPMC-Food-101 and N24News, show that our proposed Unimodality-Supervised MultiModal Contrastive (UniS-MMC) learning method outperforms current state-of-the-art multimodal methods. A detailed ablation study and analysis further demonstrate the advantage of our proposed method.

Submitted 16 May, 2023; originally announced May 2023.

Comments: ACL 2023 Findings

arXiv:2305.07562

Reply to: Deep reinforced learning heuristic tested on spin-glass ground states: The larger picture

Authors: Changjun Fan, Mutian Shen, Zohar Nussinov, Zhong Liu, Yizhou Sun, Yang-Yu Liu

Abstract: We wish to thank Stefan Boettcher for prompting us to further check and highlight the accuracy and scaling of our results. Here we provide a comprehensive response to the Comment written by him. We argue that the Comment did not account for the fairness of the comparison between different methods in searching for the spin-glass ground states. We demonstrate that, with a reasonably larger number of initial spin configurations, our results agree with the asymptotic scaling form assumed by finite-size corrections.

Submitted 12 May, 2023; originally announced May 2023.

Comments: 5 pages, 1 figure

arXiv:2305.07170

Towards Understanding and Improving GFlowNet Training

Authors: Max W. Shen, Emmanuel Bengio, Ehsan Hajiramezanali, Andreas Loukas, Kyunghyun Cho, Tommaso Biancalani

Abstract: Generative flow networks (GFlowNets) are a family of algorithms that learn a generative policy to sample discrete objects $x$ with non-negative reward $R(x)$. Learning objectives guarantee the GFlowNet samples $x$ from the target distribution $p^*(x) \propto R(x)$ when loss is globally minimized over all states or trajectories, but it is unclear how well they perform with practical limits on training resources. We introduce an efficient evaluation strategy to compare the learned sampling distribution to the target reward distribution. As flows can be underdetermined given training data, we clarify the importance of learned flows to generalization and matching $p^*(x)$ in practice. We investigate how to learn better flows, and propose (i) prioritized replay training of high-reward $x$, (ii) relative edge flow policy parametrization, and (iii) a novel guided trajectory balance objective, and show how it can solve a substructure credit assignment problem. We substantially improve sample efficiency on biochemical design tasks.

Submitted 11 May, 2023; originally announced May 2023.

Comments: Accepted to ICML 2023
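The guided trajectory balance objective itself is not given in the abstract. For orientation, the standard trajectory balance (TB) objective from earlier GFlowNet work, which the guided variant's name refers to, is shown below for a complete trajectory $\tau = (s_0 \to s_1 \to \dots \to s_n = x)$ with learned forward policy $P_F$, backward policy $P_B$, and log-partition estimate $\log Z_\theta$. If this loss is zero on every complete trajectory, then $Z_\theta = \sum_x R(x)$ and terminal states are sampled with probability $p^*(x) = R(x)/Z_\theta$, which is the guarantee the abstract refers to.

```latex
\mathcal{L}_{\mathrm{TB}}(\tau) =
\left(
  \log Z_\theta
  + \sum_{t=0}^{n-1} \log P_F(s_{t+1} \mid s_t; \theta)
  - \log R(x)
  - \sum_{t=0}^{n-1} \log P_B(s_t \mid s_{t+1}; \theta)
\right)^2
```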

arXiv:2305.06584

Active Learning in the Predict-then-Optimize Framework: A Margin-Based Approach

Authors: Mo Liu, Paul Grigas, Heyuan Liu, Zuo-Jun Max Shen

Abstract: We develop the first active learning method in the predict-then-optimize framework. Specifically, we develop a learning method that sequentially decides whether to request the "labels" of feature samples from an unlabeled data stream, where the labels correspond to the parameters of an optimization model for decision-making. Our active learning method is the first to be directly informed by the decision error induced by the predicted parameters, which is referred to as the Smart Predict-then-Optimize (SPO) loss. Motivated by the structure of the SPO loss, our algorithm adopts a margin-based criterion utilizing the concept of distance to degeneracy and minimizes a tractable surrogate of the SPO loss on the collected data. In particular, we develop an efficient active learning algorithm with both hard and soft rejection variants, each with theoretical excess risk (i.e., generalization) guarantees. We further derive bounds on the label complexity, which refers to the number of samples whose labels are acquired to achieve a desired small level of SPO risk. Under some natural low-noise conditions, we show that these bounds can be better than the naive supervised learning approach that labels all samples. Furthermore, when using the SPO+ loss function, a specialized surrogate of the SPO loss, we derive a significantly smaller label complexity under separability conditions. We also present numerical evidence showing the practical value of our proposed algorithms in the settings of personalized pricing and the shortest path problem.

Submitted 11 May, 2023; originally announced May 2023.
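To make the SPO loss concrete, here is a minimal Python sketch, assuming the feasible region is a small, explicitly enumerated set of decisions (for example, all paths in a tiny graph) and that decisions minimize a linear cost. The SPO loss is the true cost of the decision chosen under the predicted cost vector minus the optimal true cost. The query rule is a hypothetical simplification: it requests a label when the best and second-best decisions under the prediction are nearly tied, a crude stand-in for the paper's distance-to-degeneracy margin rather than its actual criterion. All function names are illustrative.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def argmin_decision(cost: Vector, decisions: List[Vector]) -> Vector:
    # w*(c): the feasible decision minimizing the linear objective c^T w
    return min(decisions, key=lambda w: dot(cost, w))


def spo_loss(c_hat: Vector, c: Vector, decisions: List[Vector]) -> float:
    # SPO loss: true cost of the decision induced by c_hat, minus the optimal true cost
    return dot(c, argmin_decision(c_hat, decisions)) - dot(c, argmin_decision(c, decisions))


def objective_gap(c_hat: Vector, decisions: List[Vector]) -> float:
    # HYPOTHETICAL margin proxy: gap between the two best objective values under c_hat
    # (needs at least two decisions; a simplified stand-in for distance to degeneracy)
    best, second = sorted(dot(c_hat, w) for w in decisions)[:2]
    return second - best


def should_query(c_hat: Vector, decisions: List[Vector], threshold: float) -> bool:
    # Request the true cost vector ("label") only when the prediction is near a tie
    return objective_gap(c_hat, decisions) < threshold
```

For example, with two decisions `[[1, 0], [0, 1]]`, a prediction `[1.0, 2.0]`, and true costs `[3.0, 2.0]`, the predicted decision costs 3.0 while the optimum costs 2.0, so the SPO loss is 1.0.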

arXiv:2305.03996 [pdf, ps, other]

Optimized Dimensionality Reduction for Moment-based Distributionally Robust Optimization

Authors: Shiyi Jiang, Jianqiang Cheng, Kai Pan, Zuo-Jun Max Shen

Abstract: Moment-based distributionally robust optimization (DRO) provides an optimization framework to integrate statistical information with traditional optimization approaches. Under this framework, one assumes that the underlying joint distribution of random parameters lies in a distributional ambiguity set constructed from moment information and makes decisions against the worst-case distribution within the set. Although most moment-based DRO problems can be reformulated as semidefinite programming (SDP) problems that can be solved in polynomial time, solving high-dimensional SDPs is still time-consuming. Unlike existing approximation approaches that first reduce the dimensionality of random parameters and then solve the approximated SDPs, we propose an optimized dimensionality reduction (ODR) approach. We first show that the ranks of the matrices in the SDP reformulations are small, which motivates us to integrate the dimensionality reduction of random parameters with the subsequent optimization problems. Such integration enables two outer and one inner approximations of the original problem, all of which are low-dimensional SDPs that can be solved efficiently. More importantly, these approximations can theoretically achieve the optimal value of the original high-dimensional SDPs. As these approximations are nonconvex SDPs, we develop modified Alternating Direction Method of Multipliers (ADMM) algorithms to solve them efficiently. We demonstrate the effectiveness of our proposed ODR approach and algorithm in solving two practical problems.
Numerical results show significant advantages of our approach in computational time and solution quality over the three best possible benchmark approaches. Our approach can obtain an optimal or near-optimal (mostly within 0.1%) solution and reduce the computational time by up to three orders of magnitude.

Submitted 31 October, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

arXiv:2305.00593 [pdf, other]

Reliable Gradient-free and Likelihood-free Prompt Tuning

Authors: Maohao Shen, Soumya Ghosh, Prasanna Sattigeri, Subhro Das, Yuheng Bu, Gregory Wornell

Abstract: Due to privacy or commercial constraints, large pre-trained language models (PLMs) are often offered as black-box APIs. Fine-tuning such models to downstream tasks is challenging because one can neither access the model's internal representations nor propagate gradients through it. This paper addresses these challenges by developing techniques for adapting PLMs with only API access. Building on recent work on soft prompt tuning, we develop methods to tune the soft prompts without requiring gradient computation. Further, we develop extensions that in addition to not requiring gradients also do not need to access any internal representation of the PLM beyond the input embeddings. Moreover, instead of learning a single prompt, our methods learn a distribution over prompts allowing us to quantify predictive uncertainty. Ours is the first work to consider uncertainty in prompts when only having API access to the PLM. Finally, through extensive experiments, we carefully vet the proposed methods and find them competitive with (and sometimes even improving on) gradient-based approaches with full access to the PLM.

Submitted 30 April, 2023; originally announced May 2023.

Comments: EACL 2023 (Findings)
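The abstract does not spell out the optimizer, so the following is only an illustrative sketch of gradient-free prompt tuning: a cross-entropy-method search that updates a diagonal Gaussian over a continuous soft-prompt vector using nothing but black-box loss values. The `loss` callable stands in for a hypothetical API-based evaluation of a prompt; the final mean and standard deviation give a crude distribution over prompts, echoing (but not reproducing) the paper's idea of learning a prompt distribution.

```python
import random
import statistics
from typing import Callable, List, Tuple

Prompt = List[float]


def cross_entropy_search(
    loss: Callable[[Prompt], float],
    dim: int,
    iterations: int = 30,
    population: int = 40,
    n_elite: int = 8,
    seed: int = 0,
) -> Tuple[Prompt, Prompt]:
    """Tune a soft-prompt vector using only loss queries (no gradients).

    Returns the mean and per-coordinate std of a diagonal Gaussian over prompts.
    """
    rng = random.Random(seed)
    mu = [0.0] * dim
    sigma = [1.0] * dim
    for _ in range(iterations):
        # Sample candidate prompts from the current search distribution
        samples = [[rng.gauss(m, s) for m, s in zip(mu, sigma)] for _ in range(population)]
        # Query only the black-box loss and keep the lowest-loss candidates
        elites = sorted(samples, key=loss)[:n_elite]
        # Refit the Gaussian to the elites, with a small floor on the std
        mu = [statistics.fmean(col) for col in zip(*elites)]
        sigma = [max(statistics.pstdev(col), 1e-3) for col in zip(*elites)]
    return mu, sigma
```

In practice each loss evaluation would be an API call, so `population * iterations` is the query budget; the design trades many cheap forward queries for the gradients the API does not expose.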

arXiv:2304.07407 [pdf, other]

Repeated Principal-Agent Games with Unobserved Agent Rewards and Perfect-Knowledge Agents

Authors: Ilgin Dogan, Zuo-Jun Max Shen, Anil Aswani

Abstract: Motivated by a number of real-world applications from domains like healthcare and sustainable transportation, in this paper we study a scenario of repeated principal-agent games within a multi-armed bandit (MAB) framework, where the principal gives a different incentive for each bandit arm, the agent picks a bandit arm to maximize its own expected reward plus incentive, and the principal observes which arm is chosen and receives a reward (different from that of the agent) for the chosen arm. Designing policies for the principal is challenging because the principal cannot directly observe the reward that the agent receives for its chosen actions, and so the principal cannot directly learn the expected reward using existing estimation techniques. As a result, the problem of designing policies for this scenario, as well as similar ones, remains mostly unexplored. In this paper, we construct a policy that achieves a low regret (i.e., square-root regret up to a log factor) in this scenario for the case where the agent has perfect knowledge of its own expected rewards for each bandit arm. We design our policy by first constructing an estimator for the agent's expected reward for each bandit arm. Since our estimator uses as data the sequence of incentives offered and subsequently chosen arms, the principal's estimation can be regarded as an analogue of online inverse optimization in MABs. Next, we construct a policy that we prove achieves a low regret by deriving finite-sample concentration bounds for our estimator.
We conclude with numerical simulations demonstrating the applicability of our policy to a real-life setting from collaborative transportation planning.

Submitted 7 May, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: 50 pages, 4 figures
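Under the perfect-knowledge assumption, each observed choice is a revealed-preference constraint, which is what makes the estimation resemble inverse optimization. The sketch below, with hypothetical names, records the implied lower bounds on pairwise reward differences; it is not the paper's estimator or policy, and it ignores ties and noise.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def agent_choice(mu: List[float], incentives: List[float]) -> int:
    # Perfect-knowledge agent: maximize expected reward plus incentive
    totals = [m + p for m, p in zip(mu, incentives)]
    return max(range(len(totals)), key=totals.__getitem__)


def update_difference_bounds(
    bounds: Dict[Pair, float], incentives: List[float], chosen: int
) -> None:
    """Tighten lower bounds on mu[a] - mu[j] implied by one observed choice.

    If arm `chosen` maximizes mu + incentive, then for every other arm j:
        mu[chosen] - mu[j] >= incentives[j] - incentives[chosen].
    """
    for j, p_j in enumerate(incentives):
        if j == chosen:
            continue
        implied = p_j - incentives[chosen]
        key = (chosen, j)
        bounds[key] = max(bounds.get(key, float("-inf")), implied)
```

Offering incentives `[0.0, 0.5, 0.0]` and then `[0.0, 0.7, 0.0]` to an agent with hidden rewards `[1.0, 0.4, 0.0]` yields the choices 0 and 1, which bracket `mu[0] - mu[1]` between 0.5 and 0.7.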

arXiv:2304.06929 [pdf]

Advancing Differential Privacy: Where We Are Now and Future Directions for Real-World Deployment

Authors: Rachel Cummings, Damien Desfontaines, David Evans, Roxana Geambasu, Yangsibo Huang, Matthew Jagielski, Peter Kairouz, Gautam Kamath, Sewoong Oh, Olga Ohrimenko, Nicolas Papernot, Ryan Rogers, Milan Shen, Shuang Song, Weijie Su, Andreas Terzis, Abhradeep Thakurta, Sergei Vassilvitskii, Yu-Xiang Wang, Li Xiong, Sergey Yekhanin, Da Yu, Huanyu Zhang, Wanrong Zhang

Abstract: In this article, we present a detailed review of current practices and state-of-the-art methodologies in the field of differential privacy (DP), with a focus on advancing DP's deployment in real-world applications. Key points and high-level contents of the article originated from discussions at "Differential Privacy (DP): Challenges Towards the Next Frontier," a workshop held in July 2022 with experts from industry, academia, and the public sector seeking answers to broad questions pertaining to privacy and its implications in the design of industry-grade systems. This article aims to provide a reference point for the algorithmic and design decisions within the realm of privacy, highlighting important challenges and potential research directions. Covering a wide spectrum of topics, this article delves into the infrastructure needs for designing private systems, methods for achieving better privacy/utility trade-offs, performing privacy attacks and auditing, as well as communicating privacy with broader audiences and stakeholders.

Submitted 12 March, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

arXiv:2304.04359 [pdf, other]

Privacy-preserving Inference of Group Mean Difference in Zero-inflated Right Skewed Data with Partitioning and Censoring

Authors: Fang Liu, Ruyu Zhou, Yiming Paul Li, James Honaker, Milan Shen

Abstract: We examine privacy-preserving inferences of group mean differences in zero-inflated right-skewed (zirs) data. Zero inflation and right skewness are typical characteristics of ad-click and purchase data collected from e-commerce and social media platforms, where we also want to preserve user privacy to ensure that individual data is protected. In this work, we develop likelihood-based and model-free approaches to analyzing zirs data with formal privacy guarantees. We first apply partitioning and censoring (PAC) to "regularize" zirs data to get the PAC data. We expect inferences based on PAC to have better inferential properties and more robust privacy considerations compared to analyzing the raw data directly. We conduct theoretical analysis to establish the MSE consistency of the privacy-preserving estimators from the proposed approaches based on the PAC data and examine the rate of convergence in the number of partitions and privacy loss parameters. The theoretical results also suggest that it is the sampling error of PAC data rather than the sanitization error that is the limiting factor in the convergence rate. We conduct extensive simulation studies to compare the inferential utility of the proposed approach for different types of zirs data, sample size and partition size combinations, censoring scenarios, mean differences, privacy budgets, and privacy loss composition schemes. We also apply the methods to obtain privacy-preserving inference for the group mean difference in a real digital ads click-through data set.
Based on the theoretical and empirical results, we make recommendations regarding the usage of these methods in practice.

Submitted 9 April, 2023; originally announced April 2023.
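The sketch below illustrates one generic way partitioning and censoring can bound sensitivity for a differentially private mean; it is an assumption-laden illustration, not the paper's PAC procedure. Records are randomly split into equal partitions, each partition mean is censored to `[0, upper]`, and Laplace noise calibrated to the resulting sensitivity `upper / n_partitions` is added. The names `pac_private_mean` and `laplace_noise` are hypothetical.

```python
import random
from typing import Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two iid exponentials with mean `scale` is Laplace(0, scale)
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def pac_private_mean(values: Sequence[float], n_partitions: int, upper: float,
                     epsilon: float, rng: random.Random) -> float:
    """epsilon-DP mean estimate from partition means censored to [0, upper].

    Replacing one record changes one censored partition mean by at most `upper`,
    so the average of n_partitions such means has sensitivity upper / n_partitions.
    """
    if len(values) < n_partitions:
        raise ValueError("need at least one record per partition")
    data = list(values)
    rng.shuffle(data)  # data-independent random partition assignment
    size = len(data) // n_partitions  # leftover records are dropped
    censored = []
    for k in range(n_partitions):
        block = data[k * size:(k + 1) * size]
        censored.append(min(max(sum(block) / size, 0.0), upper))  # censor partition mean
    estimate = sum(censored) / n_partitions
    return estimate + laplace_noise(upper / (n_partitions * epsilon), rng)
```

A private group mean difference can be formed by running this once per group and subtracting; if the two groups contain disjoint users, parallel composition keeps the total privacy loss at `epsilon`.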

arXiv:2304.02586 [pdf, other]

doi 10.1103/PhysRevB.108.L081105

Electronic nematic order in the normal state of strontium ruthenate

Authors: Ryan Russell, Hari P. Nair, Kyle M. Shen, Darrell G. Schlom, John W. Harter

Abstract: Despite significant achievements in characterizing the properties of Sr$_2$RuO$_4$ over the last three decades, the precise nature of its electronic ground state is still unresolved. In this work, we provide a missing piece of the puzzle by uncovering evidence of electronic nematic order in the normal state of Sr$_2$RuO$_4$, revealed by ultrafast time-resolved optical dichroism measurements of uniaxially strained thin films. This nematic order, whose domains are aligned by the strain, spontaneously breaks the four-fold rotational symmetry of the crystal. The temperature dependence of the dichroism resembles an Ising-like order parameter, and optical pumping induces a coherent oscillation of its amplitude mode. A microscopic model of intra-unit-cell nematic order is presented, highlighting the importance of Coulomb repulsion between neighboring oxygen $p$-orbitals. The existence of electronic nematic order in the normal state of Sr$_2$RuO$_4$ may have consequences for the form and mechanism of superconductivity in this material.

Submitted 5 April, 2023; originally announced April 2023.

Journal ref: Physical Review B 108, L081105 (2023)

arXiv:2304.02149 [pdf]

Picosecond volume expansion drives a later-time insulator-metal transition in a nano-textured Mott Insulator

Authors: Anita Verma, Denis Golež, Oleg Yu. Gorobtsov, Kelson Kaj, Ryan Russell, Jeffrey Z. Kaaret, Erik Lamb, Guru Khalsa, Hari P Nair, Yifei Sun, Ryan Bouck, Nathaniel Schreiber, Jacob P. Ruf, Varun Ramaprasad, Yuya Kubota, Tadashi Togashi, Vladimir A. Stoica, Hari Padmanabhan, John W. Freeland, Nicole A. Benedek, Oleg Shpyrko, John W. Harter, Richard D. Averitt, Darrell G. Schlom, Kyle M. Shen, et al. (2 additional authors not shown)

Abstract: Technology moves towards ever faster switching between different electronic and magnetic states of matter. Manipulating properties at terahertz rates requires accessing the intrinsic timescales of electrons (femtoseconds) and associated phonons (tens of femtoseconds to a few picoseconds), which is possible with short-pulse photoexcitation. Yet, in many Mott insulators, the electronic transition is accompanied by the nucleation and growth of percolating domains of the changed lattice structure, leading to empirical time scales dominated by slow coarsening dynamics. Here, we use time-resolved X-ray diffraction and reflectivity measurements to investigate the photoinduced insulator-to-metal transition in the epitaxially strained thin-film Mott insulator Ca$_2$RuO$_4$. The dynamical transition occurs without observable domain formation and coarsening effects, allowing the study of the intrinsic electronic and lattice dynamics. Above a fluence threshold, the initial electronic excitation drives a fast lattice rearrangement, followed by a slower electronic evolution into a metastable non-equilibrium state. Microscopic calculations based on time-dependent dynamical mean-field theory and semiclassical lattice dynamics within a recently published equilibrium energy landscape picture explain the threshold behavior and elucidate the delayed onset of the electronic phase transition in terms of kinematic constraints on recombination. Analysis of satellite scattering peaks indicates the persistence of a strain-induced nano-texture in the photoexcited film.
This work highlights the importance of combined electronic and structural studies to unravel the physics of dynamic transitions and elucidates the role of strain in tuning the timescales of photoinduced processes.

Submitted 6 April, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

arXiv:2304.00454 [pdf]

doi 10.1029/2022JA030981

Variability of Antenna Signals From Dust Impacts

Authors: Mitchell M. Shen, Zoltan Sternovsky, David M. Malaspina

Abstract: Electric field instruments carried by spacecraft (SC) complement dedicated dust detectors by registering transient voltage perturbations caused by impact-generated plasma. The signal waveform contains information about the interaction between the impact-generated plasma cloud and the elements of the SC-antenna system. The variability of antenna signals from dust impacts has not yet been systematically characterized. A set of laboratory measurements is performed to characterize signal variations in response to SC parameters (bias voltage and antenna configuration) and impactor parameters (impact speed and composition). The measurements demonstrate that dipole antenna configurations are sensitive to dust impacts and that the detected signals vary with impact location. When dust impacts occur at low speeds, the antennas typically register smaller amplitudes and less characteristic impact signal shapes. In this case, impact event identification may be more challenging due to lower signal-to-noise ratios and/or more variable waveform shapes, indicating the compound nature of not fully developed impact-generated plasmas. To investigate possible variations in the impacting materials, the measurements are carried out using two dust samples with different mass densities: iron and aluminum. No significant variations of the measured waveform or plasma parameters obtained from data analysis are observed between the two materials used.

Submitted 2 April, 2023; originally announced April 2023.

Comments: Manuscript accepted online by JGR: Space Physics on 22 March 2023

arXiv:2304.00453 [pdf]

doi 10.1029/2020JA028965

Laboratory Study of Antenna Signals Generated by Dust Impacts on Spacecraft

Authors: Mitchell M. Shen, Zoltan Sternovsky, Mihály Horányi, Hsiang-Wen Hsu, David M. Malaspina

Abstract: Space missions often carry antenna instruments that are sensitive to dust impacts; however, the understanding of signal generation mechanisms has remained incomplete. An analytical signal generation model is presented that agrees well with laboratory measurements. The model is based on the direct and induced charging of the spacecraft from the collected and escaping fractions of free charges from the impact-generated plasma cloud. A set of laboratory experiments is performed using a 20:1 scaled-down model of the Cassini spacecraft in a dust accelerator facility. The results show that impact plasmas can be modeled as a plume of ions streaming away from the impact location and a cloud of isotropically expanding electrons. Fitting the model to the collected antenna waveforms yields some of the key parameters of the impact plasma. The model also shows that the amplitudes of the impact signals can be significantly reduced in typical space environments due to the discharging effects in the ambient plasma.

Submitted 2 April, 2023; originally announced April 2023.

Comments: Manuscript accepted online by JGR: Space Physics on 05 April 2021

arXiv:2304.00452 [pdf]

doi 10.1029/2021JA029645

Electrostatic Model for Antenna Signal Generation From Dust Impacts

Authors: Mitchell M. Shen, Zoltan Sternovsky, Alessandro Garzelli, David M. Malaspina

Abstract: Dust impacts on spacecraft are commonly detected by antenna instruments as transient voltage perturbations. The signal waveform is generated by the interaction between the impact-generated plasma cloud and the elements of the antenna-spacecraft system. A general electrostatic model is presented that includes the two key elements of the interaction, namely the charge recollected from the impact plasma by the spacecraft and the fraction of electrons and cations that escape to infinity. The clouds of escaping electrons and cations generate induced signals, and their vastly different escape speeds are responsible for the characteristic shape of the waveforms. The induced signals are modeled numerically for the geometry of the system and the location of the impact. The model employs a Maxwell capacitance matrix to keep track of the mutual interaction between the elements of the system. A new reduced-size model spacecraft is constructed for laboratory measurements using the dust accelerator facility. The model spacecraft is equipped with four antennas: two operating in a monopole mode, and one pair configured as a dipole. Submicron-sized iron dust particles accelerated to > 20 km/s are used for test measurements, where the waveforms of each antenna are recorded. The electrostatic model provides a remarkably good fit to the data using only a handful of physical fitting parameters, such as the escape speeds of electrons and cations.
The presented general model provides the framework for analyzing antenna waveforms and is applicable to a range of space missions investigating the distribution of dust particles in relevant environments.

Submitted 2 April, 2023; originally announced April 2023.

Comments: Manuscript accepted online by JGR: Space Physics on 13 August 2021
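The Maxwell capacitance matrix relates conductor charges to potentials through $Q = CV$. As a minimal sketch of that bookkeeping, the code below solves for the potentials of isolated conductors (for example, a spacecraft body and two antennas) given the charge each holds. The capacitance values in the usage note are made-up placeholders, and the induced contributions from escaping electron and cation clouds, which the paper models numerically, are not included.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting, working on a copy of the inputs
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def conductor_potentials(capacitance: Matrix, charges: List[float]) -> List[float]:
    """Potentials V of isolated conductors holding charges Q, from Q = C V."""
    return solve_linear(capacitance, charges)
```

With the placeholder matrix `[[100, -10, -10], [-10, 20, -2], [-10, -2, 20]]` (pF) and charges `[-1, 0, 0]` (pC), the solution is -11.25 mV on the body and -6.25 mV on each antenna, showing how charge collected on one element shifts the potentials of the others through the off-diagonal coupling terms.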

arXiv:2303.17367 [pdf, other]

A BERT-based Unsupervised Grammatical Error Correction Framework

Authors: Nankai Lin, Hongbin Zhang, Menglan Shen, Yu Wang, Shengyi Jiang, Aimin Yang

Abstract: Grammatical error correction (GEC) is a challenging natural language processing task. While many attempts have been made for high-resource languages such as English or Chinese, relatively little work has been done for low-resource languages owing to the lack of large annotated corpora. In low-resource languages, current unsupervised GEC based on language model scoring performs well. However, pre-trained language models remain to be explored in this context. This study proposes a BERT-based unsupervised GEC framework, where GEC is viewed as a multi-class classification task. The framework contains three modules: a data flow construction module, a sentence perplexity scoring module, and an error detecting and correcting module. We propose a novel scoring method for pseudo-perplexity to evaluate a sentence's probable correctness and construct a Tagalog corpus for Tagalog GEC research. The framework obtains competitive performance on the Tagalog corpus we construct and on an open-source Indonesian corpus, demonstrating that it is complementary to the baseline method for the low-resource GEC task.

Submitted 30 March, 2023; originally announced March 2023.
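Pseudo-perplexity is commonly computed by masking each token in turn and scoring the original token with a masked language model such as BERT. The sketch below shows that standard formulation; it is not the paper's novel scoring method, and `masked_log_prob` is a hypothetical callable standing in for a real masked-LM query.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: log P(tokens[i] | all other tokens, with position i masked)
MaskedLogProb = Callable[[Sequence[str], int], float]


def pseudo_perplexity(tokens: Sequence[str], masked_log_prob: MaskedLogProb) -> float:
    # Mask each position in turn and average the log-probabilities of the true tokens
    total = sum(masked_log_prob(tokens, i) for i in range(len(tokens)))
    return math.exp(-total / len(tokens))


def best_correction(candidates: List[List[str]], masked_log_prob: MaskedLogProb) -> List[str]:
    # Prefer the candidate sentence the masked LM finds least "surprising"
    return min(candidates, key=lambda toks: pseudo_perplexity(toks, masked_log_prob))
```

Scoring a sentence of length N costs N masked-LM forward passes, which is why candidate corrections are usually generated sparingly before ranking.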

arXiv:2303.15790 [pdf, other]

doi 10.1007/s11467-023-1333-z

STCF Conceptual Design Report: Volume 1 -- Physics & Detector

Authors: M. Achasov, X. C. Ai, R. Aliberti, L. P. An, Q. An, X. Z. Bai, Y. Bai, O. Bakina, A. Barnyakov, V. Blinov, V. Bobrovnikov, D. Bodrov, A. Bogomyagkov, A. Bondar, I. Boyko, Z. H. Bu, F. M. Cai, H. Cai, J. J. Cao, Q. H. Cao, Z. Cao, Q. Chang, K. T. Chao, D. Y. Chen, H. Chen, et al. (413 additional authors not shown)

Abstract: The Super $τ$-Charm facility (STCF) is an electron-positron collider proposed by the Chinese particle physics community. It is designed to operate in a center-of-mass energy range from 2 to 7 GeV with a peak luminosity of $0.5\times 10^{35}{\rm cm}^{-2}{\rm s}^{-1}$ or higher. The STCF will produce a data sample about 100 times larger than that of the present $τ$-Charm factory, the BEPCII, providing a unique platform for exploring matter-antimatter asymmetry (charge-parity violation), in-depth studies of the internal structure of hadrons and the nature of non-perturbative strong interactions, as well as searching for exotic hadrons and physics beyond the Standard Model. The STCF project in China is under development with an extensive R&D program. This document presents the physics opportunities at the STCF, describes conceptual designs of the STCF detector system, and discusses future plans for detector R&D and physics case studies.

Submitted 5 October, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

Journal ref: Front. Phys. 19(1), 14701 (2024)
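As a rough sense of scale for the quoted peak luminosity, the integrated luminosity per year can be estimated, assuming (not stated in the abstract) about $10^7$ s of effective running at peak luminosity per year, a common rule of thumb:

```latex
\mathcal{L}_{\rm int} \approx \mathcal{L}\,t
  = \left(0.5\times 10^{35}\,{\rm cm}^{-2}{\rm s}^{-1}\right)\left(10^{7}\,{\rm s}\right)
  = 5\times 10^{41}\,{\rm cm}^{-2}
  = 0.5\,{\rm ab}^{-1},
\qquad 1\,{\rm ab}^{-1} = 10^{42}\,{\rm cm}^{-2}.
```

The unit conversion follows from $1\,{\rm b} = 10^{-24}\,{\rm cm}^2$ and $1\,{\rm ab} = 10^{-18}\,{\rm b}$; actual yearly totals depend on running time and efficiency.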

arXiv:2303.11660 [pdf, other]

Simple Yet Effective Synthetic Dataset Construction for Unsupervised Opinion Summarization

Authors: Ming Shen, Jie Ma, Shuai Wang, Yogarshi Vyas, Kalpit Dixit, Miguel Ballesteros, Yassine Benajiba

Abstract: Opinion summarization provides an important solution for summarizing opinions expressed across a large number of reviews. However, generating aspect-specific and general summaries is challenging due to the lack of annotated data. In this work, we propose two simple yet effective unsupervised approaches to generate both aspect-specific and general opinion summaries by training on synthetic datasets constructed with aspect-related review contents. Our first approach, Seed Words Based Leave-One-Out (SW-LOO), identifies aspect-related portions of reviews simply by exact-matching aspect seed words and outperforms existing methods by 3.4 ROUGE-L points on SPACE and 0.5 ROUGE-1 point on OPOSUM+ for aspect-specific opinion summarization. Our second approach, Natural Language Inference Based Leave-One-Out (NLI-LOO), identifies aspect-related sentences utilizing an NLI model in a more general setting without using seed words and outperforms existing approaches by 1.2 ROUGE-L points on SPACE for aspect-specific opinion summarization and remains competitive on other metrics.

Submitted 21 March, 2023; originally announced March 2023.

Comments: EACL 2023 Findings
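A minimal sketch of the seed-word leave-one-out idea, under assumptions not stated in the abstract: sentences are split on terminal punctuation, a sentence counts as aspect-related if any lowercase token exactly matches a seed word, and each review's aspect sentences serve in turn as a pseudo-summary for the aspect sentences of the entity's other reviews. Function names are illustrative.

```python
import re
from typing import List, Set, Tuple


def aspect_sentences(review: str, seed_words: Set[str]) -> List[str]:
    # Keep sentences containing at least one seed word (exact, case-insensitive match)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]
    return [s for s in sentences if seed_words & set(re.findall(r"[a-z']+", s.lower()))]


def leave_one_out_pairs(reviews: List[str], seed_words: Set[str]) -> List[Tuple[str, str]]:
    """Build (input, pseudo-summary) pairs: each review's aspect sentences are the target,
    and the aspect sentences of the entity's other reviews are the input."""
    filtered = [" ".join(aspect_sentences(r, seed_words)) for r in reviews]
    pairs = []
    for i, target in enumerate(filtered):
        if not target:
            continue  # this review says nothing about the aspect
        source = " ".join(t for j, t in enumerate(filtered) if j != i and t)
        if source:
            pairs.append((source, target))
    return pairs
```

For the reviews "The room was clean. Breakfast was cold.", "Great staff. Very clean bathroom!", and "Parking was easy." with seed words `{"clean", "dirty"}`, the third review is skipped and the two cleanliness sentences each serve once as the target for the other.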

arXiv:2303.08655 [pdf, ps, other]

Lp bounds for Stein's spherical maximal operators

Authors: Naijia Liu, Minxing Shen, Liang Song, Lixin Yan

Abstract: Let ${\frak M}^α$ be the spherical maximal operators of complex order $α$ on ${\mathbb R^n}$. In this article we show that, for $n\geq 2$, if $\|{\frak M}^α f \|_{L^p({\mathbb R^n})} \leq C\|f \|_{L^p({\mathbb R^n})}$ holds for some $α$ and $p\geq 2$, then necessarily ${\rm Re}\,α\geq \max \{1/p-(n-1)/2,\ -(n-1)/p \}.$ When $n=2$, we prove that $\|{\frak M}^α f \|_{L^p({\mathbb R^2})} \leq C\|f \|_{L^p({\mathbb R^2})}$ if ${\rm Re}\,α>\max\{1/p-1/2,\ -1/p\}$; hence the range of $α$ is sharp, in the sense that the estimate fails for ${\rm Re}\,α<\max\{1/p-1/2,\ -1/p\}.$

Submitted 28 April, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 14 pages

arXiv:2303.07531 [pdf, other]

An Algorithm for Subtraction of Doublet Emission Lines in Angle-Resolved Photoemission Spectroscopy

Authors: Yaoju Tarn, Mekhola Sinha, Christopher Pasco, Darrell G. Schlom, Tyrel M. McQueen, Kyle M. Shen, Brendan D. Faeth

Abstract: Plasma discharge lamps are widely used as narrow-linewidth ultraviolet photon sources in angle-resolved photoemission spectroscopy (ARPES) experiments. However, many emission lines, such as Ar-I, Ne-I, and Ne-II, are closely spaced doublets, which produce superimposed replicas in the measured ARPES spectra. Here, we present a simple method for subtracting the contribution of these doublet emission lines from photoemission spectra. Benchmarking against ARPES spectra of well-characterized 2D materials, we demonstrate that this algorithm successfully subtracts the doublet signal and recovers the key features of the monochromated He-I$α$ spectra in a physically sound manner, reliably reproducing quantifiable dispersion relations and quasiparticle lifetimes.

Submitted 13 March, 2023; originally announced March 2023.

Comments: 8 pages, 4 main figures, 3 appendix figures
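One simple way to model the doublet, assumed here for illustration and not necessarily the paper's algorithm, is that the weaker line adds a copy of the true spectrum scaled by a known intensity ratio and shifted by a fixed number of energy bins. Under that model the contamination can be removed exactly by a recursion running from low to high bin index, since each bin's replica depends only on bins that have already been corrected. The function names and the integer-bin shift are hypothetical simplifications.

```python
from typing import List, Sequence


def add_doublet(spectrum: Sequence[float], ratio: float, shift: int) -> List[float]:
    # Forward model: a replica scaled by `ratio`, displaced by `shift` bins toward higher index
    return [spectrum[k] + (ratio * spectrum[k - shift] if k >= shift else 0.0)
            for k in range(len(spectrum))]


def subtract_doublet(measured: Sequence[float], ratio: float, shift: int) -> List[float]:
    """Invert the forward model recursively: I[k] = M[k] - ratio * I[k - shift].

    Assumes a known ratio and an integer shift of at least one bin.
    """
    if shift < 1:
        raise ValueError("shift must be at least one bin")
    corrected: List[float] = []
    for k, m in enumerate(measured):
        corrected.append(m - ratio * corrected[k - shift] if k >= shift else m)
    return corrected
```

When `ratio < 1`, measurement noise is amplified by at most a factor of `1 / (1 - ratio)` through the recursion, so the correction stays stable for a weak satellite line.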

arXiv:2302.14208 [pdf, other]

Methods and Mechanisms for Interactive Novelty Handling in Adversarial Environments

Authors: Tung Thai, Ming Shen, Mayank Garg, Ayush Kalani, Nakul Vaidya, Utkarsh Soni, Mudit Verma, Sriram Gopalakrishnan, Neeraj Varshney, Chitta Baral, Subbarao Kambhampati, Jivko Sinapov, Matthias Scheutz

Abstract: Learning to detect, characterize and accommodate novelties is a challenge that agents operating in open-world domains need to address to be able to guarantee satisfactory task performance. Certain novelties (e.g., changes in environment dynamics) can interfere with the performance or prevent agents from accomplishing task goals altogether. In this paper, we introduce general methods and architectural mechanisms for detecting and characterizing different types of novelties, and for building an appropriate adaptive model to accommodate them utilizing logical representations and reasoning methods. We demonstrate the effectiveness of the proposed methods in evaluations performed by a third party in the adversarial multi-agent board game Monopoly. The results show high novelty detection and accommodation rates across a variety of novelty types, including changes to the rules of the game, as well as changes to the agent's action capabilities.

Submitted 5 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.08077

Group Fairness with Uncertainty in Sensitive Attributes

Authors: Abhin Shah, Maohao Shen, Jongha Jon Ryu, Subhro Das, Prasanna Sattigeri, Yuheng Bu, Gregory W. Wornell

Abstract: Learning a fair predictive model is crucial to mitigating biased decisions against minority groups in high-stakes applications. A common approach to learning such a model involves solving an optimization problem that maximizes the model's predictive power under an appropriate group fairness constraint. However, in practice, sensitive attributes are often missing or noisy, resulting in uncertainty. We demonstrate that solely enforcing fairness constraints on uncertain sensitive attributes can fall significantly short of the level of fairness attained by models trained without uncertainty. To overcome this limitation, we propose a bootstrap-based algorithm that achieves the target level of fairness despite the uncertainty in sensitive attributes. The algorithm is guided by a Gaussian analysis of the independence notion of fairness, in which we propose a robust quadratically constrained quadratic problem to ensure a strict fairness guarantee with uncertain sensitive attributes. Our algorithm is applicable to both discrete and continuous sensitive attributes and is effective in real-world classification and regression tasks for various group fairness notions, e.g., independence and separation.

Submitted 7 June, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

arXiv:2302.04062

Machine Learning for Synthetic Data Generation: A Review

Authors: Yingzhou Lu, Minjie Shen, Huazheng Wang, Xiao Wang, Capucine van Rechem, Tianfan Fu, Wenqi Wei

Abstract: Machine learning relies heavily on data, but real-world applications often encounter various data-related issues. These include poor-quality data, insufficient data points that lead to underfitting of machine learning models, and difficulties in data access due to privacy, safety, and regulatory concerns. In light of these challenges, synthetic data generation emerges as a promising alternative that allows data to be shared and used in ways that real-world data cannot. This paper presents a comprehensive systematic review of existing studies that employ machine learning models to generate synthetic data. The review encompasses various perspectives, starting with the applications of synthetic data generation, spanning computer vision, speech, natural language processing, healthcare, and business domains. Additionally, it explores different machine learning methods, with particular emphasis on neural network architectures and deep generative models. The paper also addresses the crucial privacy and fairness concerns related to synthetic data generation. Furthermore, this study identifies the challenges and opportunities in this emerging field, shedding light on potential avenues for future research. By examining the intricacies of synthetic data generation, this paper aims to advance knowledge and inspire further exploration of the field.

Submitted 30 June, 2024; v1 submitted 8 February, 2023; originally announced February 2023.
