-
Papez: Resource-Efficient Speech Separation with Auditory Working Memory
Authors:
Hyunseok Oh,
Juheon Yi,
Youngki Lee
Abstract:
Transformer-based models recently reached state-of-the-art single-channel speech separation accuracy; However, their extreme computational load makes it difficult to deploy them in resource-constrained mobile or IoT devices. We thus present Papez, a lightweight and computation-efficient single-channel speech separation model. Papez is based on three key techniques. We first replace the inter-chunk…
▽ More
Transformer-based models recently reached state-of-the-art single-channel speech separation accuracy; However, their extreme computational load makes it difficult to deploy them in resource-constrained mobile or IoT devices. We thus present Papez, a lightweight and computation-efficient single-channel speech separation model. Papez is based on three key techniques. We first replace the inter-chunk Transformer with small-sized auditory working memory. Second, we adaptively prune the input tokens that do not need further processing. Finally, we reduce the number of parameters through the recurrent transformer. Our extensive evaluation shows that Papez achieves the best resource and accuracy tradeoffs with a large margin. We publicly share our source code at \texttt{https://github.com/snuhcs/Papez}
△ Less
Submitted 30 June, 2024;
originally announced July 2024.
-
Towards unlocking the mystery of adversarial fragility of neural networks
Authors:
**gchao Gao,
Raghu Mudumbai,
Xiaodong Wu,
Jirong Yi,
Catherine Xu,
Hui Xie,
Weiyu Xu
Abstract:
In this paper, we study the adversarial robustness of deep neural networks for classification tasks. We look at the smallest magnitude of possible additive perturbations that can change the output of a classification algorithm. We provide a matrix-theoretic explanation of the adversarial fragility of deep neural network for classification. In particular, our theoretical results show that neural ne…
▽ More
In this paper, we study the adversarial robustness of deep neural networks for classification tasks. We look at the smallest magnitude of possible additive perturbations that can change the output of a classification algorithm. We provide a matrix-theoretic explanation of the adversarial fragility of deep neural network for classification. In particular, our theoretical results show that neural network's adversarial robustness can degrade as the input dimension $d$ increases. Analytically we show that neural networks' adversarial robustness can be only $1/\sqrt{d}$ of the best possible adversarial robustness. Our matrix-theoretic explanation is consistent with an earlier information-theoretic feature-compression-based explanation for the adversarial fragility of neural networks.
△ Less
Submitted 23 June, 2024;
originally announced June 2024.
-
Self-Supervised Time-Series Anomaly Detection Using Learnable Data Augmentation
Authors:
Kuk** Choi,
Jihun Yi,
Jisoo Mok,
Sungroh Yoon
Abstract:
Continuous efforts are being made to advance anomaly detection in various manufacturing processes to increase the productivity and safety of industrial sites. Deep learning replaced rule-based methods and recently emerged as a promising method for anomaly detection in diverse industries. However, in the real world, the scarcity of abnormal data and difficulties in obtaining labeled data create lim…
▽ More
Continuous efforts are being made to advance anomaly detection in various manufacturing processes to increase the productivity and safety of industrial sites. Deep learning replaced rule-based methods and recently emerged as a promising method for anomaly detection in diverse industries. However, in the real world, the scarcity of abnormal data and difficulties in obtaining labeled data create limitations in the training of detection models. In this study, we addressed these shortcomings by proposing a learnable data augmentation-based time-series anomaly detection (LATAD) technique that is trained in a self-supervised manner. LATAD extracts discriminative features from time-series data through contrastive learning. At the same time, learnable data augmentation produces challenging negative samples to enhance learning efficiency. We measured anomaly scores of the proposed technique based on latent feature similarities. As per the results, LATAD exhibited comparable or improved performance to the state-of-the-art anomaly detection assessments on several benchmark datasets and provided a gradient-based diagnosis technique to help identify root causes.
△ Less
Submitted 18 June, 2024;
originally announced June 2024.
-
Frequency-mix Knowledge Distillation for Fake Speech Detection
Authors:
Cunhang Fan,
Shunbo Dong,
Jun Xue,
Yujie Chen,
Jiangyan Yi,
Zhao Lv
Abstract:
In the telephony scenarios, the fake speech detection (FSD) task to combat speech spoofing attacks is challenging. Data augmentation (DA) methods are considered effective means to address the FSD task in telephony scenarios, typically divided into time domain and frequency domain stages. While each has its advantages, both can result in information loss. To tackle this issue, we propose a novel DA…
▽ More
In the telephony scenarios, the fake speech detection (FSD) task to combat speech spoofing attacks is challenging. Data augmentation (DA) methods are considered effective means to address the FSD task in telephony scenarios, typically divided into time domain and frequency domain stages. While each has its advantages, both can result in information loss. To tackle this issue, we propose a novel DA method, Frequency-mix (Freqmix), and introduce the Freqmix knowledge distillation (FKD) to enhance model information extraction and generalization abilities. Specifically, we use Freqmix-enhanced data as input for the teacher model, while the student model's input undergoes time-domain DA method. We use a multi-level feature distillation approach to restore information and improve the model's generalization capabilities. Our approach achieves state-of-the-art results on ASVspoof 2021 LA dataset, showing a 31\% improvement over baseline and performs competitively on ASVspoof 2021 DF dataset.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
RH-SQL: Refined Schema and Hardness Prompt for Text-to-SQL
Authors:
Jiawen Yi,
Guo Chen,
Zixiang Shen
Abstract:
Text-to-SQL is a technology that converts natural language queries into the structured query language SQL. A novel research approach that has recently gained attention focuses on methods based on the complexity of SQL queries, achieving notable performance improvements. However, existing methods entail significant storage and training costs, which hampers their practical application. To address th…
▽ More
Text-to-SQL is a technology that converts natural language queries into the structured query language SQL. A novel research approach that has recently gained attention focuses on methods based on the complexity of SQL queries, achieving notable performance improvements. However, existing methods entail significant storage and training costs, which hampers their practical application. To address this issue, this paper introduces a method for Text-to-SQL based on Refined Schema and Hardness Prompt. By filtering out low-relevance schema information with a refined schema and identifying query hardness through a Language Model (LM) to form prompts, this method reduces storage and training costs while maintaining performance. It's worth mentioning that this method is applicable to any sequence-to-sequence (seq2seq) LM. Our experiments on the Spider dataset, specifically with large-scale LMs, achieved an exceptional Execution accuracy (EX) of 82.6%, demonstrating the effectiveness and greater suitability of our method for real-world applications.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Self-supervised Graph Neural Network for Mechanical CAD Retrieval
Authors:
Yuhan Quan,
Huan Zhao,
**feng Yi,
Yuqiang Chen
Abstract:
CAD (Computer-Aided Design) plays a crucial role in mechanical industry, where large numbers of similar-shaped CAD parts are often created. Efficiently reusing these parts is key to reducing design and production costs for enterprises. Retrieval systems are vital for achieving CAD reuse, but the complex shapes of CAD models are difficult to accurately describe using text or keywords, making tradit…
▽ More
CAD (Computer-Aided Design) plays a crucial role in mechanical industry, where large numbers of similar-shaped CAD parts are often created. Efficiently reusing these parts is key to reducing design and production costs for enterprises. Retrieval systems are vital for achieving CAD reuse, but the complex shapes of CAD models are difficult to accurately describe using text or keywords, making traditional retrieval methods ineffective. While existing representation learning approaches have been developed for CAD, manually labeling similar samples in these methods is expensive. Additionally, CAD models' unique parameterized data structure presents challenges for applying existing 3D shape representation learning techniques directly. In this work, we propose GC-CAD, a self-supervised contrastive graph neural network-based method for mechanical CAD retrieval that directly models parameterized CAD raw files. GC-CAD consists of two key modules: structure-aware representation learning and contrastive graph learning framework. The method leverages graph neural networks to extract both geometric and topological information from CAD models, generating feature representations. We then introduce a simple yet effective contrastive graph learning framework approach, enabling the model to train without manual labels and generate retrieval-ready representations. Experimental results on four datasets including human evaluation demonstrate that the proposed method achieves significant accuracy improvements and up to 100 times efficiency improvement over the baseline methods.
△ Less
Submitted 17 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection
Authors:
Junfei Yi,
Jianxu Mao,
Tengfei Liu,
Mingjie Li,
Hanyu Gu,
Hui Zhang,
Xiaojun Chang,
Yaonan Wang
Abstract:
Knowledge distillation (KD) is a widely adopted and effective method for compressing models in object detection tasks. Particularly, feature-based distillation methods have shown remarkable performance. Existing approaches often ignore the uncertainty in the teacher model's knowledge, which stems from data noise and imperfect training. This limits the student model's ability to learn latent knowle…
▽ More
Knowledge distillation (KD) is a widely adopted and effective method for compressing models in object detection tasks. Particularly, feature-based distillation methods have shown remarkable performance. Existing approaches often ignore the uncertainty in the teacher model's knowledge, which stems from data noise and imperfect training. This limits the student model's ability to learn latent knowledge, as it may overly rely on the teacher's imperfect guidance. In this paper, we propose a novel feature-based distillation paradigm with knowledge uncertainty for object detection, termed "Uncertainty Estimation-Discriminative Knowledge Extraction-Knowledge Transfer (UET)", which can seamlessly integrate with existing distillation methods. By leveraging the Monte Carlo dropout technique, we introduce knowledge uncertainty into the training process of the student model, facilitating deeper exploration of latent knowledge. Our method performs effectively during the KD process without requiring intricate structures or extensive computational resources. Extensive experiments validate the effectiveness of our proposed approach across various distillation strategies, detectors, and backbone architectures. Specifically, following our proposed paradigm, the existing FGD method achieves state-of-the-art (SoTA) performance, with ResNet50-based GFL achieving 44.1% mAP on the COCO dataset, surpassing the baselines by 3.9%.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection
Authors:
Yujie Chen,
Jiangyan Yi,
Jun Xue,
Chenglong Wang,
Xiaohui Zhang,
Shunbo Dong,
Siding Zeng,
Jianhua Tao,
Lv Zhao,
Cunhang Fan
Abstract:
Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepf…
▽ More
Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepfake detection. Specifically, we use sinc Layer and multiple convolutional layers to capture short-range features, and then design a bidirectional Mamba to address Mamba's unidirectional modelling problem and further capture long-range feature information. Moreover, we develop a bidirectional fusion module to integrate embeddings, enhancing audio context representation and combining short- and long-range information. The results show that our proposed RawBMamba achieves a 34.1\% improvement over Rawformer on ASVspoof2021 LA dataset, and demonstrates competitive performance on other datasets.
△ Less
Submitted 18 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking
Authors:
Junzuo Zhou,
Jiangyan Yi,
Tao Wang,
Jianhua Tao,
Ye Bai,
Chu Yuan Zhang,
Yong Ren,
Zhengqi Wen
Abstract:
Various threats posed by the progress in text-to-speech (TTS) have prompted the need to reliably trace synthesized speech. However, contemporary approaches to this task involve adding watermarks to the audio separately after generation, a process that hurts both speech quality and watermark imperceptibility. In addition, these approaches are limited in robustness and flexibility. To address these…
▽ More
Various threats posed by the progress in text-to-speech (TTS) have prompted the need to reliably trace synthesized speech. However, contemporary approaches to this task involve adding watermarks to the audio separately after generation, a process that hurts both speech quality and watermark imperceptibility. In addition, these approaches are limited in robustness and flexibility. To address these problems, we propose TraceableSpeech, a novel TTS model that directly generates watermarked speech, improving watermark imperceptibility and speech quality. Furthermore, We design the frame-wise imprinting and extraction of watermarks, achieving higher robustness against resplicing attacks and temporal flexibility in operation. Experimental results show that TraceableSpeech outperforms the strong baseline where VALL-E or HiFicodec individually uses WavMark in watermark imperceptibility, speech quality and resilience against resplicing attacks. It also can apply to speech of various durations.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach
Authors:
Saehyung Lee,
Sangwon Yu,
Junsung Park,
Jihun Yi,
Sungroh Yoon
Abstract:
In this paper, we primarily address the issue of dialogue-form context query within the interactive text-to-image retrieval task. Our methodology, PlugIR, actively utilizes the general instruction-following capability of LLMs in two ways. First, by reformulating the dialogue-form context, we eliminate the necessity of fine-tuning a retrieval model on existing visual dialogue data, thereby enabling…
▽ More
In this paper, we primarily address the issue of dialogue-form context query within the interactive text-to-image retrieval task. Our methodology, PlugIR, actively utilizes the general instruction-following capability of LLMs in two ways. First, by reformulating the dialogue-form context, we eliminate the necessity of fine-tuning a retrieval model on existing visual dialogue data, thereby enabling the use of any arbitrary black-box model. Second, we construct the LLM questioner to generate non-redundant questions about the attributes of the target image, based on the information of retrieval candidate images in the current context. This approach mitigates the issues of noisiness and redundancy in the generated questions. Beyond our methodology, we propose a novel evaluation metric, Best log Rank Integral (BRI), for a comprehensive assessment of the interactive retrieval system. PlugIR demonstrates superior performance compared to both zero-shot and fine-tuned baselines in various benchmarks. Additionally, the two methodologies comprising PlugIR can be flexibly applied together or separately in various situations. Our codes are available at https://github.com/Saehyung-Lee/PlugIR.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Lieb-Schultz-Mattis theorems and generalizations in long-range interacting systems
Authors:
Ruizhi Liu,
**min Yi,
Shiyu Zhou,
Liujun Zou
Abstract:
In a unified fashion, we establish Lieb-Schultz-Mattis (LSM) theorems and their generalizations in systems with long-range interactions. We show that, for a quantum spin chain, if the interactions decay fast enough as their ranges increase and the Hamiltonian has an anomalous symmetry, the Hamiltonian cannot have a unique gapped symmetric ground state. If the Hamiltonian contains only 2-spin inter…
▽ More
In a unified fashion, we establish Lieb-Schultz-Mattis (LSM) theorems and their generalizations in systems with long-range interactions. We show that, for a quantum spin chain, if the interactions decay fast enough as their ranges increase and the Hamiltonian has an anomalous symmetry, the Hamiltonian cannot have a unique gapped symmetric ground state. If the Hamiltonian contains only 2-spin interactions, these theorems hold when the interactions decay faster than $1/r^2$, with $r$ the distance between the two interacting spins. Moreover, any pure state with an anomalous symmetry, which may not be a ground state of any natural Hamiltonian, must be long-range entangled. The symmetries we consider include on-site internal symmetries combined with lattice translation symmetries, and they can also extend to purely internal but non-on-site symmetries. Moreover, these internal symmetries can be discrete or continuous. We explore the applications of the theorems through various examples.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark
Authors:
Xiaohui Zhang,
Jiangyan Yi,
Jianhua Tao
Abstract:
The rise of advanced large language models such as GPT-4, GPT-4o, and the Claude family has made fake audio detection increasingly challenging. Traditional fine-tuning methods struggle to keep pace with the evolving landscape of synthetic speech, necessitating continual learning approaches that can adapt to new audio while retaining the ability to detect older types. Continual learning, which acts…
▽ More
The rise of advanced large language models such as GPT-4, GPT-4o, and the Claude family has made fake audio detection increasingly challenging. Traditional fine-tuning methods struggle to keep pace with the evolving landscape of synthetic speech, necessitating continual learning approaches that can adapt to new audio while retaining the ability to detect older types. Continual learning, which acts as an effective tool for detecting newly emerged deepfake audio while maintaining performance on older types, lacks a well-constructed and user-friendly evaluation framework. To address this gap, we introduce EVDA, a benchmark for evaluating continual learning methods in deepfake audio detection. EVDA includes classic datasets from the Anti-Spoofing Voice series, Chinese fake audio detection series, and newly generated deepfake audio from models like GPT-4 and GPT-4o. It supports various continual learning techniques, such as Elastic Weight Consolidation (EWC), Learning without Forgetting (LwF), and recent methods like Regularized Adaptive Weight Modification (RAWM) and Radian Weight Modification (RWM). Additionally, EVDA facilitates the development of robust algorithms by providing an open interface for integrating new continual learning methods
△ Less
Submitted 15 May, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
Authors:
Tianyi Zhang,
Jonah Yi,
Zhaozhuo Xu,
Anshumali Shrivastava
Abstract:
Efficient deployment of Large Language Models (LLMs) requires batching multiple requests together to improve throughput. As the batch size, context length, or model size increases, the size of the key and value (KV) cache can quickly become the main contributor to GPU memory usage and the bottleneck of inference latency. Quantization has emerged as an effective technique for KV cache compression,…
▽ More
Efficient deployment of Large Language Models (LLMs) requires batching multiple requests together to improve throughput. As the batch size, context length, or model size increases, the size of the key and value (KV) cache can quickly become the main contributor to GPU memory usage and the bottleneck of inference latency. Quantization has emerged as an effective technique for KV cache compression, but existing methods still fail at very low bit widths. We observe that distinct channels of a key/value activation embedding are highly inter-dependent, and the joint entropy of multiple channels grows at a slower rate than the sum of their marginal entropies. Based on this insight, we propose Coupled Quantization (CQ), which couples multiple key/value channels together to exploit their inter-dependency and encode the activations in a more information-efficient manner. Extensive experiments reveal that CQ outperforms or is competitive with existing baselines in preserving model quality. Furthermore, we demonstrate that CQ can preserve model quality with KV cache quantized down to 1-bit.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition
Authors:
Zheng Lian,
Haiyang Sun,
Licai Sun,
Zhuofan Wen,
Siyuan Zhang,
Shun Chen,
Hao Gu,
**ming Zhao,
Ziyang Ma,
Xie Chen,
Jiangyan Yi,
Rui Liu,
Kele Xu,
Bin Liu,
Erik Cambria,
Guoying Zhao,
Björn W. Schuller,
Jianhua Tao
Abstract:
Multimodal emotion recognition is an important research topic in artificial intelligence. Over the past few decades, researchers have made remarkable progress by increasing dataset size and building more effective architectures. However, due to various reasons (such as complex environments and inaccurate annotations), current systems are hard to meet the demands of practical applications. Therefor…
▽ More
Multimodal emotion recognition is an important research topic in artificial intelligence. Over the past few decades, researchers have made remarkable progress by increasing dataset size and building more effective architectures. However, due to various reasons (such as complex environments and inaccurate annotations), current systems are hard to meet the demands of practical applications. Therefore, we organize a series of challenges around emotion recognition to further promote the development of this area. Last year, we launched MER2023, focusing on three topics: multi-label learning, noise robustness, and semi-supervised learning. This year, we continue to organize MER2024. In addition to expanding the dataset size, we introduce a new track around open-vocabulary emotion recognition. The main consideration for this track is that existing datasets often fix the label space and use majority voting to enhance annotator consistency, but this process may limit the model's ability to describe subtle emotions. In this track, we encourage participants to generate any number of labels in any category, aiming to describe the emotional state as accurately as possible. Our baseline is based on MERTools and the code is available at: https://github.com/zeroQiaoba/MERTools/tree/master/MER2024.
△ Less
Submitted 23 May, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
Light-weight Retinal Layer Segmentation with Global Reasoning
Authors:
Xiang He,
Weiye Song,
Yiming Wang,
Fabio Poiesi,
Ji Yi,
Manishi Desai,
Quanqing Xu,
Kongzheng Yang,
Yi Wan
Abstract:
Automatic retinal layer segmentation with medical images, such as optical coherence tomography (OCT) images, serves as an important tool for diagnosing ophthalmic diseases. However, it is challenging to achieve accurate segmentation due to low contrast and blood flow noises presented in the images. In addition, the algorithm should be light-weight to be deployed for practical clinical applications…
▽ More
Automatic retinal layer segmentation with medical images, such as optical coherence tomography (OCT) images, serves as an important tool for diagnosing ophthalmic diseases. However, it is challenging to achieve accurate segmentation due to low contrast and blood flow noises presented in the images. In addition, the algorithm should be light-weight to be deployed for practical clinical applications. Therefore, it is desired to design a light-weight network with high performance for retinal layer segmentation. In this paper, we propose LightReSeg for retinal layer segmentation which can be applied to OCT images. Specifically, our approach follows an encoder-decoder structure, where the encoder part employs multi-scale feature extraction and a Transformer block for fully exploiting the semantic information of feature maps at all scales and making the features have better global reasoning capabilities, while the decoder part, we design a multi-scale asymmetric attention (MAA) module for preserving the semantic information at each encoder scale. The experiments show that our approach achieves a better segmentation performance compared to the current state-of-the-art method TransUnet with 105.7M parameters on both our collected dataset and two other public datasets, with only 3.3M parameters.
△ Less
Submitted 25 April, 2024;
originally announced April 2024.
-
Pegasus-v1 Technical Report
Authors:
Raehyuk Jung,
Hyojun Go,
Jaehyuk Yi,
Jiho Jang,
Daniel Kim,
Jay Suh,
Aiden Lee,
Cooper Han,
Jae Lee,
Jeff Kim,
**-Young Kim,
Junwan Kim,
Kyle Park,
Lucas Lee,
Mars Ha,
Minjoon Seo,
Abraham Jo,
Ed Park,
Hassan Kianinejad,
SJ Kim,
Tony Moon,
Wade Jeong,
Andrei Popescu,
Esther Kim,
EK Yoon
, et al. (19 additional authors not shown)
Abstract:
This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report overviews Pegasus-1's archi…
▽ More
This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report overviews Pegasus-1's architecture, training strategies, and its performance in benchmarks on video conversation, zero-shot video question answering, and video summarization. We also explore qualitative characteristics of Pegasus-1 , demonstrating its capabilities as well as its limitations, in order to provide readers a balanced view of its current state and its future direction.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
Contingency Model Predictive Control for Bipedal Locomotion on Moving Surfaces with a Linear Inverted Pendulum Model
Authors:
Kuo Chen,
Xinyan Huang,
Xunjie Chen,
**gang Yi
Abstract:
Gait control of legged robotic walkers on dynamically moving surfaces (e.g., ships and vehicles) is challenging due to the limited balance control actuation and unknown surface motion. We present a contingent model predictive control (CMPC) for bipedal walker locomotion on moving surfaces with a linear inverted pendulum (LIP) model. The CMPC is a robust design that is built on regular model predic…
▽ More
Gait control of legged robotic walkers on dynamically moving surfaces (e.g., ships and vehicles) is challenging due to the limited balance control actuation and unknown surface motion. We present a contingent model predictive control (CMPC) for bipedal walker locomotion on moving surfaces with a linear inverted pendulum (LIP) model. The CMPC is a robust design that is built on regular model predictive control (MPC) to incorporate the "worst case" predictive motion of the moving surface. Integrated with an LIP model and walking stability constraints, the CMPC framework generates a set of consistent control inputs considering to anticipated uncertainties of the surface motions. Simulation results and comparison with the regular MPC for bipedal walking are conducted and presented. The results confirm the feasibility and superior performance of the proposed CMPC design over the regular MPC under various motion profiles of moving surfaces.
△ Less
Submitted 18 April, 2024;
originally announced April 2024.
-
Efficient Multi-Task Reinforcement Learning via Task-Specific Action Correction
Authors:
**yuan Feng,
Min Chen,
Zhiqiang Pu,
Tenghai Qiu,
Jianqiang Yi
Abstract:
Multi-task reinforcement learning (MTRL) demonstrate potential for enhancing the generalization of a robot, enabling it to perform multiple tasks concurrently. However, the performance of MTRL may still be susceptible to conflicts between tasks and negative interference. To facilitate efficient MTRL, we propose Task-Specific Action Correction (TSAC), a general and complementary approach designed f…
▽ More
Multi-task reinforcement learning (MTRL) demonstrate potential for enhancing the generalization of a robot, enabling it to perform multiple tasks concurrently. However, the performance of MTRL may still be susceptible to conflicts between tasks and negative interference. To facilitate efficient MTRL, we propose Task-Specific Action Correction (TSAC), a general and complementary approach designed for simultaneous learning of multiple tasks. TSAC decomposes policy learning into two separate policies: a shared policy (SP) and an action correction policy (ACP). To alleviate conflicts resulting from excessive focus on specific tasks' details in SP, ACP incorporates goal-oriented sparse rewards, enabling an agent to adopt a long-term perspective and achieve generalization across tasks. Additional rewards transform the original problem into a multi-objective MTRL problem. Furthermore, to convert the multi-objective MTRL into a single-objective formulation, TSAC assigns a virtual expected budget to the sparse rewards and employs Lagrangian method to transform a constrained single-objective optimization into an unconstrained one. Experimental evaluations conducted on Meta-World's MT10 and MT50 benchmarks demonstrate that TSAC outperforms existing state-of-the-art methods, achieving significant improvements in both sample efficiency and effective action execution.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Weakly-Supervised Cross-Domain Segmentation of Electron Microscopy with Sparse Point Annotation
Authors:
Dafei Qiu,
Shan Xiong,
Jia** Yi,
Jialin Peng
Abstract:
Accurate segmentation of organelle instances from electron microscopy (EM) images plays an essential role in many neuroscience researches. However, practical scenarios usually suffer from high annotation costs, label scarcity, and large domain diversity. While unsupervised domain adaptation (UDA) that assumes no annotation effort on the target data is promising to alleviate these challenges, its p…
▽ More
Accurate segmentation of organelle instances from electron microscopy (EM) images plays an essential role in many neuroscience researches. However, practical scenarios usually suffer from high annotation costs, label scarcity, and large domain diversity. While unsupervised domain adaptation (UDA) that assumes no annotation effort on the target data is promising to alleviate these challenges, its performance on complicated segmentation tasks is still far from practical usage. To address these issues, we investigate a highly annotation-efficient weak supervision, which assumes only sparse center-points on a small subset of object instances in the target training images. To achieve accurate segmentation with partial point annotations, we introduce instance counting and center detection as auxiliary tasks and design a multitask learning framework to leverage correlations among the counting, detection, and segmentation, which are all tasks with partial or no supervision. Building upon the different domain-invariances of the three tasks, we enforce counting estimation with a novel soft consistency loss as a global prior for center detection, which further guides the per-pixel segmentation. To further compensate for annotation sparsity, we develop a cross-position cut-and-paste for label augmentation and an entropy-based pseudo-label selection. The experimental results highlight that, by simply using extremely weak annotation, e.g., 15\% sparse points, for model training, the proposed model is capable of significantly outperforming UDA methods and produces comparable performance as the supervised counterpart. The high robustness of our model shown in the validations and the low requirement of expert knowledge for sparse point annotation further improve the potential application value of our model.
△ Less
Submitted 31 March, 2024;
originally announced April 2024.
-
Prioritized League Reinforcement Learning for Large-Scale Heterogeneous Multiagent Systems
Authors:
Qingxu Fu,
Zhiqiang Pu,
Min Chen,
Tenghai Qiu,
Jianqiang Yi
Abstract:
Large-scale heterogeneous multiagent systems feature various realistic factors in the real world, such as agents with diverse abilities and overall system cost. In comparison to homogeneous systems, heterogeneous systems offer significant practical advantages. Nonetheless, they also present challenges for multiagent reinforcement learning, including addressing the non-stationary problem and managi…
▽ More
Large-scale heterogeneous multiagent systems feature various realistic factors in the real world, such as agents with diverse abilities and overall system cost. In comparison to homogeneous systems, heterogeneous systems offer significant practical advantages. Nonetheless, they also present challenges for multiagent reinforcement learning, including addressing the non-stationary problem and managing an imbalanced number of agents with different types. We propose a Prioritized Heterogeneous League Reinforcement Learning (PHLRL) method to address large-scale heterogeneous cooperation problems. PHLRL maintains a record of various policies that agents have explored during their training and establishes a heterogeneous league consisting of diverse policies to aid in future policy optimization. Furthermore, we design a prioritized policy gradient approach to compensate for the gap caused by differences in the number of different types of agents. Next, we use Unreal Engine to design a large-scale heterogeneous cooperation benchmark named Large-Scale Multiagent Operation (LSMO), which is a complex two-team competition scenario that requires collaboration from both ground and airborne agents. We use experiments to show that PHLRL outperforms state-of-the-art methods, including QTRAN and QPLEX in LSMO.
△ Less
Submitted 26 March, 2024;
originally announced March 2024.
-
Self-Clustering Hierarchical Multi-Agent Reinforcement Learning with Extensible Cooperation Graph
Authors:
Qingxu Fu,
Tenghai Qiu,
Jianqiang Yi,
Zhiqiang Pu,
Xiaolin Ai
Abstract:
Multi-Agent Reinforcement Learning (MARL) has been successful in solving many cooperative challenges. However, classic non-hierarchical MARL algorithms still cannot address various complex multi-agent problems that require hierarchical cooperative behaviors. The cooperative knowledge and policies learned in non-hierarchical algorithms are implicit and not interpretable, thereby restricting the int…
▽ More
Multi-Agent Reinforcement Learning (MARL) has been successful in solving many cooperative challenges. However, classic non-hierarchical MARL algorithms still cannot address various complex multi-agent problems that require hierarchical cooperative behaviors. The cooperative knowledge and policies learned in non-hierarchical algorithms are implicit and not interpretable, thereby restricting the integration of existing knowledge. This paper proposes a novel hierarchical MARL model called Hierarchical Cooperation Graph Learning (HCGL) for solving general multi-agent problems. HCGL has three components: a dynamic Extensible Cooperation Graph (ECG) for achieving self-clustering cooperation; a group of graph operators for adjusting the topology of ECG; and an MARL optimizer for training these graph operators. HCGL's key distinction from other MARL models is that the behaviors of agents are guided by the topology of ECG instead of policy neural networks. ECG is a three-layer graph consisting of an agent node layer, a cluster node layer, and a target node layer. To manipulate the ECG topology in response to changing environmental conditions, four graph operators are trained to adjust the edge connections of ECG dynamically. The hierarchical feature of ECG provides a unique approach to merge primitive actions (actions executed by the agents) and cooperative actions (actions executed by the clusters) into a unified action space, allowing us to integrate fundamental cooperative knowledge into an extensible interface. In our experiments, the HCGL model has shown outstanding performance in multi-agent benchmarks with sparse rewards. We also verify that HCGL can easily be transferred to large-scale scenarios with high zero-shot transfer success rates.
△ Less
Submitted 26 March, 2024;
originally announced March 2024.
-
An AI-Native Runtime for Multi-Wearable Environments
Authors:
Chulhong Min,
Utku Günay Acer,
SiYoung Jang,
Sangwon Choi,
Diana A. Vasile,
Taesik Gong,
Juheon Yi,
Fahim Kawsar
Abstract:
The miniaturization of AI accelerators is paving the way for next-generation wearable applications within wearable technologies. We introduce Mojito, an AI-native runtime with advanced MLOps designed to facilitate the development and deployment of these applications on wearable devices. It emphasizes the necessity of dynamic orchestration of distributed resources equipped with ultra-low-power AI a…
▽ More
The miniaturization of AI accelerators is paving the way for next-generation wearable applications within wearable technologies. We introduce Mojito, an AI-native runtime with advanced MLOps designed to facilitate the development and deployment of these applications on wearable devices. It emphasizes the necessity of dynamic orchestration of distributed resources equipped with ultra-low-power AI accelerators to overcome challenges associated with unpredictable runtime environments. Through its innovative approaches, Mojito demonstrates how future wearable technologies can evolve to be more autonomous.
△ Less
Submitted 26 March, 2024;
originally announced March 2024.
-
Mondrian: On-Device High-Performance Video Analytics with Compressive Packed Inference
Authors:
Changmin Jeon,
Seonjun Kim,
Juheon Yi,
Youngki Lee
Abstract:
In this paper, we present Mondrian, an edge system that enables high-performance object detection on high-resolution video streams. Many lightweight models and system optimization techniques have been proposed for resource-constrained devices, but they do not fully utilize the potential of the accelerators over dynamic, high-resolution videos. To enable such capability, we devise a novel Compressi…
▽ More
In this paper, we present Mondrian, an edge system that enables high-performance object detection on high-resolution video streams. Many lightweight models and system optimization techniques have been proposed for resource-constrained devices, but they do not fully utilize the potential of the accelerators over dynamic, high-resolution videos. To enable such capability, we devise a novel Compressive Packed Inference to minimize per-pixel processing costs by selectively determining the necessary pixels to process and combining them to maximize processing parallelism. In particular, our system quickly extracts ROIs and dynamically shrinks them, reflecting the effect of the fast-changing characteristics of objects and scenes. It then intelligently combines such scaled ROIs into large canvases to maximize the utilization of inference accelerators such as GPU. Evaluation across various datasets, models, and devices shows Mondrian outperforms state-of-the-art baselines (e.g., input rescaling, ROI extractions, ROI extractions+batching) by 15.0-19.7% higher accuracy, leading to $\times$6.65 higher throughput than frame-wise inference for processing various 1080p video streams. We will release the code after the paper review.
△ Less
Submitted 12 March, 2024;
originally announced March 2024.
-
Channel Estimation Considerate Precoder Design for Multi-user Massive MIMO-OFDM Systems: The Concept and Fast Algorithms
Authors:
Liu Junkai,
Jiang Yi
Abstract:
The sixth-generation (6G) communication networks target peak data rates exceeding 1Tbps, necessitating base stations (BS) to support up to 100 simultaneous data streams. However, sparse pilot allocation to accommodate such streams poses challenges for users' channel estimation. This paper presents Channel Estimation Considerate Precoding (CECP), where BS precoders prioritize facilitating channel e…
▽ More
The sixth-generation (6G) communication networks target peak data rates exceeding 1Tbps, necessitating base stations (BS) to support up to 100 simultaneous data streams. However, sparse pilot allocation to accommodate such streams poses challenges for users' channel estimation. This paper presents Channel Estimation Considerate Precoding (CECP), where BS precoders prioritize facilitating channel estimation alongside maximizing transmission rate. To address the computational complexity of 6G large-scale multi-input multi-output (MIMO) systems, we propose a computationally-efficient space-time block diagonal channel shortening (ST-BDCS) precoding scheme. By leveraging the sparse Toeplitz property of orthogonal frequency division multiplexing (OFDM) channels, this time-domain precoding design effectively mitigates multi-user interference in the downlink and shortens the effective channel's temporal length. Consequently, users can estimate the channels using sparse pilots. To enable fast implementation, we develop a generalized complex-valued Toeplitz matrix QR decomposition algorithm applicable to various space-time signal processing problems. Simulation results demonstrate that the ST-BDCS precoding method approximates the rate performance of conventional subcarrier-by-subcarrier precoding schemes. However, it offers the advantages of easier channel estimation for users and significantly reduced computational complexity for the BS.
△ Less
Submitted 7 April, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis
Authors:
Muxi Chen,
Yi Liu,
Jian Yi,
Changran Xu,
Qiuxia Lai,
Hongliang Wang,
Tsung-Yi Ho,
Qiang Xu
Abstract:
In this paper, we present an empirical study introducing a nuanced evaluation framework for text-to-image (T2I) generative models, applied to human image synthesis. Our framework categorizes evaluations into two distinct groups: first, focusing on image qualities such as aesthetics and realism, and second, examining text conditions through concept coverage and fairness. We introduce an innovative…
▽ More
In this paper, we present an empirical study introducing a nuanced evaluation framework for text-to-image (T2I) generative models, applied to human image synthesis. Our framework categorizes evaluations into two distinct groups: first, focusing on image qualities such as aesthetics and realism, and second, examining text conditions through concept coverage and fairness. We introduce an innovative aesthetic score prediction model that assesses the visual appeal of generated images and unveils the first dataset marked with low-quality regions in generated human images to facilitate automatic defect detection. Our exploration into concept coverage probes the model's effectiveness in interpreting and rendering text-based concepts accurately, while our analysis of fairness reveals biases in model outputs, with an emphasis on gender, race, and age. While our study is grounded in human imagery, this dual-faceted approach is designed with the flexibility to be applicable to other forms of image generation, enhancing our understanding of generative models and paving the way to the next generation of more sophisticated, contextually aware, and ethically attuned generative models. We will release our code, the data used for evaluating generative models and the dataset annotated with defective areas soon.
△ Less
Submitted 8 March, 2024;
originally announced March 2024.
-
Investigation of low band gap silicon alloy thin film solar cell for improving short and long wavelength response
Authors:
S. M. Iftiquar,
J. Yi
Abstract:
Numerical simulation of a solar cell can provide various information that can be useful to maximize its power conversion efficiency (PCE). In that respect we carried out a set of numerical simulation using AFORS-HET simulation program. Separately, in order to get a better understanding, the optical absorption in individual layers devices were analyzed. Current-voltage characteristic curve of a ref…
▽ More
Numerical simulation of a solar cell can provide various information that can be useful to maximize its power conversion efficiency (PCE). In that respect we carried out a set of numerical simulation using AFORS-HET simulation program. Separately, in order to get a better understanding, the optical absorption in individual layers devices were analyzed. Current-voltage characteristic curve of a reference cell (Cell-A) was used as the starting device. The PCE of the reference device was $8.85\%$ with short circuit current density $J_{sc}$ of 15.43 mA/cm$^{2}$ and fill factor (FF) of $68.3\%$. However, it was noticed that the reference cell had high parasitic optical absorption at the window layer and the device structure was also not optimized. After suitable optimization the PCE of this device (Cell-B2) improves to $11.59\%$ ($J_{sc}$ and FF of 13.0 mA/cm$^{2}$ and $87\%$ respectively). The results show that the effective optical absorption in the active layer can be improved significantly by optimizing the device structure. The short wavelength response can be improved by reducing the parasitic optical absorption by the doped window layer, while its long wavelength response improves by raising effective absorption length of the active layer. Furthermore, its optimum thickness, for the highest possible PCE, is found to be dependent upon the material properties, more importantly on its defect density.
△ Less
Submitted 8 March, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Foot Shape-Dependent Resistive Force Model for Bipedal Walkers on Granular Terrains
Authors:
Xunjie Chen,
Aditya Anikode,
**gang Yi,
Tao Liu
Abstract:
Legged robots have demonstrated high efficiency and effectiveness in unstructured and dynamic environments. However, it is still challenging for legged robots to achieve rapid and efficient locomotion on deformable, yielding substrates, such as granular terrains. We present an enhanced resistive force model for bipedal walkers on soft granular terrains by introducing effective intrusion depth corr…
▽ More
Legged robots have demonstrated high efficiency and effectiveness in unstructured and dynamic environments. However, it is still challenging for legged robots to achieve rapid and efficient locomotion on deformable, yielding substrates, such as granular terrains. We present an enhanced resistive force model for bipedal walkers on soft granular terrains by introducing effective intrusion depth correction. The enhanced force model captures fundamental kinetic results considering the robot foot shape, walking gait speed variation, and energy expense. The model is validated by extensive foot intrusion experiments with a bipedal robot. The results confirm the model accuracy on the given type of granular terrains. The model can be further integrated with the motion control of bipedal robotic walkers.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
Scene Depth Estimation from Traditional Oriental Landscape Paintings
Authors:
Sungho Kang,
YeongHyeon Park,
Hyunkyu Park,
Juneho Yi
Abstract:
Scene depth estimation from paintings can streamline the process of 3D sculpture creation so that visually impaired people appreciate the paintings with tactile sense. However, measuring depth of oriental landscape painting images is extremely challenging due to its unique method of depicting depth and poor preservation. To address the problem of scene depth estimation from oriental landscape pain…
▽ More
Scene depth estimation from paintings can streamline the process of 3D sculpture creation so that visually impaired people appreciate the paintings with tactile sense. However, measuring depth of oriental landscape painting images is extremely challenging due to its unique method of depicting depth and poor preservation. To address the problem of scene depth estimation from oriental landscape painting images, we propose a novel framework that consists of two-step Image-to-Image translation method with CLIP-based image matching at the front end to predict the real scene image that best matches with the given oriental landscape painting image. Then, we employ a pre-trained SOTA depth estimation model for the generated real scene image. In the first step, CycleGAN converts an oriental landscape painting image into a pseudo-real scene image. We utilize CLIP to semantically match landscape photo images with an oriental landscape painting image for training CycleGAN in an unsupervised manner. Then, the pseudo-real scene image and oriental landscape painting image are fed into DiffuseIT to predict a final real scene image in the second step. Finally, we measure depth of the generated real scene image using a pre-trained depth estimation model such as MiDaS. Experimental results show that our approach performs well enough to predict real scene images corresponding to oriental landscape painting images. To the best of our knowledge, this is the first study to measure the depth of oriental landscape painting images. Our research potentially assists visually impaired people in experiencing paintings in diverse ways. We will release our code and resulting dataset.
△ Less
Submitted 6 March, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Biomechanical Comparison of Human Walking Locomotion on Solid Ground and Sand
Authors:
Chunchu Zhu,
Xunjie Chen,
**gang Yi
Abstract:
Current studies on human locomotion focus mainly on solid ground walking conditions. In this paper, we present a biomechanic comparison of human walking locomotion on solid ground and sand. A novel dataset containing 3-dimensional motion and biomechanical data from 20 able-bodied adults for locomotion on solid ground and sand is collected. We present the data collection methods and report the sens…
▽ More
Current studies on human locomotion focus mainly on solid ground walking conditions. In this paper, we present a biomechanic comparison of human walking locomotion on solid ground and sand. A novel dataset containing 3-dimensional motion and biomechanical data from 20 able-bodied adults for locomotion on solid ground and sand is collected. We present the data collection methods and report the sensor data along with the kinematic and kinetic profiles of joint biomechanics. A comprehensive analysis of human gait and joint stiffness profiles is presented. The kinematic and kinetic analysis reveals that human walking locomotion on sand shows different ground reaction forces and joint torque profiles, compared with those patterns from walking on solid ground. These gait differences reflect that humans adopt motion control strategies for yielding terrain conditions such as sand. The dataset also provides a source of locomotion data for researchers to study human activity recognition and assistive devices for walking on different terrains.
△ Less
Submitted 28 June, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
A Reduced-Order Resistive Force Model for Robotic Foot-Mud Interactions
Authors:
Xunjie Chen,
**gang Yi,
Jerry Shan
Abstract:
Legged robots are well-suited for broad exploration tasks in complex environments with yielding terrain. Understanding robotic foot-terrain interactions is critical for safe locomotion and walking efficiency for legged robots. This paper presents a reduced-order resistive-force model for robotic-foot/mud interactions. We focus on vertical robot locomotion on mud and propose a visco-elasto-plastic…
▽ More
Legged robots are well-suited for broad exploration tasks in complex environments with yielding terrain. Understanding robotic foot-terrain interactions is critical for safe locomotion and walking efficiency for legged robots. This paper presents a reduced-order resistive-force model for robotic-foot/mud interactions. We focus on vertical robot locomotion on mud and propose a visco-elasto-plastic analog to model the foot/mud interaction forces. Dynamic behaviors such as mud visco-elasticity, withdrawing cohesive suction, and yielding are explicitly discussed with the proposed model. Besides comparing with dry/wet granular materials, mud intrusion experiments are conducted to validate the force model. The dependency of the model parameter on water content and foot velocity is also studied to reveal in-depth model properties under various conditions. The proposed force model potentially provides an enabling tool for legged robot locomotion and control on muddy terrain.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
Authors:
Tianyi Zhang,
Jonah Wonkyu Yi,
Bowen Yao,
Zhaozhuo Xu,
Anshumali Shrivastava
Abstract:
Large language model inference on Central Processing Units (CPU) is challenging due to the vast quantities of expensive Multiply-Add (MAD) matrix operations in the attention computations. In this paper, we argue that there is a rare gem in modern CPUs, Single-Instruction-Multiple-Data (SIMD) registers, which allow for ultra-low-latency lookups in batch. We leverage this unique capability of CPUs t…
▽ More
Large language model inference on Central Processing Units (CPU) is challenging due to the vast quantities of expensive Multiply-Add (MAD) matrix operations in the attention computations. In this paper, we argue that there is a rare gem in modern CPUs, Single-Instruction-Multiple-Data (SIMD) registers, which allow for ultra-low-latency lookups in batch. We leverage this unique capability of CPUs to propose NoMAD-Attention, an efficient attention algorithm that replaces MAD operations with in-register lookups. Through hardware-aware algorithmic designs, NoMAD-Attention achieves the computation of attention scores using repeated fast accesses to SIMD registers despite their highly limited sizes. Moreover, NoMAD-Attention works with pre-trained attention-based LLMs without model finetuning. Empirical evaluations demonstrate that NoMAD-Attention maintains the quality of the original LLMs well, and speeds up the 4-bit quantized LLaMA-7B-based model by up to 2$\times$ at 16k context length. Our results are reproducible at https://github.com/tonyzhang617/nomad-dist.
△ Less
Submitted 2 March, 2024;
originally announced March 2024.
-
Computational supremacy in quantum simulation
Authors:
Andrew D. King,
Alberto Nocera,
Marek M. Rams,
Jacek Dziarmaga,
Roeland Wiersema,
William Bernoudy,
Jack Raymond,
Nitin Kaushal,
Niclas Heinsdorf,
Richard Harris,
Kelly Boothby,
Fabio Altomare,
Andrew J. Berkley,
Martin Boschnak,
Kevin Chern,
Holly Christiani,
Samantha Cibere,
Jake Connor,
Martin H. Dehn,
Rahul Deshpande,
Sara Ejtemaee,
Pau Farré,
Kelsey Hamer,
Emile Hoskinson,
Shuiyuan Huang
, et al. (37 additional authors not shown)
Abstract:
Quantum computers hold the promise of solving certain problems that lie beyond the reach of conventional computers. Establishing this capability, especially for impactful and meaningful problems, remains a central challenge. One such problem is the simulation of nonequilibrium dynamics of a magnetic spin system quenched through a quantum phase transition. State-of-the-art classical simulations dem…
▽ More
Quantum computers hold the promise of solving certain problems that lie beyond the reach of conventional computers. Establishing this capability, especially for impactful and meaningful problems, remains a central challenge. One such problem is the simulation of nonequilibrium dynamics of a magnetic spin system quenched through a quantum phase transition. State-of-the-art classical simulations demand resources that grow exponentially with system size. Here we show that superconducting quantum annealing processors can rapidly generate samples in close agreement with solutions of the Schrödinger equation. We demonstrate area-law scaling of entanglement in the model quench in two-, three- and infinite-dimensional spin glasses, supporting the observed stretched-exponential scaling of effort for classical approaches. We assess approximate methods based on tensor networks and neural networks and conclude that no known approach can achieve the same accuracy as the quantum annealer within a reasonable timeframe. Thus quantum annealers can answer questions of practical importance that classical computers cannot.
△ Less
Submitted 1 March, 2024;
originally announced March 2024.
-
Complete and Near-Optimal Robotic Crack Coverage and Filling in Civil Infrastructure
Authors:
Vishnu Veeraraghavan,
Kyle Hunte,
**gang Yi,
Kaiyan Yu
Abstract:
We present a simultaneous sensor-based inspection and footprint coverage (SIFC) planning and control design with applications to autonomous robotic crack map** and filling. The main challenge of the SIFC problem lies in the coupling of complete sensing (for map**) and robotic footprint (for filling) coverage tasks. Initially, we assume known target information (e.g., crack) and employ classic…
▽ More
We present a simultaneous sensor-based inspection and footprint coverage (SIFC) planning and control design with applications to autonomous robotic crack map** and filling. The main challenge of the SIFC problem lies in the coupling of complete sensing (for map**) and robotic footprint (for filling) coverage tasks. Initially, we assume known target information (e.g., crack) and employ classic cell decomposition methods to achieve complete sensing coverage of the workspace and complete robotic footprint coverage using the least-cost route. Subsequently, we generalize the algorithm to handle unknown target information, allowing the robot to scan and incrementally construct the target graph online while conducting robotic footprint coverage. The online polynomial-time SIFC planning algorithm minimizes the total robot traveling distance, guarantees complete sensing coverage of the entire workspace, and achieves near-optimal robotic footprint coverage, as demonstrated through empirical experiments. For the demonstrated application, we design coordinated nozzle motion control with the planned robot trajectory to efficiently fill all cracks within the robot's footprint. Experimental results are presented to illustrate the algorithm's design, performance, and comparisons. The SIFC algorithm offers a high-efficiency motion planning solution for various robotic applications requiring simultaneous sensing and actuation coverage.
△ Less
Submitted 1 March, 2024;
originally announced March 2024.
-
Structural and resistivity properties of Fe$_{1-x}$Co${_x}$Se single crystals grown by the molten salt method
Authors:
Qiaoyu Wang,
Mingwei Ma,
Binbin Ruan,
Menghu Zhou,
Yadong Gu,
Qingsong Yang,
Lewei Chen,
Yunqing Shi,
Junkun Yi,
Genfu Chen,
Zhian Ren
Abstract:
A series of tetragonal Fe$_{1-x}$Co${_x}$Se single crystals with a complete Co do** range (0$\leq$x$\leq$0.52) up to its solid solubility limit in FeSe have been grown by an eutectic AlCl${_3}$/KCl molten salt method. The typical lateral size of as-grown Fe$_{1-x}$Co${_x}$Se single crystals is 1$-$5 mm. The chemical composition and homogeneity of the crystals was examined by both inductively cou…
▽ More
A series of tetragonal Fe$_{1-x}$Co${_x}$Se single crystals with a complete Co do** range (0$\leq$x$\leq$0.52) up to its solid solubility limit in FeSe have been grown by an eutectic AlCl${_3}$/KCl molten salt method. The typical lateral size of as-grown Fe$_{1-x}$Co${_x}$Se single crystals is 1$-$5 mm. The chemical composition and homogeneity of the crystals was examined by both inductively coupled plasma atomic emission spectroscopy and energy dispersive spectrometer. X-ray diffraction analysis demonstrates that the crystal lattice parameters $a$ and $c$ are both linearly decreased with increasing Co do** level x. In the whole do** range, all the samples show metallic behaviour in contrast to a metal insulator transition of Cu-doped FeSe according to the resistivity measurements
△ Less
Submitted 22 February, 2024;
originally announced February 2024.
-
Absence of localization in Weyl semimetals
Authors:
**min Yi,
A. A. Burkov
Abstract:
One of the fundamental facts of condensed matter physics is that sufficient amount of disorder always turns a Fermi liquid metal into an Anderson insulator: a compressible, but non-conducting phase of matter. Recently, topological semimetals have emerged as another way a metallic phase may be realized. In this paper we point out that, unlike ordinary metals, at least some topological semimetals ar…
▽ More
One of the fundamental facts of condensed matter physics is that sufficient amount of disorder always turns a Fermi liquid metal into an Anderson insulator: a compressible, but non-conducting phase of matter. Recently, topological semimetals have emerged as another way a metallic phase may be realized. In this paper we point out that, unlike ordinary metals, at least some topological semimetals are immune to localization, provided certain conditions are satisfied. We present several physical arguments, based on diagrammatic perturbation theory and Keldysh field theory, as well as domain wall network construction, to back up this claim.
△ Less
Submitted 16 May, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
GOOD: Towards Domain Generalized Orientated Object Detection
Authors:
Qi Bi,
Beichen Zhou,
**gjun Yi,
Wei Ji,
Haolan Zhan,
Gui-Song Xia
Abstract:
Oriented object detection has been rapidly developed in the past few years, but most of these methods assume the training and testing images are under the same statistical distribution, which is far from reality. In this paper, we propose the task of domain generalized oriented object detection, which intends to explore the generalization of oriented object detectors on arbitrary unseen target dom…
▽ More
Oriented object detection has been rapidly developed in the past few years, but most of these methods assume the training and testing images are under the same statistical distribution, which is far from reality. In this paper, we propose the task of domain generalized oriented object detection, which intends to explore the generalization of oriented object detectors on arbitrary unseen target domains. Learning domain generalized oriented object detectors is particularly challenging, as the cross-domain style variation not only negatively impacts the content representation, but also leads to unreliable orientation predictions. To address these challenges, we propose a generalized oriented object detector (GOOD). After style hallucination by the emerging contrastive language-image pre-training (CLIP), it consists of two key components, namely, rotation-aware content consistency learning (RAC) and style consistency learning (SEC). The proposed RAC allows the oriented object detector to learn stable orientation representation from style-diversified samples. The proposed SEC further stabilizes the generalization ability of content representation from different image styles. Extensive experiments on multiple cross-domain settings show the state-of-the-art performance of GOOD. Source code will be publicly available.
△ Less
Submitted 20 February, 2024;
originally announced February 2024.
-
Robust semi-automatic vessel tracing in the human retinal image by an instance segmentation neural network
Authors:
Siyi Chen,
Amir H. Kashani,
Ji Yi
Abstract:
The morphology and hierarchy of the vascular systems are essential for perfusion in supporting metabolism. In human retina, one of the most energy-demanding organs, retinal circulation nourishes the entire inner retina by an intricate vasculature emerging and remerging at the optic nerve head (ONH). Thus, tracing the vascular branching from ONH through the vascular tree can illustrate vascular hie…
▽ More
The morphology and hierarchy of the vascular systems are essential for perfusion in supporting metabolism. In human retina, one of the most energy-demanding organs, retinal circulation nourishes the entire inner retina by an intricate vasculature emerging and remerging at the optic nerve head (ONH). Thus, tracing the vascular branching from ONH through the vascular tree can illustrate vascular hierarchy and allow detailed morphological quantification, and yet remains a challenging task. Here, we presented a novel approach for a robust semi-automatic vessel tracing algorithm on human fundus images by an instance segmentation neural network (InSegNN). Distinct from semantic segmentation, InSegNN separates and labels different vascular trees individually and therefore enable tracing each tree throughout its branching. We have built-in three strategies to improve robustness and accuracy with temporal learning, spatial multi-sampling, and dynamic probability map. We achieved 83% specificity, and 50% improvement in Symmetric Best Dice (SBD) compared to literature, and outperformed baseline U-net. We have demonstrated tracing individual vessel trees from fundus images, and simultaneously retain the vessel hierarchy information. InSegNN paves a way for any subsequent morphological analysis of vascular morphology in relation to retinal diseases.
△ Less
Submitted 15 February, 2024;
originally announced February 2024.
-
Correlation function and the inverse problem in the $BD$ interaction
Authors:
Hai-Peng Li,
**g-Yu Yi,
Chu-Wen Xiao,
De-Liang Yao,
Wei-Hong Liang,
Eulogio Oset
Abstract:
We study the correlation functions of the $B^0 D^+, B^+ D^0$ system, which develops a bound state of approximately $40$ MeV, using inputs consistent with the $T_{cc}(3875)$ state. Then we address the inverse problem starting from these correlation functions to determine the scattering observables related to the system, including the existence of the bound state and its molecular nature. The import…
▽ More
We study the correlation functions of the $B^0 D^+, B^+ D^0$ system, which develops a bound state of approximately $40$ MeV, using inputs consistent with the $T_{cc}(3875)$ state. Then we address the inverse problem starting from these correlation functions to determine the scattering observables related to the system, including the existence of the bound state and its molecular nature. The important output of the approach is the uncertainty with which these observables can be obtained, considering errors in the $B^0 D^+, B^+ D^0$ correlation functions typical of current values in present correlation functions. We find that it is possible to obtain scattering lengths and effective ranges with relative high precision and the existence of a bound state. Although the pole position is obtained with errors of the order of $50 \%$ of the binding energy, the molecular probability of the state is obtained with a very small error of the order of $6\%$. All these findings can serve as motivation to perform such measurements in future runs of high energy hadron collisions.
△ Less
Submitted 28 March, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Enabling Cross-Camera Collaboration for Video Analytics on Distributed Smart Cameras
Authors:
Chulhong Min,
Juheon Yi,
Utku Gunay Acer,
Fahim Kawsar
Abstract:
Overlap** cameras offer exciting opportunities to view a scene from different angles, allowing for more advanced, comprehensive and robust analysis. However, existing visual analytics systems for multi-camera streams are mostly limited to (i) per-camera processing and aggregation and (ii) workload-agnostic centralized processing architectures. In this paper, we present Argus, a distributed video…
▽ More
Overlap** cameras offer exciting opportunities to view a scene from different angles, allowing for more advanced, comprehensive and robust analysis. However, existing visual analytics systems for multi-camera streams are mostly limited to (i) per-camera processing and aggregation and (ii) workload-agnostic centralized processing architectures. In this paper, we present Argus, a distributed video analytics system with cross-camera collaboration on smart cameras. We identify multi-camera, multi-target tracking as the primary task of multi-camera video analytics and develop a novel technique that avoids redundant, processing-heavy identification tasks by leveraging object-wise spatio-temporal association in the overlap** fields of view across multiple cameras. We further develop a set of techniques to perform these operations across distributed cameras without cloud support at low latency by (i) dynamically ordering the camera and object inspection sequence and (ii) flexibly distributing the workload across smart cameras, taking into account network transmission and heterogeneous computational capacities. Evaluation of three real-world overlap** camera datasets with two Nvidia Jetson devices shows that Argus reduces the number of object identifications and end-to-end latency by up to 7.13x and 2.19x (4.86x and 1.60x compared to the state-of-the-art), while achieving comparable tracking quality.
△ Less
Submitted 26 January, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Measuring Policy Distance for Multi-Agent Reinforcement Learning
Authors:
Tianyi Hu,
Zhiqiang Pu,
Xiaolin Ai,
Tenghai Qiu,
Jianqiang Yi
Abstract:
Diversity plays a crucial role in improving the performance of multi-agent reinforcement learning (MARL). Currently, many diversity-based methods have been developed to overcome the drawbacks of excessive parameter sharing in traditional MARL. However, there remains a lack of a general metric to quantify policy differences among agents. Such a metric would not only facilitate the evaluation of the…
▽ More
Diversity plays a crucial role in improving the performance of multi-agent reinforcement learning (MARL). Currently, many diversity-based methods have been developed to overcome the drawbacks of excessive parameter sharing in traditional MARL. However, there remains a lack of a general metric to quantify policy differences among agents. Such a metric would not only facilitate the evaluation of the diversity evolution in multi-agent systems, but also provide guidance for the design of diversity-based MARL algorithms. In this paper, we propose the multi-agent policy distance (MAPD), a general tool for measuring policy differences in MARL. By learning the conditional representations of agents' decisions, MAPD can computes the policy distance between any pair of agents. Furthermore, we extend MAPD to a customizable version, which can quantify differences among agent policies on specified aspects. Based on the online deployment of MAPD, we design a multi-agent dynamic parameter sharing (MADPS) algorithm as an example of the MAPD's applications. Extensive experiments demonstrate that our method is effective in measuring differences in agent policies and specific behavioral tendencies. Moreover, in comparison to other methods of parameter sharing, MADPS exhibits superior performance.
△ Less
Submitted 28 January, 2024; v1 submitted 20 January, 2024;
originally announced January 2024.
-
Cross-Level Multi-Instance Distillation for Self-Supervised Fine-Grained Visual Categorization
Authors:
Qi Bi,
Wei Ji,
**gjun Yi,
Haolan Zhan,
Gui-Song Xia
Abstract:
High-quality annotation of fine-grained visual categories demands great expert knowledge, which is taxing and time consuming. Alternatively, learning fine-grained visual representation from enormous unlabeled images (e.g., species, brands) by self-supervised learning becomes a feasible solution. However, recent researches find that existing self-supervised learning methods are less qualified to re…
▽ More
High-quality annotation of fine-grained visual categories demands great expert knowledge, which is taxing and time consuming. Alternatively, learning fine-grained visual representation from enormous unlabeled images (e.g., species, brands) by self-supervised learning becomes a feasible solution. However, recent researches find that existing self-supervised learning methods are less qualified to represent fine-grained categories. The bottleneck lies in that the pre-text representation is built from every patch-wise embedding, while fine-grained categories are only determined by several key patches of an image. In this paper, we propose a Cross-level Multi-instance Distillation (CMD) framework to tackle the challenge. Our key idea is to consider the importance of each image patch in determining the fine-grained pre-text representation by multiple instance learning. To comprehensively learn the relation between informative patches and fine-grained semantics, the multi-instance knowledge distillation is implemented on both the region/image crop pairs from the teacher and student net, and the region-image crops inside the teacher / student net, which we term as intra-level multi-instance distillation and inter-level multi-instance distillation. Extensive experiments on CUB-200-2011, Stanford Cars and FGVC Aircraft show that the proposed method outperforms the contemporary method by upto 10.14% and existing state-of-the-art self-supervised learning approaches by upto 19.78% on both top-1 accuracy and Rank-1 retrieval metric.
△ Less
Submitted 26 February, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
DDD: A Perceptually Superior Low-Response-Time DNN-based Declipper
Authors:
Jayeon Yi,
Junghyun Koo,
Kyogu Lee
Abstract:
Clip** is a common nonlinear distortion that occurs whenever the input or output of an audio system exceeds the supported range. This phenomenon undermines not only the perception of speech quality but also downstream processes utilizing the disrupted signal. Therefore, a real-time-capable, robust, and low-response-time method for speech declip** (SD) is desired. In this work, we introduce DDD…
▽ More
Clip** is a common nonlinear distortion that occurs whenever the input or output of an audio system exceeds the supported range. This phenomenon undermines not only the perception of speech quality but also downstream processes utilizing the disrupted signal. Therefore, a real-time-capable, robust, and low-response-time method for speech declip** (SD) is desired. In this work, we introduce DDD (Demucs-Discriminator-Declipper), a real-time-capable speech-declip** deep neural network (DNN) that requires less response time by design. We first observe that a previously untested real-time-capable DNN model, Demucs, exhibits a reasonable declip** performance. Then we utilize adversarial learning objectives to increase the perceptual quality of output speech without additional inference overhead. Subjective evaluations on harshly clipped speech shows that DDD outperforms the baselines by a wide margin in terms of speech quality. We perform detailed waveform and spectral analyses to gain an insight into the output behavior of DDD in comparison to the baselines. Finally, our streaming simulations also show that DDD is capable of sub-decisecond mean response times, outperforming the state-of-the-art DNN approach by a factor of six.
△ Less
Submitted 7 January, 2024;
originally announced January 2024.
-
Data-Driven Subsampling in the Presence of an Adversarial Actor
Authors:
Abu Shafin Mohammad Mahdee Jameel,
Ahmed P. Mohamed,
**ho Yi,
Aly El Gamal,
Akshay Malhotra
Abstract:
Deep learning based automatic modulation classification (AMC) has received significant attention owing to its potential applications in both military and civilian use cases. Recently, data-driven subsampling techniques have been utilized to overcome the challenges associated with computational complexity and training time for AMC. Beyond these direct advantages of data-driven subsampling, these me…
▽ More
Deep learning based automatic modulation classification (AMC) has received significant attention owing to its potential applications in both military and civilian use cases. Recently, data-driven subsampling techniques have been utilized to overcome the challenges associated with computational complexity and training time for AMC. Beyond these direct advantages of data-driven subsampling, these methods also have regularizing properties that may improve the adversarial robustness of the modulation classifier. In this paper, we investigate the effects of an adversarial attack on an AMC system that employs deep learning models both for AMC and for subsampling. Our analysis shows that subsampling itself is an effective deterrent to adversarial attacks. We also uncover the most efficient subsampling strategy when an adversarial attack on both the classifier and the subsampler is anticipated.
△ Less
Submitted 7 January, 2024;
originally announced January 2024.
-
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
Authors:
**gwei Yi,
Yueqi Xie,
Bin Zhu,
Emre Kiciman,
Guangzhong Sun,
Xing Xie,
Fangzhao Wu
Abstract:
The integration of large language models (LLMs) with external content has enabled more up-to-date and wide-ranging applications of LLMs, such as Microsoft Copilot. However, this integration has also exposed LLMs to the risk of indirect prompt injection attacks, where an attacker can embed malicious instructions within external content, compromising LLM output and causing responses to deviate from…
▽ More
The integration of large language models (LLMs) with external content has enabled more up-to-date and wide-ranging applications of LLMs, such as Microsoft Copilot. However, this integration has also exposed LLMs to the risk of indirect prompt injection attacks, where an attacker can embed malicious instructions within external content, compromising LLM output and causing responses to deviate from user expectations. To investigate this important but underexplored issue, we introduce the first benchmark for indirect prompt injection attacks, named BIPIA, to evaluate the risk of such attacks. Based on the evaluation, our work makes a key analysis of the underlying reason for the success of the attack, namely the inability of LLMs to distinguish between instructions and external content and the absence of LLMs' awareness to not execute instructions within external content. Building upon this analysis, we develop two black-box methods based on prompt learning and a white-box defense method based on fine-tuning with adversarial training accordingly. Experimental results demonstrate that black-box defenses are highly effective in mitigating these attacks, while the white-box defense reduces the attack success rate to near-zero levels. Overall, our work systematically investigates indirect prompt injection attacks by introducing a benchmark, analyzing the underlying reason for the success of the attack, and develo** an initial set of defenses.
△ Less
Submitted 8 March, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Gaussian Process-Based Learning Control of Underactuated Balance Robots with an External and Internal Convertible Modeling Structure
Authors:
Feng Han,
**gang Yi
Abstract:
External and internal convertible (EIC) form-based motion control is one of the effective designs of simultaneously trajectory tracking and balance for underactuated balance robots. Under certain conditions, the EIC-based control design however leads to uncontrolled robot motion. We present a Gaussian process (GP)-based data-driven learning control for underactuated balance robots with the EIC mod…
▽ More
External and internal convertible (EIC) form-based motion control is one of the effective designs of simultaneously trajectory tracking and balance for underactuated balance robots. Under certain conditions, the EIC-based control design however leads to uncontrolled robot motion. We present a Gaussian process (GP)-based data-driven learning control for underactuated balance robots with the EIC modeling structure. Two GP-based learning controllers are presented by using the EIC structure property. The partial EIC (PEIC)-based control design partitions the robotic dynamics into a fully actuated subsystem and one reduced-order underactuated system. The null-space EIC (NEIC)-based control compensates for the uncontrolled motion in a subspace, while the other closed-loop dynamics are not affected. Under the PEIC- and NEIC-based, the tracking and balance tasks are guaranteed and convergence rate and bounded errors are achieved without causing any uncontrolled motion by the original EIC-based control. We validate the results and demonstrate the GP-based learning control design performance using two inverted pendulum platforms.
△ Less
Submitted 15 December, 2023;
originally announced December 2023.
-
Non-gravitational signals of dark energy under a gauge symmetry
Authors:
Kunio Kaneta,
Hye-Sung Lee,
Jiheon Lee,
Jaeok Yi
Abstract:
We investigate non-gravitational signals of dark energy within the framework of gauge symmetry in the dark energy sector. Traditionally, dark energy has been primarily studied through gravitational effects within general relativity or its extensions. On the other hand, the gauge principles have played a central role in the standard model sector and dark matter sector. If the dark energy field oper…
▽ More
We investigate non-gravitational signals of dark energy within the framework of gauge symmetry in the dark energy sector. Traditionally, dark energy has been primarily studied through gravitational effects within general relativity or its extensions. On the other hand, the gauge principles have played a central role in the standard model sector and dark matter sector. If the dark energy field operates under a gauge symmetry, it introduces the possibility of studying all major components of the present universe under the same gauge principle. This approach marks a significant shift from conventional methodologies, offering a new avenue to explore dark energy.
△ Less
Submitted 13 February, 2024; v1 submitted 15 December, 2023;
originally announced December 2023.
-
What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection
Authors:
Xiaohui Zhang,
Jiangyan Yi,
Chenglong Wang,
Chuyuan Zhang,
Siding Zeng,
Jianhua Tao
Abstract:
The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms. Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types. To address this challenge, one of the…
▽ More
The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms. Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types. To address this challenge, one of the emergent effective approaches is continual learning. In this paper, we propose a continual learning approach called Radian Weight Modification (RWM) for audio deepfake detection. The fundamental concept underlying RWM involves categorizing all classes into two groups: those with compact feature distributions across tasks, such as genuine audio, and those with more spread-out distributions, like various types of fake audio. These distinctions are quantified by means of the in-class cosine distance, which subsequently serves as the basis for RWM to introduce a trainable gradient modification direction for distinct data types. Experimental evaluations against mainstream continual learning methods reveal the superiority of RWM in terms of knowledge acquisition and mitigating forgetting in audio deepfake detection. Furthermore, RWM's applicability extends beyond audio deepfake detection, demonstrating its potential significance in diverse machine learning domains such as image recognition.
△ Less
Submitted 15 December, 2023;
originally announced December 2023.
-
Vacancy-induced tunable Kondo effect in twisted bilayer graphene
Authors:
Yueqing Chang,
****g Yi,
Ang-Kun Wu,
Fabian B. Kugler,
Eva Andrei,
David Vanderbilt,
Gabriel Kotliar,
J. H. Pixley
Abstract:
In single sheets of graphene, vacancy-induced states have been shown to host an effective spin-1/2 hole that can be Kondo-screened at low temperatures. Here, we show how these vacancy-induced impurity states survive in twisted bilayer graphene (TBG), which thus provides a tunable system to probe the critical destruction of the Kondo effect in pseudogap hosts. Ab-initio calculations and atomic-scal…
▽ More
In single sheets of graphene, vacancy-induced states have been shown to host an effective spin-1/2 hole that can be Kondo-screened at low temperatures. Here, we show how these vacancy-induced impurity states survive in twisted bilayer graphene (TBG), which thus provides a tunable system to probe the critical destruction of the Kondo effect in pseudogap hosts. Ab-initio calculations and atomic-scale modeling are used to determine the nature of the vacancy states in the vicinity of the magic angle in TBG, demonstrating that the vacancy can be treated as a quantum impurity. Utilizing this insight, we construct an Anderson impurity model with a TBG host that we solve using the numerical renormalization group combined with the kernel polynomial method. We determine the phase diagram of the model and show how there is a strict dichotomy between vacancies in the AA / BB versus AB / BA tunneling regions. In AB / BA vacancies, we find that the Kondo temperature at the magic angle develops a broad distribution with a tail to vanishing temperatures due to multifractal wavefunctions at the magic angle. We argue that the scanning tunneling microscopy response in the vicinity of the vacancy can act as a non-trivial probe of both the critical single-particle states and the underlying many-body ground state in magic-angle TBG.
△ Less
Submitted 14 December, 2023;
originally announced December 2023.
-
Control Risk for Potential Misuse of Artificial Intelligence in Science
Authors:
Jiyan He,
Weitao Feng,
Yaosen Min,
**gwei Yi,
Kunsheng Tang,
Shuai Li,
Jie Zhang,
Kejiang Chen,
Wenbo Zhou,
Xing Xie,
Weiming Zhang,
Nenghai Yu,
Shuxin Zheng
Abstract:
The expanding application of Artificial Intelligence (AI) in scientific fields presents unprecedented opportunities for discovery and innovation. However, this growth is not without risks. AI models in science, if misused, can amplify risks like creation of harmful substances, or circumvention of established regulations. In this study, we aim to raise awareness of the dangers of AI misuse in scien…
▽ More
The expanding application of Artificial Intelligence (AI) in scientific fields presents unprecedented opportunities for discovery and innovation. However, this growth is not without risks. AI models in science, if misused, can amplify risks like creation of harmful substances, or circumvention of established regulations. In this study, we aim to raise awareness of the dangers of AI misuse in science, and call for responsible AI development and use in this domain. We first itemize the risks posed by AI in scientific contexts, then demonstrate the risks by highlighting real-world examples of misuse in chemical science. These instances underscore the need for effective risk management strategies. In response, we propose a system called SciGuard to control misuse risks for AI models in science. We also propose a red-teaming benchmark SciMT-Safety to assess the safety of different systems. Our proposed SciGuard shows the least harmful impact in the assessment without compromising performance in benign tests. Finally, we highlight the need for a multidisciplinary and collaborative effort to ensure the safe and ethical use of AI models in science. We hope that our study can spark productive discussions on using AI ethically in science among researchers, practitioners, policymakers, and the public, to maximize benefits and minimize the risks of misuse.
△ Less
Submitted 11 December, 2023;
originally announced December 2023.
-
Decoupling SQL Query Hardness Parsing for Text-to-SQL
Authors:
Jiawen Yi,
Guo Chen
Abstract:
The fundamental goal of the Text-to-SQL task is to translate natural language question into SQL query. Current research primarily emphasizes the information coupling between natural language questions and schemas, and significant progress has been made in this area. The natural language questions as the primary task requirements source determines the hardness of correspond SQL queries, the correla…
▽ More
The fundamental goal of the Text-to-SQL task is to translate natural language question into SQL query. Current research primarily emphasizes the information coupling between natural language questions and schemas, and significant progress has been made in this area. The natural language questions as the primary task requirements source determines the hardness of correspond SQL queries, the correlation between the two always be ignored. However, when the correlation between questions and queries was decoupled, it may simplify the task. In this paper, we introduce an innovative framework for Text-to-SQL based on decoupling SQL query hardness parsing. This framework decouples the Text-to-SQL task based on query hardness by analyzing questions and schemas, simplifying the multi-hardness task into a single-hardness challenge. This greatly reduces the parsing pressure on the language model. We evaluate our proposed framework and achieve a new state-of-the-art performance of fine-turning methods on Spider dev.
△ Less
Submitted 29 December, 2023; v1 submitted 11 December, 2023;
originally announced December 2023.