-
BlockEmulator: An Emulator Enabling to Test Blockchain Sharding Protocols
Authors:
Huawei Huang,
Guang Ye,
Qinde Chen,
Zhaokang Yin,
Xiaofei Luo,
Jianru Lin,
Taotao Li,
Qinglin Yang,
Zibin Zheng
Abstract:
Numerous blockchain simulators have been proposed to allow researchers to simulate mainstream blockchains. However, we have not yet found a testbed that enables researchers to develop and evaluate new consensus algorithms or new protocols for blockchain sharding systems. To fill this gap, we develop BlockEmulator, an experimental platform designed specifically for emulating blockchain sharding mechanisms. BlockEmulator adopts a lightweight blockchain architecture so that developers can focus solely on implementing their new protocols or mechanisms. Using the layered modules and programming interfaces offered by BlockEmulator, researchers can implement a new protocol with minimal effort. We test the functionalities of BlockEmulator in two steps. First, we verify the correctness of the emulation results by comparing theoretical analysis with the observed experimental results. Second, further experiments demonstrate that BlockEmulator facilitates the measurement of a series of metrics, including throughput, transaction confirmation latency, cross-shard transaction ratio, transaction-pool queue size, and workload distribution across blockchain shards. We have made BlockEmulator open source on GitHub.
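As a hedged illustration of checking an emulated metric against theory (this is not BlockEmulator's code or API), the sketch below assumes accounts are placed on S shards by hashing their addresses, so placement is close to uniform. Under that assumption a transfer between two independently chosen accounts is cross-shard with probability 1 - 1/S, and the measured cross-shard ratio should approach that value. The functions `shard_of` and `cross_shard_ratio` are hypothetical.

```python
import hashlib
import random


def shard_of(account: str, num_shards: int) -> int:
    """Map an account address to a shard by hashing it (illustrative placement policy)."""
    digest = hashlib.sha256(account.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def cross_shard_ratio(txs, num_shards: int) -> float:
    """Fraction of (sender, receiver) transfers whose two accounts live in different shards."""
    cross = sum(1 for s, r in txs if shard_of(s, num_shards) != shard_of(r, num_shards))
    return cross / len(txs)


if __name__ == "__main__":
    rng = random.Random(0)
    accounts = [f"0x{rng.getrandbits(160):040x}" for _ in range(5000)]
    txs = [(rng.choice(accounts), rng.choice(accounts)) for _ in range(20000)]
    num_shards = 4
    observed = cross_shard_ratio(txs, num_shards)
    expected = 1 - 1 / num_shards  # theory for independent, uniformly placed endpoints
    print(f"observed={observed:.3f} expected={expected:.3f}")
```

Real workloads are rarely uniform, which is exactly why an emulator that measures the ratio directly is useful; the closed form only serves as a sanity check.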
Submitted 11 November, 2023; v1 submitted 6 November, 2023;
originally announced November 2023.
-
ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models
Authors:
Zhelun Shi,
Zhipin Wang,
Hongxing Fan,
Zhenfei Yin,
Lu Sheng,
Yu Qiao,
**g Shao
Abstract:
Multimodal Large Language Models (MLLMs) have shown impressive abilities in interacting with visual content, with myriad potential downstream tasks. However, even though a number of benchmarks have been proposed, the capabilities and limitations of MLLMs are still not comprehensively understood, owing to the lack of a standardized and holistic evaluation framework. To this end, we present the first Comprehensive Evaluation Framework (ChEF) that can holistically profile each MLLM and fairly compare different MLLMs. First, we structure ChEF as four modular components: Scenario as scalable multimodal datasets, Instruction as flexible instruction retrieving formulae, Inferencer as reliable question-answering strategies, and Metric as indicative task-specific score functions. Based on these, ChEF facilitates versatile evaluations in a standardized framework, and new evaluations can be built by designing new Recipes (systematic selections of these four components). Notably, current MLLM benchmarks can be readily summarized as recipes of ChEF. Second, we introduce 6 new recipes to quantify the desired capabilities (termed desiderata: calibration, in-context learning, instruction following, language performance, hallucination, and robustness) that competent MLLMs need as reliable agents performing real-world multimodal interactions. Third, we conduct a large-scale evaluation of 9 prominent MLLMs on 9 scenarios and 6 desiderata. Our evaluation yields over 20 valuable observations concerning the generalizability of MLLMs across various scenarios and the composite capability of MLLMs required for multimodal interactions. We will publicly release all detailed implementations for further analysis, as well as an easy-to-use modular toolkit for integrating new recipes and models, so that ChEF can be a growing evaluation framework for the MLLM community.
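The recipe idea (one choice each of Scenario, Instruction, Inferencer, and Metric) can be sketched as plain function composition. This is a minimal sketch under assumed interfaces, not ChEF's toolkit: the `Recipe` class, the sample tuple format, and the toy components are hypothetical, and `always_cat` stands in for a real MLLM call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# (image path, question, reference answer); a hypothetical sample format
Sample = Tuple[str, str, str]


@dataclass
class Recipe:
    """One evaluation = one choice of each of the four components."""
    scenario: Callable[[], Iterable[Sample]]          # dataset provider
    instruction: Callable[[str], str]                 # question -> prompt
    inferencer: Callable[[str, str], str]             # (image, prompt) -> model answer
    metric: Callable[[List[str], List[str]], float]   # (predictions, references) -> score

    def run(self) -> float:
        preds, refs = [], []
        for image, question, answer in self.scenario():
            prompt = self.instruction(question)
            preds.append(self.inferencer(image, prompt))
            refs.append(answer)
        return self.metric(preds, refs)


def toy_scenario() -> List[Sample]:
    return [("img1.jpg", "What animal is shown?", "cat"),
            ("img2.jpg", "What animal is shown?", "dog")]


def qa_template(question: str) -> str:
    return f"Question: {question}\nAnswer:"


def always_cat(image: str, prompt: str) -> str:
    return "cat"  # stand-in for a real MLLM call


def exact_match(preds: List[str], refs: List[str]) -> float:
    return sum(p.strip().lower() == r.strip().lower() for p, r in zip(preds, refs)) / len(refs)


recipe = Recipe(toy_scenario, qa_template, always_cat, exact_match)
print(recipe.run())  # 0.5: one of the two toy answers matches
```

Swapping any single field (for example a different metric) yields a new evaluation without touching the other three, which is the modularity the abstract describes.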
Submitted 5 November, 2023;
originally announced November 2023.
-
Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE
Authors:
Zeren Chen,
Ziqin Wang,
Zhen Wang,
Huayang Liu,
Zhenfei Yin,
Si Liu,
Lu Sheng,
Wanli Ouyang,
Yu Qiao,
**g Shao
Abstract:
Recent studies have demonstrated that Large Language Models (LLMs) can extend their zero-shot generalization capabilities to multimodal learning through instruction tuning. As more modalities and downstream tasks are introduced, negative conflicts and interference between tasks can increasingly degrade performance. This phenomenon has been overlooked in previous work; to address it, we propose a novel and extensible framework, called Octavius, for comprehensive studies and experimentation on multimodal learning with Multimodal Large Language Models (MLLMs). Specifically, we combine the well-known Mixture-of-Experts (MoE) with LoRA, one of the representative parameter-efficient fine-tuning (PEFT) techniques, designing a novel LLM-based decoder, called LoRA-MoE, for multimodal learning. To the best of our knowledge, ours is among the first efforts to introduce MoE into MLLMs to address this problem. The experimental results (about 20% improvement) show the effectiveness and versatility of our design in various 2D and 3D downstream tasks. Code and datasets are available at https://openlamm.github.io/paper_list/Octavius.
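A LoRA-MoE layer can be pictured as a frozen linear map plus several low-rank LoRA experts mixed by a gate. The NumPy sketch below is illustrative only: the class name, the softmax gate computed from a per-instance feature vector, and all sizes are assumptions, and Octavius's actual routing may differ. Following standard LoRA initialization, each expert's `B` matrix starts at zero, so the layer initially reproduces the frozen weight exactly.

```python
import numpy as np


class LoRAMoELinear:
    """Frozen linear layer plus a softmax-gated mixture of LoRA experts (illustrative)."""

    def __init__(self, d_in, d_out, num_experts=4, rank=8, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, d_in ** -0.5, (d_out, d_in))    # frozen pretrained weight
        # Expert i adds B_i @ A_i; B starts at zero (standard LoRA init), so the
        # layer initially behaves exactly like the frozen layer.
        self.A = rng.normal(0.0, 0.01, (num_experts, rank, d_in))
        self.B = np.zeros((num_experts, d_out, rank))
        self.gate = rng.normal(0.0, 0.01, (num_experts, d_in))   # router weights
        self.scale = alpha / rank

    def __call__(self, x, instance_feature):
        """x: (d_in,) or (d_in, n_tokens); one gate distribution is shared by all tokens."""
        logits = self.gate @ instance_feature
        probs = np.exp(logits - logits.max())                     # numerically stable softmax
        probs /= probs.sum()
        base = self.W @ x
        delta = sum(p * (B @ (A @ x)) for p, A, B in zip(probs, self.A, self.B))
        return base + self.scale * delta, probs


layer = LoRAMoELinear(d_in=32, d_out=16)
tokens = np.random.default_rng(1).normal(size=(32, 5))           # 5 tokens of one instance
out, gate_probs = layer(tokens, tokens.mean(axis=1))
```

Only `A`, `B`, and `gate` would be trained; keeping `W` frozen is what makes the approach parameter-efficient, while separate experts give conflicting tasks room to use different low-rank updates.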
Submitted 13 March, 2024; v1 submitted 5 November, 2023;
originally announced November 2023.
-
3D seismic survey design by maximizing the spectral gap
Authors:
Yijun Zhang,
Ziyi Yin,
Oscar López,
Ali Siahkoohi,
Mathias Louboutin,
Felix J. Herrmann
Abstract:
The massive cost of 3D acquisition calls for methods that reduce the number of receivers by designing optimal receiver sampling masks. Recent studies on 2D seismic data showed that maximizing the spectral gap of the subsampling mask leads to better wavefield reconstruction. We extend this line of work by proposing a simulation-free method that generates optimal 3D acquisition geometries by maximizing the spectral gap of the subsampling mask via a simulated annealing algorithm. Numerical experiments confirm that the proposed method improves on receiver sampling locations obtained by jittered sampling.
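Simulated annealing over a binary receiver mask can be sketched as follows. The spectral-gap definition used here, 1 - σ2/σ1 of the mask's two largest singular values, is an assumption based on the cited 2D line of work, as are the single-receiver swap move and the linear cooling schedule; the function names are hypothetical. The receiver count stays fixed, so only the layout changes.

```python
import numpy as np


def spectral_gap(mask):
    """1 - sigma2 / sigma1 of the binary receiver mask (assumed definition)."""
    s = np.linalg.svd(mask.astype(float), compute_uv=False)
    return 1.0 - s[1] / s[0]


def anneal_mask(nx, ny, n_receivers, n_iter=2000, t0=0.05, seed=0):
    """Maximize the spectral gap of an nx-by-ny mask with 0 < n_receivers < nx * ny."""
    rng = np.random.default_rng(seed)
    flat = np.zeros(nx * ny, dtype=bool)
    flat[rng.choice(nx * ny, n_receivers, replace=False)] = True
    mask = flat.reshape(nx, ny)
    gap = spectral_gap(mask)
    best, best_gap = mask.copy(), gap
    for k in range(n_iter):
        temp = t0 * (1 - k / n_iter) + 1e-9              # linear cooling schedule
        on, off = np.argwhere(mask), np.argwhere(~mask)
        i = tuple(on[rng.integers(len(on))])
        j = tuple(off[rng.integers(len(off))])
        cand = mask.copy()
        cand[i], cand[j] = False, True                   # move one receiver, keep the count
        cand_gap = spectral_gap(cand)
        # Metropolis rule: always accept improvements, sometimes accept worse masks.
        if cand_gap >= gap or rng.random() < np.exp((cand_gap - gap) / temp):
            mask, gap = cand, cand_gap
            if gap > best_gap:
                best, best_gap = mask.copy(), gap
    return best, best_gap


if __name__ == "__main__":
    best, g = anneal_mask(32, 32, 256)
    print(f"receivers={best.sum()} spectral gap={g:.3f}")
```

Because each step only needs singular values of the mask, no wave simulation is involved, which matches the "simulation-free" framing.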
Submitted 3 November, 2023;
originally announced November 2023.
-
$R^3$-NL2GQL: A Model Coordination and Knowledge Graph Alignment Approach for NL2GQL
Authors:
Yuhang Zhou,
Yu He,
Siyu Tian,
Yuchen Ni,
Zhangyue Yin,
Xiang Liu,
Chuanjun Ji,
Sen Liu,
Xipeng Qiu,
Guangnan Ye,
Hongfeng Chai
Abstract:
While Foundation Models have achieved impressive results in converting natural language to SQL (NL2SQL), adapting these approaches to converting natural language to Graph Query Language (NL2GQL) encounters hurdles due to the distinct nature of GQL compared to SQL, as well as the diverse forms of GQL. Moving away from traditional rule-based and slot-filling methodologies, we introduce a novel approach, $R^3$-NL2GQL, which integrates both small and large Foundation Models for ranking, rewriting, and refining tasks. This method leverages the interpretative strengths of smaller models for the initial ranking and rewriting stages, while capitalizing on the superior generalization and query generation capabilities of larger models for the final transformation of natural language queries into GQL. To address the scarcity of datasets in this emerging field, we have developed a bilingual dataset sourced from graph database manuals and selected open-source Knowledge Graphs (KGs). Our evaluation on this dataset demonstrates the promising efficacy and robustness of the method.
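The rank, rewrite, refine division of labor can be sketched as a three-stage pipeline with pluggable models. Everything below is hypothetical: `R3Pipeline`, the word-overlap ranker, and the template refiner are toy stand-ins for the small and large Foundation Models, and the output is a Cypher-style `MATCH ... RETURN` string chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class R3Pipeline:
    """Rank -> rewrite -> refine orchestration with pluggable (hypothetical) models."""
    rank: Callable[[str, List[str]], List[str]]   # small model: order schema items by relevance
    rewrite: Callable[[str, List[str]], str]      # small model: normalize the question
    refine: Callable[[str, List[str]], str]       # large model: emit the final GQL query
    top_k: int = 3

    def __call__(self, question: str, schema: List[str]) -> str:
        relevant = self.rank(question, schema)[: self.top_k]
        rewritten = self.rewrite(question, relevant)
        return self.refine(rewritten, relevant)


def overlap_rank(question: str, schema: List[str]) -> List[str]:
    words = set(question.lower().replace("?", "").split())
    # Items mentioned in the question sort first (False < True); sorted() is stable.
    return sorted(schema, key=lambda item: item.lower() not in words)


def lowercase_rewrite(question: str, relevant: List[str]) -> str:
    return question.strip().lower()


def template_refine(question: str, relevant: List[str]) -> str:
    # A real system would prompt a large model with the question and the relevant schema.
    return f"MATCH (v:{relevant[0]}) RETURN v"


pipe = R3Pipeline(rank=overlap_rank, rewrite=lowercase_rewrite, refine=template_refine)
print(pipe("Which player is the oldest?", ["team", "player", "coach"]))
```

Narrowing the schema before the expensive stage keeps the large model's prompt short and focused, which is one plausible reason to split the work between small and large models.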
Submitted 1 July, 2024; v1 submitted 3 November, 2023;
originally announced November 2023.
-
Inference of CO2 flow patterns -- a feasibility study
Authors:
Abhinav Prakash Gahlot,
Huseyin Tuna Erdinc,
Rafael Orozco,
Ziyi Yin,
Felix J. Herrmann
Abstract:
As the global deployment of carbon capture and sequestration (CCS) technology intensifies in the fight against climate change, it becomes increasingly imperative to establish robust monitoring and detection mechanisms for potential underground CO2 leakage, particularly through pre-existing or induced faults in the storage reservoir's seals. While techniques such as history matching and time-lapse seismic monitoring of CO2 storage have been used successfully to track the evolution of CO2 plumes in the subsurface, these methods lack principled approaches to characterizing uncertainties in the CO2 plumes' behavior. A systematic assessment of uncertainties is essential for risk mitigation for the following reasons: (i) CO2 plume-induced changes are small and seismic data are noisy; (ii) changes between regular and irregular (e.g., caused by leakage) flow patterns are small; and (iii) the reservoir properties that control the flow are strongly heterogeneous and typically only available as distributions. To arrive at a formulation capable of inferring flow patterns for regular and irregular flow from well and seismic data, we analyze the performance of conditional normalizing flows on a series of carefully designed numerical experiments. While the inferences presented are preliminary in the context of an early CO2 leakage detection system, the results indicate that inference with conditional normalizing flows can produce high-fidelity estimates of CO2 plumes with or without leakage. We are also confident that the inferred uncertainty is reasonable because it correlates well with the observed errors. This uncertainty stems from noise in the seismic data and from the lack of precise knowledge of the reservoir's fluid-flow properties.
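The claim that the inferred uncertainty "correlates well with the observed errors" can be checked with a simple diagnostic: compare the pointwise posterior standard deviation with the absolute error of the posterior mean. The sketch below is a generic version of that check, not the authors' code; the function name and the synthetic posterior samples (standing in for draws from a conditional normalizing flow) are assumptions.

```python
import numpy as np


def uncertainty_error_correlation(samples, truth):
    """Pearson correlation between pointwise posterior std and |posterior mean - truth|.

    samples: array (n_samples, *grid) of posterior draws, e.g. from a conditional flow.
    truth:   array (*grid) holding the reference CO2 saturation.
    """
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    err = np.abs(mean - truth)
    return np.corrcoef(std.ravel(), err.ravel())[0, 1], mean, std


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.uniform(0, 1, (64, 64))
    sigma = rng.uniform(0.01, 0.2, (64, 64))          # spatially varying uncertainty
    offset = rng.normal(size=(64, 64))                 # error of the mean scales with sigma
    samples = truth + sigma * offset + sigma * rng.normal(size=(200, 64, 64))
    corr, _, _ = uncertainty_error_correlation(samples, truth)
    print(f"std-error correlation: {corr:.2f}")
```

A clearly positive correlation means regions flagged as uncertain are where the estimate is actually wrong, which is the property a leakage-detection system needs.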
Submitted 28 November, 2023; v1 submitted 1 November, 2023;
originally announced November 2023.
-
Imperfect Digital Twin Assisted Low Cost Reinforcement Training for Multi-UAV Networks
Authors:
Xiucheng Wang,
Nan Cheng,
Longfei Ma,
Zhisheng Yin,
Tom. Luan,
Ning Lu
Abstract:
Deep Reinforcement Learning (DRL) is widely used to optimize the performance of multi-UAV networks. However, DRL training relies on frequent interactions between the UAVs and the environment, which consume considerable energy due to UAV flight and communication in practical experiments. Inspired by the growing digital twin (DT) technology, which can simulate the performance of algorithms in a digital space constructed by copying features of the physical space, we introduce a DT to reduce the costs of practical training, e.g., energy and hardware purchases. Unlike previous DT-assisted works, which assume that the virtual digital model perfectly reflects real physics, we consider an imperfect DT model with deviations for assisting the training of multi-UAV networks. To trade off the training cost, the DT construction cost, and the impact of DT deviations on training, we propose a mixed deployment method that combines natural and virtually generated UAVs. Two cascaded neural networks (NNs) are used to jointly optimize the number of virtually generated UAVs, the DT construction cost, and the performance of multi-UAV networks. These two NNs are trained by unsupervised learning and reinforcement learning, both of which are low-cost, label-free training methods. Simulation results show that the training cost can be significantly reduced while the training performance is maintained. This implies that efficient decisions can be made with imperfect DTs in multi-UAV networks.
Submitted 24 October, 2023;
originally announced October 2023.
-
Experimental test of the Jarzynski equality in a single spin-1 system using high-fidelity single-shot readouts
Authors:
Wenquan Liu,
Zhibo Niu,
Wei Cheng,
Xin Li,
Chang-Kui Duan,
Zhangqi Yin,
Xing Rong,
Jiangfeng Du
Abstract:
The Jarzynski equality (JE), which connects the equilibrium free energy with non-equilibrium work statistics, plays a crucial role in quantum thermodynamics. Although practical quantum systems are usually multi-level systems, most tests of the JE have been performed in two-level systems. A rigorous test of the JE by directly measuring the work distribution of a physical process in a high-dimensional quantum system remains elusive. Here, we report an experimental test of the JE in a single spin-1 system. We realized nondemolition projective measurement of this three-level system by cascading high-fidelity single-shot readouts, and directly measured the work distribution using the two-point measurement protocol. The validity of the JE was verified from the non-adiabatic to the adiabatic regime and under different effective temperatures. Our work puts the JE on a solid experimental foundation and makes the nitrogen-vacancy (NV) center system a mature toolbox for advanced experiments in stochastic quantum thermodynamics.
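The JE states $\langle e^{-\beta W}\rangle = e^{-\beta\Delta F} = Z_f/Z_i$. In the two-point measurement protocol, the system starts in a thermal state of the initial Hamiltonian, is projected onto an eigenstate $n$ (energy $E^i_n$), evolves, and is projected onto an eigenstate $m$ of the final Hamiltonian (energy $E^f_m$), giving work $W = E^f_m - E^i_n$. The sketch below checks the equality numerically for a sudden quench (the fully non-adiabatic limit) of a random three-level Hamiltonian; it is a generic illustration, not a model of the NV-center experiment, and the helper names are hypothetical.

```python
import numpy as np


def random_hermitian(n, rng):
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (a + a.conj().T) / 2


def jarzynski_check(h_i, h_f, beta):
    """Two-point-measurement work statistics for a sudden quench h_i -> h_f."""
    e_i, v_i = np.linalg.eigh(h_i)
    e_f, v_f = np.linalg.eigh(h_f)
    p_n = np.exp(-beta * e_i)
    z_i = p_n.sum()
    p_n /= z_i                                     # thermal populations of h_i
    p_m_given_n = np.abs(v_f.conj().T @ v_i) ** 2  # |<m_f|n_i>|^2, rows m, columns n
    work = e_f[:, None] - e_i[None, :]             # W = E^f_m - E^i_n
    lhs = np.sum(p_m_given_n * p_n[None, :] * np.exp(-beta * work))  # <exp(-beta W)>
    rhs = np.exp(-beta * e_f).sum() / z_i          # Z_f / Z_i = exp(-beta * dF)
    return lhs, rhs


rng = np.random.default_rng(0)
h_i, h_f = random_hermitian(3, rng), random_hermitian(3, rng)  # three levels, as for spin-1
print(jarzynski_check(h_i, h_f, beta=1.0))
```

The two numbers agree to floating-point precision for any Hamiltonians and temperature, because summing the transition probabilities over initial states gives one for each final state; an experiment instead estimates the left side from measured work histograms.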
Submitted 24 October, 2023;
originally announced October 2023.
-
Ill-posedness for the Cauchy problem of the modified Camassa-Holm equation in $B^0_{\infty,1}$
Authors:
Zhen He,
Zhaoyang Yin
Abstract:
In this paper, we prove norm inflation and hence ill-posedness for the modified Camassa-Holm equation in $B_{\infty,1}^0$. This completes the well-posedness and ill-posedness theory for the modified Camassa-Holm equation in all critical spaces $B_{p,1}^{1/p}$ with $p\in[1,\infty]$.
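For reference, the modified Camassa-Holm equation is commonly written in the form below (the paper may use an equivalent normalization). The space treated here is the endpoint $p=\infty$ of the critical family, since then $1/p = 0$ and $B^{1/p}_{p,1}$ becomes $B^0_{\infty,1}$.

```latex
\begin{aligned}
&m_t + \big((u^2 - u_x^2)\, m\big)_x = 0, \qquad m = u - u_{xx}, \\
&u(0, x) = u_0(x).
\end{aligned}
```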
Submitted 24 October, 2023;
originally announced October 2023.
-
Uniqueness of conservative solutions to the modified Camassa-Holm equation via Characteristics
Authors:
Zhen He,
Zhaoyang Yin
Abstract:
In this paper, for a given conservative solution, we introduce a set of auxiliary variables tailored to this particular solution and prove that these variables satisfy a particular semilinear system having unique solutions. In turn, we obtain the uniqueness of the conservative solution in the original variables.
Submitted 24 October, 2023;
originally announced October 2023.
-
Is ChatGPT a game changer for geocoding -- a benchmark for geocoding address parsing techniques
Authors:
Zhengcong Yin,
Diya Li,
Daniel W. Goldberg
Abstract:
The remarkable success of GPT models across various tasks, including toponym recognition, motivates us to assess the performance of the GPT-3 model on the geocoding address parsing task. To ensure that the evaluation more accurately mirrors performance in real-world scenarios with diverse user input qualities, and to address the pressing need for a 'gold standard' evaluation dataset for geocoding systems, we introduce a benchmark dataset of low-quality address descriptions synthesized from human input patterns mined from the actual input logs of a geocoding system in production. This dataset covers 21 different input errors and variations; contains over 239,000 address records that are uniquely selected from streets across all 50 U.S. states and D.C.; and consists of three subsets to be used as training, validation, and testing sets. Building on this, we train and gauge the performance of the GPT-3 model in extracting address components, contrasting its performance with transformer-based and LSTM-based models. The evaluation results indicate that the Bidirectional LSTM-CRF model achieves the best performance, outperforming the transformer-based models and the GPT-3 model. The transformer-based models achieve results very close to those of the Bidirectional LSTM-CRF model. The GPT-3 model, though trailing in performance, shows potential for the address parsing task with few-shot examples, with room for improvement through additional fine-tuning. We open-source the code and data of this benchmark so that researchers can use it for future model development or extend it to evaluate similar tasks, such as document geocoding.
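Sequence-labeling parsers such as a BiLSTM-CRF emit one tag per token; turning those tags into address components is a small decoding step. The sketch assumes the common BIO scheme and a hypothetical tag set (`NUMBER`, `STREET`, `CITY`, `STATE`, `ZIP`); the benchmark's actual label inventory may differ.

```python
from typing import Dict, List


def decode_bio(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Group BIO-tagged tokens into address components (hypothetical tag set).

    Repeated spans with the same label are concatenated; an I- tag that does not
    continue the currently open span is treated like "O" and dropped.
    """
    components: Dict[str, List[str]] = {}
    current = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            components.setdefault(current, []).append(token)
        elif tag.startswith("I-") and current == tag[2:]:
            components[current].append(token)
        else:
            current = None
    return {label: " ".join(words) for label, words in components.items()}


tokens = ["123", "Main", "St", "Springfield", "IL", "62704"]
tags = ["B-NUMBER", "B-STREET", "I-STREET", "B-CITY", "B-STATE", "B-ZIP"]
print(decode_bio(tokens, tags))
```

For this input the result groups "Main St" into one street component and leaves every other token as its own field.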
Submitted 15 December, 2023; v1 submitted 22 October, 2023;
originally announced October 2023.
-
Room temperature nonvolatile optical control of polar order in a charge density wave
Authors:
QM Liu,
Dong Wu,
TY Wu,
SS Han,
YR Peng,
ZH Yuan,
YH Cheng,
BH Li,
TC Hu,
Li Yue,
SX Xu,
RX Ding,
Ming Lu,
RS Li,
SJ Zhang,
BQ Lv,
Alfred Zong,
YF Su,
Nuh Gedik,
ZP Yin,
Tao Dong,
NL Wang
Abstract:
Utilizing ultrafast light-matter interaction to manipulate the electronic states of quantum materials is an emerging area of research in condensed matter physics, with significant implications for the development of future ultrafast electronic devices. However, inducing long-lasting metastable electronic states in a fully reversible manner remains a long-standing challenge. Here, using ultrafast laser excitation with distinct pulse sequences and fluences, we were able to regulate the symmetry and electronic properties of the polar charge-density-wave material EuTe4. We demonstrated nonvolatile writing and erasing of the polar order, a process that is completely reversible and is achieved at room temperature in an all-optical manner. Each induced state modifies the electrical resistance and the second-harmonic generation intensity. The results point to a distinct dynamical symmetry inversion mechanism in which photoexcitation mediates the polar phases of long-range electronic order. Our findings extend the scope of nonvolatile all-optical control of electronic states to ambient conditions, thus opening possibilities for applications in ultrafast optoelectronics.
Submitted 16 October, 2023;
originally announced October 2023.
-
Hierarchical Pretraining on Multimodal Electronic Health Records
Authors:
Xiaochen Wang,
Junyu Luo,
Jiaqi Wang,
Ziyi Yin,
Suhan Cui,
Yuan Zhong,
Yaqing Wang,
Fenglong Ma
Abstract:
Pretraining has proven to be a powerful technique in natural language processing (NLP), exhibiting remarkable success in various NLP downstream tasks. However, in the medical domain, existing models pretrained on electronic health records (EHR) fail to capture the hierarchical nature of EHR data, limiting their ability to generalize across diverse downstream tasks with a single pretrained model. To tackle this challenge, this paper introduces a novel, general, and unified pretraining framework called MEDHMP, specifically designed for hierarchical, multimodal EHR data. The effectiveness of MEDHMP is demonstrated through experimental results on eight downstream tasks spanning three levels. Comparisons against eighteen baselines further highlight the efficacy of our approach.
Submitted 20 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Imitation Learning from Observation with Automatic Discount Scheduling
Authors:
Yuyang Liu,
Weijun Dong,
Yingdong Hu,
Chuan Wen,
Zhao-Heng Yin,
Chongjie Zhang,
Yang Gao
Abstract:
Humans often acquire new skills through observation and imitation. For robotic agents, learning from the plethora of unlabeled video demonstration data available on the Internet requires imitating the expert without access to its actions, a challenge known as Imitation Learning from Observation (ILfO). A common approach to ILfO problems is to convert them into inverse reinforcement learning problems, using a proxy reward computed from the agent's and the expert's observations. Nonetheless, we find that tasks with a progress dependency property pose significant challenges for such approaches; in these tasks, the agent needs to first learn the expert's preceding behaviors before mastering the subsequent ones. Our investigation reveals that the main cause is that the reward signals assigned to later steps hinder the learning of initial behaviors. To address this challenge, we present a novel ILfO framework that enables the agent to master earlier behaviors before advancing to later ones. We introduce an Automatic Discount Scheduling (ADS) mechanism that adaptively alters the discount factor in reinforcement learning during training, prioritizing earlier rewards initially and gradually engaging later rewards only once the earlier behaviors have been mastered. Our experiments, conducted on nine Meta-World tasks, demonstrate that our method significantly outperforms state-of-the-art methods across all tasks, including those that these methods cannot solve.
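The core idea of ADS, a discount factor that grows only as earlier behaviors are mastered, can be sketched as a monotone schedule. The mapping below is an assumption, not the paper's formula: it ties the effective horizon $1/(1-\gamma)$ to an externally supplied progress estimate (the number of expert steps the agent already matches), never lowers $\gamma$, and caps it. `DiscountScheduler` and `horizon_offset` are hypothetical names, and how progress is measured is left to the caller.

```python
class DiscountScheduler:
    """Raise the RL discount factor as imitation progress grows (hypothetical mapping)."""

    def __init__(self, horizon_offset: float = 10.0, gamma_max: float = 0.99):
        self.horizon_offset = horizon_offset
        self.gamma_max = gamma_max
        self.gamma = 1.0 - 1.0 / horizon_offset   # short horizon: favor early rewards

    def update(self, progress_steps: int) -> float:
        # Effective horizon 1 / (1 - gamma) tracks the number of expert steps matched so far.
        candidate = 1.0 - 1.0 / (progress_steps + self.horizon_offset)
        # Never lower gamma, and never exceed gamma_max.
        self.gamma = min(self.gamma_max, max(self.gamma, candidate))
        return self.gamma


scheduler = DiscountScheduler()
for progress in (0, 40, 5, 1000):
    print(progress, round(scheduler.update(progress), 3))
```

Early in training the short horizon makes rewards for later steps nearly invisible to the learner, which is the mechanism the abstract credits for letting initial behaviors be learned first.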
Submitted 7 February, 2024; v1 submitted 11 October, 2023;
originally announced October 2023.
-
VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models
Authors:
Ziyi Yin,
Muchao Ye,
Tianrong Zhang,
Tianyu Du,
**guo Zhu,
Han Liu,
**ghui Chen,
Ting Wang,
Fenglong Ma
Abstract:
Vision-Language (VL) pre-trained models have shown their superiority on many multimodal tasks. However, the adversarial robustness of such models has not been fully explored. Existing approaches mainly explore adversarial robustness under the white-box setting, which is unrealistic. In this paper, we investigate a new yet practical task: crafting image and text perturbations using pre-trained VL models to attack black-box fine-tuned models on different downstream tasks. To this end, we propose VLATTACK, which generates adversarial samples by fusing perturbations of images and texts at both the single-modal and multimodal levels. At the single-modal level, we propose a new block-wise similarity attack (BSA) strategy that learns image perturbations for disrupting universal representations. In addition, we adopt an existing text attack strategy to generate text perturbations independently of the image-modal attack. At the multimodal level, we design a novel iterative cross-search attack (ICSA) method that periodically updates adversarial image-text pairs, starting from the outputs of the single-modal level. We conduct extensive experiments attacking five widely used VL pre-trained models on six tasks. Experimental results show that VLATTACK achieves the highest attack success rates on all tasks compared with state-of-the-art baselines, revealing a blind spot in the deployment of pre-trained VL models. Source code is available at https://github.com/ericyinyzy/VLAttack.
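A block-wise similarity attack can be read as projected gradient descent that pushes each encoder block's features for the perturbed image away from those of the clean image. The sketch below is a simplification under stated assumptions: the "encoder blocks" are random linear maps so that the cosine-similarity gradient can be written analytically, the image is a flat vector in [0, 1], and the step sizes are arbitrary. It is not VLATTACK's implementation, and `blockwise_similarity_attack` is a hypothetical name.

```python
import numpy as np


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def blockwise_similarity_attack(x, blocks, eps=0.03, step=0.005, n_steps=40, seed=0):
    """L-inf PGD lowering the summed cosine similarity between clean and adversarial
    features of every block. `blocks` holds matrices standing in for encoder blocks."""
    rng = np.random.default_rng(seed)
    clean = [W @ x for W in blocks]
    x_adv = np.clip(x + rng.uniform(-eps, eps, x.shape), 0.0, 1.0)   # random start
    for _ in range(n_steps):
        grad = np.zeros_like(x)
        for W, c in zip(blocks, clean):
            z = W @ x_adv
            nz, nc = np.linalg.norm(z), np.linalg.norm(c)
            # d cos(z, c) / dz, chained through z = W @ x_adv
            dz = c / (nz * nc) - (z @ c) * z / (nz ** 3 * nc)
            grad += W.T @ dz
        x_adv = x_adv - step * np.sign(grad)        # descend on total similarity
        x_adv = np.clip(x_adv, x - eps, x + eps)    # stay inside the L-inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)            # stay a valid image
    return x_adv


rng = np.random.default_rng(1)
image = rng.uniform(0.2, 0.8, 64)
encoder_blocks = [rng.normal(size=(16, 64)) for _ in range(3)]
adv = blockwise_similarity_attack(image, encoder_blocks, eps=0.1, step=0.01)
print(sum(cosine(W @ adv, W @ image) for W in encoder_blocks))
```

Attacking intermediate representations rather than a task loss is what lets the perturbation transfer to fine-tuned models whose task heads the attacker never sees; with a real encoder the gradient would come from automatic differentiation.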
Submitted 5 February, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Variational principle of higher dimension weighted pressure for amenable group actions
Authors:
Zhengyu Yin,
Zubiao Xiao
Abstract:
Let $r\geq 2$ and let $(X_i,G)$ $(i=1,\cdots,r)$ be topological dynamical systems, with $G$ an infinite discrete amenable group. Suppose that $\pi_i:(X_i,G)\to (X_{i+1},G)$ are factor maps and $0\leq w_i\leq 1$. In this article, for $f\in C(X_1)$, we introduce the weighted topological pressure $P^{\textbf{a}}(f,G)$ of amenable group actions for higher dimensions (not only for $r=2$). Using measure-theoretic techniques, we establish the variational principle \begin{align*} P^{\textbf{a}}(f,G)=\sup_{\mu\in \mathcal{M}^G(X_1)}\Big(\sum_{i=1}^r w_i h_{\mu_i}(X_i,G)+w_1\int_{X_1}f\,d\mu\Big), \end{align*} where $\mu_i=\pi_{i-1}\circ\cdots\circ\pi_{1}\mu$ (so $\mu_1=\mu$) is the induced $G$-invariant measure on $X_{i}$.
Submitted 6 October, 2023;
originally announced October 2023.
-
CUPre: Cross-domain Unsupervised Pre-training for Few-Shot Cell Segmentation
Authors:
Weibin Liao,
Xuhong Li,
Qingzhong Wang,
Yanwu Xu,
Zhaozheng Yin,
Haoyi Xiong
Abstract:
While pre-training on object detection tasks, such as Common Objects in Context (COCO) [1], can significantly boost the performance of cell segmentation, fine-tuning the pre-trained model still relies on massive numbers of finely annotated cell images [2] with bounding boxes, masks, and cell types for every cell in every image. To lower the cost of annotation, this work considers the problem of pre-training DNN models for few-shot cell segmentation, where massive unlabeled cell images are available but only a small proportion is annotated. Here, we propose Cross-domain Unsupervised Pre-training (CUPre), which transfers the capability of object detection and instance segmentation for common visual objects (learned from COCO) to the visual domain of cells using unlabeled images. Given a standard COCO pre-trained network with backbone, neck, and head modules, CUPre adopts an alternate multi-task pre-training (AMT2) procedure with two sub-tasks: in every iteration of pre-training, AMT2 first trains the backbone with cell images from multiple cell datasets via unsupervised momentum contrastive learning (MoCo) [3], and then trains the whole model with vanilla COCO datasets via instance segmentation. After pre-training, CUPre fine-tunes the whole model on the cell segmentation task using a few annotated images. We carry out extensive experiments to evaluate CUPre using the LIVECell [2] and BBBC038 [4] datasets in few-shot instance segmentation settings. The experiments show that CUPre outperforms existing pre-training methods, achieving the highest average precision (AP) for few-shot cell segmentation and detection.
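The AMT2 schedule alternates two sub-tasks inside every iteration, and the MoCo sub-task relies on a momentum-updated key encoder, $\theta_k \leftarrow m\,\theta_k + (1-m)\,\theta_q$. The sketch below shows only that skeleton: parameters are dictionaries of NumPy arrays, and `moco_step` and `coco_step` are hypothetical callables standing in for the real training steps, which CUPre implements on a full detection network.

```python
from typing import Callable, Dict

import numpy as np

Params = Dict[str, np.ndarray]


def momentum_update(key_params: Params, query_params: Params, m: float = 0.999) -> Params:
    """MoCo key-encoder update: theta_k <- m * theta_k + (1 - m) * theta_q."""
    return {name: m * key_params[name] + (1.0 - m) * query_params[name]
            for name in key_params}


def amt2_pretrain(num_iters: int,
                  moco_step: Callable[[], None],
                  coco_step: Callable[[], None]) -> None:
    """Alternate the two sub-tasks once per iteration (hypothetical step callables)."""
    for _ in range(num_iters):
        moco_step()   # backbone only: contrastive loss on unlabeled cell images
        coco_step()   # whole model: instance segmentation on COCO


key = momentum_update({"w": np.zeros(2)}, {"w": np.ones(2)}, m=0.9)
print(key["w"])
```

Interleaving the two sub-tasks, rather than running them in sequence, is what keeps the backbone from drifting away from the COCO detection skills while it adapts to cell images.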
Submitted 5 October, 2023;
originally announced October 2023.
-
SCVCNet: Sliding cross-vector convolution network for cross-task and inter-individual-set EEG-based cognitive workload recognition
Authors:
Qi Wang,
Li Chen,
Zhiyuan Zhan,
Jianhua Zhang,
Zhong Yin
Abstract:
This paper presents a generic approach for building a cognitive workload recognizer by exploiting common electroencephalogram (EEG) patterns across different human-machine tasks and individual sets. We propose a neural network called SCVCNet, which eliminates task- and individual-set-related interference in EEGs by analyzing finer-grained frequency structures in the power spectral densities. SCVCNet uses a sliding cross-vector convolution (SCVC) operation, in which paired input layers representing the theta and alpha power are employed. By extracting the weights from a kernel matrix's central row and column, we compute the weighted sum of the two vectors around a specified scalp location. Next, we introduce an inter-frequency-point feature integration module to fuse the SCVC feature maps. Finally, we combine the two modules with the output-channel pooling and classification layers to construct the model. To train SCVCNet, we employ the regularized least-squares method with ridge regression and extreme learning machine theory. We validate its performance using three databases, each consisting of distinct tasks performed by independent participant groups. The average accuracy (0.6813 and 0.6229) and F1 score (0.6743 and 0.6076) achieved in two different validation paradigms are partially higher than those of previous works. All features and algorithms are available at https://github.com/7ohnKeats/SCVCNet.
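Training with ridge regression under extreme learning machine (ELM) theory usually means fixing a random hidden layer and solving for the output weights in closed form, $\beta = (H^\top H + \lambda I)^{-1} H^\top T$. The sketch shows that generic recipe on toy features; in SCVCNet the inputs would be the fused SCVC feature maps, and how the paper combines this with its convolutional modules may differ. `RidgeELM` and its parameters are hypothetical.

```python
import numpy as np


class RidgeELM:
    """Extreme learning machine: random hidden layer + ridge-regression output weights."""

    def __init__(self, n_hidden=100, lam=1e-2, seed=0):
        self.n_hidden, self.lam = n_hidden, lam
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))   # fixed, never trained
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        T = np.eye(n_classes)[y]                                      # one-hot targets
        # beta = (H^T H + lam * I)^{-1} H^T T, solved without forming the inverse
        self.beta = np.linalg.solve(H.T @ H + self.lam * np.eye(self.n_hidden), H.T @ T)
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2, 1, (100, 4)), rng.normal(2, 1, (100, 4))])
    y = np.array([0] * 100 + [1] * 100)
    model = RidgeELM(n_hidden=50).fit(X, y)
    print("training accuracy:", (model.predict(X) == y).mean())
```

The closed-form solve replaces iterative backpropagation, which keeps training fast; the ridge term $\lambda$ guards against overfitting when subjects or tasks differ.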
Submitted 21 September, 2023;
originally announced October 2023.
-
Evaluating Hallucinations in Chinese Large Language Models
Authors:
Qinyuan Cheng,
Tianxiang Sun,
Wenwei Zhang,
Siyin Wang,
Xiangyang Liu,
Mozhi Zhang,
Junliang He,
Mianqiu Huang,
Zhangyue Yin,
Kai Chen,
Xipeng Qiu
Abstract:
In this paper, we establish a benchmark named HalluQA (Chinese Hallucination Question-Answering) to measure the hallucination phenomenon in Chinese large language models. HalluQA contains 450 meticulously designed adversarial questions, spanning multiple domains, and takes into account Chinese historical culture, customs, and social phenomena. During the construction of HalluQA, we consider two types of hallucinations: imitative falsehoods and factual errors, and we construct adversarial samples based on GLM-130B and ChatGPT. For evaluation, we design an automated evaluation method using GPT-4 to judge whether a model output is hallucinated. We conduct extensive experiments on 24 large language models, including ERNIE-Bot, Baichuan2, ChatGLM, Qwen, and SparkDesk. Of the 24 models, 18 achieved non-hallucination rates below 50%, which indicates that HalluQA is highly challenging. We analyze the primary types of hallucinations in different types of models and their causes. Additionally, we discuss which types of hallucinations should be prioritized for different types of models.
Submitted 25 October, 2023; v1 submitted 5 October, 2023;
originally announced October 2023.
-
Dynamic Shuffle: An Efficient Channel Mixture Method
Authors:
Kaijun Gong,
Zhuowen Yin,
Yushu Li,
Kailing Guo,
Xiangmin Xu
Abstract:
The redundancy of convolutional neural networks depends not only on their weights but also on their inputs. Shuffling is an efficient operation for mixing channel information, but the shuffle order is usually predefined. To reduce this data-dependent redundancy, we devise a dynamic shuffle module that generates data-dependent permutation matrices for shuffling. Since the number of entries in a permutation matrix grows with the square of the number of input channels, to make the generation process efficient we divide the channels into groups, generate two shared small permutation matrices for each group, and use the Kronecker product and cross-group shuffle to obtain the final permutation matrices. To make the generation process learnable, based on theoretical analysis, we employ softmax, orthogonal regularization, and binarization to asymptotically approximate the permutation matrix. Dynamic shuffle adaptively mixes channel information with negligible extra computation and memory occupancy. Experimental results on the image classification benchmark datasets CIFAR-10, CIFAR-100, Tiny ImageNet and ImageNet show that our method significantly increases the performance of ShuffleNets. By adding the dynamically generated matrix to a learnable static matrix, we further propose static-dynamic-shuffle and show that it can serve as a lightweight replacement for ordinary pointwise convolution.
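The efficiency argument rests on the fact that the Kronecker product of two permutation matrices is itself a permutation matrix, so two small factors can define a shuffle over many channels. A minimal sketch, assuming one group of 6 channels factored as 2 x 3 with hand-picked (not learned) orders; the softmax relaxation, orthogonal regularization, binarization, and cross-group shuffle from the abstract are not modeled.

```python
def perm_matrix(order):
    """Permutation matrix P with P[i][order[i]] = 1, so (P x)[i] = x[order[i]]."""
    n = len(order)
    return [[1 if j == order[i] else 0 for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker product of two matrices stored as lists of lists."""
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb] for j in range(len(a[0]) * cb)]
            for i in range(len(a) * rb)]


def is_permutation(m):
    """True if every row and every column contains exactly one 1 and zeros elsewhere."""
    n = len(m)
    rows_ok = all(sorted(row) == [0] * (n - 1) + [1] for row in m)
    cols_ok = all(sum(m[i][j] for i in range(n)) == 1 for j in range(n))
    return rows_ok and cols_ok


def shuffle_channels(p, channels):
    """Apply permutation matrix p to a list of per-channel values."""
    return [sum(pij * c for pij, c in zip(row, channels)) for row in p]


p1 = perm_matrix([1, 0])        # hypothetical 2 x 2 first factor
p2 = perm_matrix([2, 0, 1])     # hypothetical 3 x 3 second factor
p = kron(p1, p2)                # 6 x 6 shuffle built from 4 + 9 entries
print(is_permutation(p), shuffle_channels(p, list(range(6))))  # prints: True [5, 3, 4, 2, 0, 1]
```

Here the two factors hold 4 + 9 = 13 entries while the resulting 6 x 6 shuffle has 36, and the gap widens quickly as the group size grows.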
Submitted 4 October, 2023;
originally announced October 2023.
-
MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation
Authors:
Yuan Zhong,
Suhan Cui,
Jiaqi Wang,
Xiaochen Wang,
Ziyi Yin,
Yaqing Wang,
Mengdi Huai,
Ting Wang,
Fenglong Ma
Abstract:
Health risk prediction is one of the fundamental tasks under predictive modeling in the medical domain, which aims to forecast the potential health risks that patients may face in the future using their historical Electronic Health Records (EHR). Researchers have developed several risk prediction models to handle the unique challenges of EHR data, such as its sequential nature, high dimensionality, and inherent noise. These models have yielded impressive results. Nonetheless, a key issue undermining their effectiveness is data insufficiency. A variety of data generation and augmentation methods have been introduced to mitigate this issue by expanding the size of the training data set through learning the underlying data distributions. However, the performance of these methods is often limited by their task-unrelated design. To address these shortcomings, this paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge the sample space. Furthermore, MedDiffusion discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data. Experimental evaluation on four real-world medical datasets demonstrates that MedDiffusion outperforms 14 cutting-edge baselines in terms of PR-AUC, F1, and Cohen's Kappa. We also conduct ablation studies and benchmark our model against GAN-based alternatives to further validate the rationality and adaptability of our model design. Additionally, we analyze generated data to offer fresh insights into the model's interpretability.
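MedDiffusion's exact noise schedule and data representation are not given here. For orientation only, the standard closed-form forward (noising) step of a denoising diffusion model is x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps with eps drawn from a standard normal. The sketch assumes a linear beta schedule (a common default) and treats a patient visit as a hypothetical real-valued embedding vector; the learned reverse (denoising) network and the step-wise attention are not shown.

```python
import math
import random


def linear_alpha_bars(steps, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed default)."""
    alpha_bars, prod = [], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / (steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in closed form."""
    a = alpha_bars[t]
    return [math.sqrt(a) * xi + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for xi in x0]


alpha_bars = linear_alpha_bars(1000)
visit = [0.3, -1.2, 0.8, 0.0]   # hypothetical visit embedding
rng = random.Random(0)
print(q_sample(visit, 10, alpha_bars, rng))   # mostly signal
print(q_sample(visit, 999, alpha_bars, rng))  # almost pure noise
```

The closed form lets training jump to any noise level t directly instead of applying t noising steps in sequence.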
Submitted 5 October, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration
Authors:
Qiushi Sun,
Zhangyue Yin,
Xiang Li,
Zhiyong Wu,
Xipeng Qiu,
Lingpeng Kong
Abstract:
Large Language Models (LLMs) are evolving at an unprecedented pace and have exhibited considerable capability in the realm of natural language processing (NLP) with world knowledge. Benefiting from ultra-large-scale training corpora, a single LLM can manage typical NLP tasks competently. However, its performance in executing reasoning tasks is still confined by the limitations of its internal representations. To push this boundary further, we introduce Corex in this paper, a suite of novel general-purpose strategies that transform LLMs into autonomous agents pioneering multi-model collaborations for complex task-solving. Inspired by human behaviors, Corex comprises diverse collaboration paradigms, including Debate, Review, and Retrieve modes, which collectively work towards enhancing the factuality, faithfulness, and reliability of the reasoning process. These paradigms foster task-agnostic approaches that enable LLMs to "think outside the box," thereby overcoming hallucinations and providing better solutions. Through extensive experiments across four different types of reasoning tasks, we demonstrate that orchestrating multiple LLMs to work in concert yields substantially better performance than existing methods. Further results and in-depth analysis demonstrate the cost-effectiveness of our method, facilitating collaboration among different LLMs and promoting annotation efficiency.
Submitted 7 April, 2024; v1 submitted 30 September, 2023;
originally announced October 2023.
-
Exploring Interacting Dark Energy with Chaos Quantum-Behaved Particle Swarm Optimization
Authors:
Zhixiang Yin,
Zelin Ren,
André A. Costa
Abstract:
Models with an interaction between dark energy and dark matter have been studied for about twenty years. In this paper, however, we provide for the first time a general analytical solution for models with an energy transfer given by $\mathcal{E} = 3H(\xi_1 \rho_c + \xi_2 \rho_d)$. We also use a new set of age-redshift data for 114 old astrophysical objects (OAO) to constrain some special cases of this general energy transfer. We use a method inspired by artificial intelligence, known as Chaos Quantum-behaved Particle Swarm Optimization (CQPSO), to explore the parameter space and search for the best-fit values. We test this method in a simulated scenario, compare it with previous MCMC results, and find good agreement with the expected results.
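A short derivation shows why an analytical solution is available. Assuming a constant dark-energy equation of state $w$, the continuity equations $\dot\rho_c + 3H\rho_c = +\mathcal{E}$ and $\dot\rho_d + 3H(1+w)\rho_d = -\mathcal{E}$ (this sign convention is an assumption; flipping it flips the signs of $\xi_1$ and $\xi_2$), and $N = \ln a$, the factor $H$ cancels and the system becomes linear with constant coefficients, $d(\rho_c, \rho_d)/dN = M(\rho_c, \rho_d)$. Its solution is the matrix exponential $e^{MN}$, i.e. a sum of power laws $a^{\lambda}$ with $\lambda$ the eigenvalues of $M$. The sketch evaluates that closed form and cross-checks it against RK4 integration; the parameter values are hypothetical, and this is not claimed to be the paper's own parametrization.

```python
import cmath
import math


def transfer_matrix(xi1, xi2, w):
    """Coefficients of d(rho_c, rho_d)/dN with N = ln a, assuming
    rho_c' + 3H rho_c = +E, rho_d' + 3H(1 + w) rho_d = -E, E = 3H(xi1 rho_c + xi2 rho_d)."""
    return [[3 * xi1 - 3, 3 * xi2],
            [-3 * xi1, -3 * (1 + w) - 3 * xi2]]


def analytic(rho0, m, n):
    """rho(N) = exp(M N) rho0 in closed form for a constant 2x2 matrix M.
    With K = M - (tr/2) I, Cayley-Hamilton gives K^2 = s^2 I, s^2 = tr^2/4 - det, so
    exp(M N) = exp(tr N / 2) [cosh(s N) I + sinh(s N)/s K]."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    s = cmath.sqrt(tr * tr / 4 - det)            # complex if the eigenvalues are complex
    c = cmath.cosh(s * n)
    sh = n if s == 0 else cmath.sinh(s * n) / s  # sinh(sN)/s -> N as s -> 0
    k = [[m[0][0] - tr / 2, m[0][1]], [m[1][0], m[1][1] - tr / 2]]
    pref = math.exp(tr * n / 2)
    return [(pref * (c * rho0[i] + sh * (k[i][0] * rho0[0] + k[i][1] * rho0[1]))).real
            for i in range(2)]


def rk4(rho0, m, n, steps=2000):
    """Numerical cross-check: integrate the same linear system from N = 0 to N = n."""
    h = n / steps

    def deriv(v):
        return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

    y = list(rho0)
    for _ in range(steps):
        k1 = deriv(y)
        k2 = deriv([y[i] + h / 2 * k1[i] for i in range(2)])
        k3 = deriv([y[i] + h / 2 * k2[i] for i in range(2)])
        k4 = deriv([y[i] + h * k3[i] for i in range(2)])
        y = [y[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(2)]
    return y


m = transfer_matrix(xi1=0.01, xi2=0.02, w=-1.0)   # hypothetical parameters
n = -math.log(1 + 2.0)                             # back to redshift z = 2
print(analytic([0.3, 0.7], m, n), rk4([0.3, 0.7], m, n))
```

Setting $\xi_1 = \xi_2 = 0$ and $w = -1$ recovers the uncoupled case, $\rho_c \propto a^{-3}$ with constant $\rho_d$, which is a quick sanity check on the signs.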
Submitted 26 September, 2023;
originally announced September 2023.
-
New Revival Phenomena for Bidirectional Dispersive Hyperbolic Equations
Authors:
George Farmakis,
**g Kang,
Peter J. Olver,
Changzheng Qu,
Zihan Yin
Abstract:
In this paper, the dispersive revival and fractalization phenomena for bidirectional dispersive equations on a bounded interval subject to periodic boundary conditions and discontinuous initial profiles are investigated. Firstly, we study the periodic initial-boundary value problem of the linear beam equation with step function initial data, and analyze the manifestation of the revival phenomenon for the corresponding solution at rational times. Next, we extend the investigation to periodic initial-boundary value problems of more general bidirectional dispersive equations. We prove that, if the initial functions are of bounded variation, the dynamical evolution of such periodic problems depends essentially upon the large wave number asymptotics of the associated dispersion relations. Integral polynomial or asymptotically integral polynomial dispersion relations produce dispersive revival/fractalization rational/irrational dichotomies, whereas those with non-polynomial growth result in fractal profiles at all times. Finally, numerical experiments, in the concrete case of the nonlinear beam equation, are used to demonstrate how such effects persist into the nonlinear regime.
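For the linear beam equation, the revival at rational times can be seen directly from the Fourier series. A sketch, assuming the periodic interval $[0, 2\pi)$, the equation $u_{tt} = -u_{xxxx}$ (so mode $k$ oscillates as $\cos(k^2 t)$), zero initial velocity, and a step equal to 1 on $(0, \pi)$ and 0 elsewhere. For odd $k$, $\cos(k^2\pi/2) = 0$, $\cos(k^2\pi) = -1$ and $\cos(2\pi k^2) = 1$, so the truncated series gives the constant $1/2$ at $t = \pi/2$, the step translated by $\pi$ at $t = \pi$, and the original step at $t = 2\pi$. Irrational times, where the abstract reports fractal profiles, are not examined here.

```python
import math


def beam_step_solution(x, t, n_terms=2000):
    """Truncated Fourier series of u_tt = -u_xxxx on the periodic interval [0, 2*pi), with
    u(x, 0) = 1 on (0, pi), 0 on (pi, 2*pi), and u_t(x, 0) = 0.
    The step's series has only odd sine modes; mode k oscillates in time as cos(k^2 t)."""
    u = 0.5
    for j in range(n_terms):
        k = 2 * j + 1
        u += 2.0 / (math.pi * k) * math.cos(k * k * t) * math.sin(k * x)
    return u


for t in (0.0, math.pi / 2, math.pi, 2 * math.pi):
    print(t, [round(beam_step_solution(x, t), 3) for x in (math.pi / 2, 3 * math.pi / 2)])
```

Away from the jumps, the truncation error at these sample points is of order 1/n_terms, which is why the checks use a loose tolerance at $t = \pi$.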
Submitted 26 September, 2023;
originally announced September 2023.
-
FGFusion: Fine-Grained Lidar-Camera Fusion for 3D Object Detection
Authors:
Zixuan Yin,
Han Sun,
Ningzhong Liu,
Huiyu Zhou,
Jiaquan Shen
Abstract:
Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving. Most prevalent methods progressively downscale the 3D point clouds and camera images and then fuse the high-level features, but the downscaled features inevitably lose low-level detailed information. In this paper, we propose Fine-Grained Lidar-Camera Fusion (FGFusion), which makes full use of the multi-scale features of images and point clouds and fuses them in a fine-grained way. First, we design a dual-pathway hierarchical structure to extract both high-level semantic and low-level detailed features of the image. Second, an auxiliary network is introduced to guide point cloud features to better learn fine-grained spatial information. Finally, we propose multi-scale fusion (MSF) to fuse the last N feature maps of the image and point cloud. Extensive experiments on two popular autonomous driving benchmarks, KITTI and Waymo, demonstrate the effectiveness of our method.
Submitted 21 September, 2023;
originally announced September 2023.
-
MetaF2N: Blind Image Super-Resolution by Learning Efficient Model Adaptation from Faces
Authors:
Zhicun Yin,
Ming Liu,
Xiaoming Li,
Hui Yang,
Longan Xiao,
Wangmeng Zuo
Abstract:
Due to their highly structured characteristics, faces are easier to recover than natural scenes for blind image super-resolution. Therefore, we can extract the degradation representation of an image from the low-quality and recovered face pairs. Using the degradation representation, realistic low-quality images can then be synthesized to fine-tune the super-resolution model for the real-world low-quality image. However, such a procedure is time-consuming and laborious, and the gaps between recovered faces and the ground-truths further increase the optimization uncertainty. To facilitate efficient model adaptation towards image-specific degradations, we propose a method dubbed MetaF2N, which leverages the contained Faces to fine-tune model parameters for adapting to the whole Natural image in a Meta-learning framework. The degradation extraction and low-quality image synthesis steps are thus circumvented in our MetaF2N, and it requires only one fine-tuning step to get decent performance. Considering the gaps between the recovered faces and ground-truths, we further deploy a MaskNet for adaptively predicting loss weights at different positions to reduce the impact of low-confidence areas. To evaluate our proposed MetaF2N, we have collected a real-world low-quality dataset with one or multiple faces in each image, and our MetaF2N achieves superior performance on both synthetic and real-world datasets. Source code, pre-trained models, and collected datasets are available at https://github.com/yinzhicun/MetaF2N.
Submitted 14 September, 2023;
originally announced September 2023.
-
GelSplitter: Tactile Reconstruction from Near Infrared and Visible Images
Authors:
Yuankai Lin,
Yulin Zhou,
Kaiji Huang,
Qi Zhong,
Tao Cheng,
Hua Yang,
Zhou** Yin
Abstract:
The GelSight-like visual tactile (VT) sensor has gained popularity as a high-resolution tactile sensing technology for robots, capable of measuring touch geometry using a single RGB camera. However, the development of multi-modal perception for VT sensors remains a challenge, limited by the single camera. In this paper, we propose GelSplitter, a new framework that approaches multi-modal VT sensing with synchronized multi-modal cameras and more closely resembles a human-like tactile receptor. Furthermore, we focus on 3D tactile reconstruction and implement a compact sensor structure that maintains a size comparable to state-of-the-art VT sensors, even with the addition of a prism and a near infrared (NIR) camera. We also design a photometric fusion stereo neural network (PFSNN), which estimates surface normals of objects and reconstructs touch geometry from both infrared and visible images. Our results demonstrate that the accuracy of RGB and NIR fusion is higher than that of RGB images alone. Additionally, our GelSplitter framework allows flexible configuration of different camera sensor combinations, such as RGB and thermal imaging.
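PFSNN is a learned network, but the relation it builds on can be illustrated with classical Lambertian photometric stereo: under three known, non-coplanar light directions $l_k$, each pixel's intensities satisfy $I_k = \rho\, l_k \cdot n$, so $g = \rho n$ follows from a 3 x 3 linear solve. This is a swapped-in textbook technique, not the paper's method; the light directions, albedo, and normal below are hypothetical, and shadows, non-Lambertian gel reflectance, and NIR/RGB fusion are ignored.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 system a g = b with Cramer's rule (a must be non-singular)."""
    d = det3(a)
    return [det3([[b[r] if c == col else a[r][c] for c in range(3)] for r in range(3)]) / d
            for col in range(3)]


def normal_from_intensities(lights, intensities):
    """Lambertian shading I_k = rho * (l_k . n): solve for g = rho * n, then split g."""
    g = solve3(lights, intensities)
    rho = math.sqrt(sum(v * v for v in g))
    return rho, [v / rho for v in g]


lights = [[0.0, 0.0, 1.0], [0.5, 0.0, 0.866], [0.0, 0.5, 0.866]]  # hypothetical directions
norm = math.sqrt(0.2 ** 2 + 0.1 ** 2 + 1.0)
n_true = [0.2 / norm, -0.1 / norm, 1.0 / norm]
intensities = [0.8 * sum(li * ni for li, ni in zip(light, n_true)) for light in lights]
print(normal_from_intensities(lights, intensities))  # approximately rho = 0.8 and n_true
```

A real sensor repeats this per pixel and then integrates the normal field into a height map; a learned model such as PFSNN can absorb effects this linear model ignores.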
Submitted 14 September, 2023;
originally announced September 2023.
-
The Rise and Potential of Large Language Model Based Agents: A Survey
Authors:
Zhiheng Xi,
Wenxiang Chen,
Xin Guo,
Wei He,
Yiwen Ding,
Boyang Hong,
Ming Zhang,
Junzhe Wang,
Senjie **,
Enyu Zhou,
Rui Zheng,
Xiaoran Fan,
Xiao Wang,
Limao Xiong,
Yuhao Zhou,
Weiran Wang,
Changhao Jiang,
Yicheng Zou,
Xiangyang Liu,
Zhangyue Yin,
Shihan Dou,
Rongxiang Weng,
Wensen Cheng,
Qi Zhang,
Wenjuan Qin
, et al. (4 additional authors not shown)
Abstract:
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advances in algorithms or training strategies to enhance specific capabilities or performance on particular tasks. What the community lacks is a general and powerful model to serve as a starting point for designing AI agents that can adapt to diverse scenarios. Due to the versatile capabilities they demonstrate, large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI), offering hope for building general AI agents. Many researchers have leveraged LLMs as the foundation to build AI agents and have achieved significant progress. In this paper, we perform a comprehensive survey on LLM-based agents. We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. Building upon this, we present a general framework for LLM-based agents, comprising three main components: brain, perception, and action, and the framework can be tailored for different applications. Subsequently, we explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation. Following this, we delve into agent societies, exploring the behavior and personality of LLM-based agents, the social phenomena that emerge from an agent society, and the insights they offer for human society. Finally, we discuss several key topics and open problems within the field. A repository of related papers is available at https://github.com/WooooDyy/LLM-Agent-Paper-List.
Submitted 19 September, 2023; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Fully passive Measurement Device Independent Quantum Key Distribution
Authors:
Xiang Wang,
Feng-Yu Lu,
Ze-Hao Wang,
Zhen-Qiang Yin,
Shuang Wang,
Wei Chen,
De-Yong He,
Guang-Can Guo,
Zheng-Fu Han
Abstract:
Measurement-device-independent quantum key distribution (MDI-QKD) can resist all attacks on the detection devices, but there are still some security issues related to the source side. One possible solution is to use the passive protocol to eliminate the side channels introduced by active modulators at the source. Recently, a fully passive QKD protocol has been proposed that can simultaneously achieve passive encoding and passive decoy-state modulation using linear optics. In this work, we propose a fully passive MDI-QKD scheme that can protect the system from both side channels of source modulators and attacks on the measurement devices, which can significantly improve the implementation security of the QKD systems. We provide a specific passive encoding strategy and a method for decoy-state analysis, followed by simulation results for the secure key rate in the asymptotic scenario. Our work offers a feasible way to improve the implementation security of QKD systems, and serves as a reference for achieving passive QKD schemes using realistic devices.
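The quantities that a decoy-state analysis estimates typically feed into an asymptotic key-rate bound. As a sketch, the commonly used bound for (active) MDI-QKD is $R \ge P_{11} Y_{11}^{Z}[1 - h_2(e_{11}^{X})] - Q^{Z} f_e h_2(E^{Z})$, with $h_2$ the binary entropy. Whether the fully passive scheme uses exactly this form is an assumption, and all numerical inputs (including $f_e = 1.16$) are hypothetical.

```python
import math


def h2(p):
    """Binary Shannon entropy in bits (0 at p = 0 or 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def mdi_key_rate(p11, y11_z, e11_x, q_z, e_z, f_ec=1.16):
    """Asymptotic bound R >= P11 * Y11^Z * [1 - h2(e11^X)] - Q^Z * f_ec * h2(E^Z).
    p11: probability that both users emit single photons; y11_z: single-photon-pair yield
    (Z basis); e11_x: single-photon-pair phase error (X basis); q_z, e_z: overall Z-basis
    gain and error rate; f_ec: error-correction inefficiency."""
    return p11 * y11_z * (1 - h2(e11_x)) - q_z * f_ec * h2(e_z)


# Hypothetical decoy-state estimates, chosen only to exercise the formula.
print(mdi_key_rate(p11=0.09, y11_z=1e-3, e11_x=0.02, q_z=4e-4, e_z=0.01))
```

A negative value means no secret key can be distilled under these estimates; in a simulation, the inputs would come from the decoy-state bounds at each channel distance.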
Submitted 14 September, 2023;
originally announced September 2023.
-
VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue
Authors:
Yunshui Li,
Binyuan Hui,
Zhaochao Yin,
Wanwei He,
Run Luo,
Yuxing Long,
Min Yang,
Fei Huang,
Yongbin Li
Abstract:
Visually-grounded dialog systems, which integrate multiple modes of communication such as text and visual inputs, have become an increasingly popular area of investigation. However, the absence of a standardized evaluation framework makes it difficult to assess progress in this field. To this end, we propose VDialogUE, a Visually-grounded Dialogue benchmark for Unified Evaluation. It defines five core multi-modal dialogue tasks and covers six datasets. Furthermore, to provide a comprehensive assessment of a model's performance across all tasks, we develop a novel evaluation metric called VDscore, which is based on the Analytic Hierarchy Process (AHP) method. Additionally, we present a straightforward yet efficient baseline model, named VISIT (VISually-grounded dIalog Transformer), to promote the advancement of general multi-modal dialogue systems. It progressively builds its multi-modal foundation and dialogue capability via a two-stage pre-training strategy.
We believe that the VDialogUE benchmark, along with the evaluation scripts and our baseline models, will accelerate the development of visually-grounded dialog systems and lead to the development of more sophisticated and effective pre-trained models.
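VDscore is described only as being based on AHP. A minimal sketch of the standard AHP steps it presumably builds on: derive criterion weights as the principal eigenvector of a pairwise-comparison matrix, check Saaty's consistency ratio, then aggregate per-task scores with those weights. The three criteria, the comparison matrix, and the task scores are hypothetical, and the aggregation formula is an assumption rather than VDscore's actual definition.

```python
def ahp_weights(pairwise, iters=100):
    """Priority weights = principal eigenvector of a positive reciprocal matrix (power iteration)."""
    n = len(pairwise)
    w = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(pairwise[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w


# Saaty's random consistency index for n = 1..10
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45, 1.49]


def consistency_ratio(pairwise, w):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1); needs n >= 3. CR < 0.1 is the usual rule."""
    n = len(pairwise)
    aw = [sum(pairwise[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(aw[i] / w[i] for i in range(n)) / n
    return ((lam_max - n) / (n - 1)) / RANDOM_INDEX[n - 1]


def overall_score(task_scores, w):
    """Weighted aggregate of per-task scores (in [0, 1]) using the AHP weights."""
    return sum(s * wi for s, wi in zip(task_scores, w))


# Hypothetical comparison: criterion 0 is judged 2x as important as 1 and 4x as important as 2.
pairwise = [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0], [0.25, 0.5, 1.0]]
w = ahp_weights(pairwise)
print(w, consistency_ratio(pairwise, w), overall_score([0.8, 0.6, 0.4], w))
```

This example matrix is perfectly consistent, so the weights come out as 4/7, 2/7 and 1/7 with a consistency ratio of essentially zero; real expert judgments usually yield a small positive ratio.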
Submitted 13 September, 2023;
originally announced September 2023.
-
GelFlow: Self-supervised Learning of Optical Flow for Vision-Based Tactile Sensor Displacement Measurement
Authors:
Zhiyuan Zhang,
Hua Yang,
Zhou** Yin
Abstract:
High-resolution multi-modality information acquired by vision-based tactile sensors can support more dexterous manipulations for robot fingers. Optical flow is low-level information directly obtained by vision-based tactile sensors, which can be transformed into other modalities like force, geometry and depth. Current vision-tactile sensors employ optical flow methods from OpenCV to estimate the deformation of markers in gels. However, these methods are not precise enough to measure marker displacement accurately during large elastic deformation of the gel, which can significantly degrade the accuracy of downstream tasks. This study proposes a self-supervised optical flow method based on deep learning to achieve high accuracy in displacement measurement for vision-based tactile sensors. The proposed method employs a coarse-to-fine strategy to handle large deformations by constructing a multi-scale feature pyramid from the input image. To better handle the elastic deformation of the gel, the Helmholtz velocity decomposition constraint and the elastic deformation constraint are adopted to address the distortion rate and area change rate, respectively. A local flow fusion module is designed to smooth the optical flow, taking into account prior knowledge of the blurring effect of gel deformation. We trained the proposed self-supervised network using an open-source dataset and compared it with traditional and deep learning-based optical flow methods. The results show that the proposed method achieved the highest displacement measurement accuracy, demonstrating its potential to enable more precise downstream tasks with vision-based tactile sensors.
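For a small displacement field $(u, v)$, the local area change rate is, to first order, the divergence $\partial u/\partial x + \partial v/\partial y$, and the local rotation is the curl $\partial v/\partial x - \partial u/\partial y$; these are the quantities a Helmholtz-style decomposition separates. The sketch computes both with central differences on a regular grid. How GelFlow turns them into loss terms is not specified in the abstract, so this is only an assumed illustration of the underlying operators.

```python
def div_curl(u, v, spacing=1.0):
    """Central-difference divergence (du/dx + dv/dy) and scalar curl (dv/dx - du/dy)
    of a 2-D flow field stored as u[y][x], v[y][x]. Border points are left at 0."""
    h, w = len(u), len(u[0])
    div = [[0.0] * w for _ in range(h)]
    curl = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            du_dx = (u[y][x + 1] - u[y][x - 1]) / (2 * spacing)
            du_dy = (u[y + 1][x] - u[y - 1][x]) / (2 * spacing)
            dv_dx = (v[y][x + 1] - v[y][x - 1]) / (2 * spacing)
            dv_dy = (v[y + 1][x] - v[y - 1][x]) / (2 * spacing)
            div[y][x] = du_dx + dv_dy
            curl[y][x] = dv_dx - du_dy
    return div, curl


size = 5
expand_u = [[0.1 * x for x in range(size)] for y in range(size)]   # uniform expansion
expand_v = [[0.1 * y for x in range(size)] for y in range(size)]
rot_u = [[-0.1 * y for x in range(size)] for y in range(size)]     # rigid rotation
rot_v = [[0.1 * x for x in range(size)] for y in range(size)]
print(div_curl(expand_u, expand_v)[0][2][2], div_curl(rot_u, rot_v)[1][2][2])
```

The two synthetic fields isolate the operators: the expansion has divergence 0.2 and no curl, while the rotation has curl 0.2 and no divergence.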
Submitted 13 September, 2023;
originally announced September 2023.
-
Quantum Diamond Microscope for Dynamic Imaging of Magnetic Fields
Authors:
Jiashen Tang,
Zechuan Yin,
Connor A. Hart,
John W. Blanchard,
Jner Tzern Oon,
Smriti Bhalerao,
Jennifer M. Schloss,
Matthew J. Turner,
Ronald L. Walsworth
Abstract:
Wide-field imaging of magnetic signals using ensembles of nitrogen-vacancy (NV) centers in diamond has garnered increasing interest due to its combination of micron-scale resolution, millimeter-scale field of view, and compatibility with diverse samples from across the physical and life sciences. Recently, wide-field NV magnetic imaging based on the Ramsey protocol has achieved uniform and enhanced sensitivity compared to conventional measurements. Here, we integrate the Ramsey-based protocol with spin-bath driving to extend the NV spin dephasing time and improve magnetic sensitivity. We also employ a high-speed camera to enable dynamic wide-field magnetic imaging. We benchmark the utility of this quantum diamond microscope (QDM) by imaging magnetic fields produced from a fabricated wire phantom. Over a $270\times270\,\mu\mathrm{m}^2$ field of view, a median per-pixel magnetic sensitivity of $4.1(1)\,\mathrm{nT}/\sqrt{\mathrm{Hz}}$ is realized with a spatial resolution $\lesssim 10\,\mu\mathrm{m}$ and sub-millisecond temporal resolution. Importantly, the spatial magnetic noise floor can be reduced to the picotesla scale by time-averaging and signal modulation, which enables imaging of a magnetic-field pattern with a peak-to-peak amplitude difference of about $300\,\mathrm{pT}$. Finally, we discuss potential new applications of this dynamic QDM in studying biomineralization and electrically-active cells.
Submitted 12 September, 2023;
originally announced September 2023.
-
O2ATH: An OpenMP Offloading Toolkit for the Sunway Heterogeneous Manycore Platform
Authors:
Haoran Lin,
Lifeng Yan,
Qixin Chang,
Haitian Lu,
Chenlin Li,
Quanjie He,
Zeyu Song,
Xiaohui Duan,
Zekun Yin,
Yuxuan Li,
Zhao Liu,
Wei Xue,
Haohuan Fu,
Lin Gan,
Guangwen Yang,
Weiguo Liu
Abstract:
The next-generation Sunway supercomputer employs the SW26010pro processor, which features a specialized on-chip heterogeneous architecture. Applications with significant hotspots can benefit from the greatly improved computation capacity of Sunway many-core architectures through intensive manual many-core parallelization. However, some legacy projects with large codebases, such as CESM, ROMS and WRF, contain numerous lines of code and do not have significant hotspots. The cost of manually porting such applications to the Sunway architecture is almost unaffordable. To overcome this challenge, we have developed a toolkit named O2ATH. O2ATH forwards GNU OpenMP runtime library calls to Sunway's Athread library, which greatly simplifies parallelization on the Sunway architecture. O2ATH enables users to write both MPE and CPE code in a single file, and parallelization can be achieved with OpenMP directives and attributes. In practice, O2ATH has helped us port two large projects, CESM and ROMS, to the CPEs of the next-generation Sunway supercomputers via the OpenMP offload method. In the experiments, kernel speedups range from 3 to 15 times, resulting in whole-application speedups of 3 to 6 times. Furthermore, O2ATH requires significantly fewer code modifications than manually crafting CPE functions. This indicates that O2ATH can greatly enhance development efficiency when porting or optimizing large software projects on Sunway supercomputers.
Submitted 10 September, 2023;
originally announced September 2023.
-
A 9 Transistor SRAM Featuring Array-level XOR Parallelism with Secure Data Toggling Operation
Authors:
Zihan Yin,
Annewsha Datta,
Shwetha Vijayakumar,
Ajey Jacob,
Akhilesh Jaiswal
Abstract:
Security and energy efficiency are critical for computing applications in general and for edge applications in particular. Digital in-memory computing (IMC) in SRAM cells has been widely studied to accelerate inference tasks and maximize both throughput and energy efficiency for intelligent computing at the edge. XOR operations have been of particular interest due to their wide applicability in numerous applications, including binary neural networks and encryption. However, existing IMC circuits for XOR acceleration are limited to two rows in a memory array, and extending XOR parallelism to multiple rows in an SRAM array has remained elusive. Further, SRAM is prone to both data imprinting and data remanence security issues, which limit its security. Based on the commercial GlobalFoundries 22nm node, we propose a novel 9T SRAM cell such that multiple rows of data (the entire array) can be XORed in a massively parallel, single-cycle fashion. The new cell also supports efficient data toggling within the SRAM cell to circumvent imprinting attacks and erase the SRAM value in case of a remanence attack.
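The 9T cell performs the multi-row XOR in a single cycle in hardware; the Python model below reproduces only the logical behavior: a column-wise XOR over every row, plus data toggling that inverts all stored bits. The `inverted` polarity flag is a hypothetical bookkeeping device for reading back correct values after a toggle, not a description of the paper's circuit.

```python
from functools import reduce
from operator import xor


class ToggledArray:
    """Logical model of a bit array that can XOR all rows and toggle its stored data.
    Not a circuit model: the 'inverted' flag is a hypothetical way to track polarity."""

    def __init__(self, rows):
        self.cells = [list(r) for r in rows]   # physically stored bits
        self.inverted = False

    def toggle(self):
        """Invert every stored bit (e.g., periodically, against imprinting) and record it."""
        self.cells = [[b ^ 1 for b in row] for row in self.cells]
        self.inverted = not self.inverted

    def read(self, r):
        """Logical value of row r, undoing the current polarity."""
        return [b ^ int(self.inverted) for b in self.cells[r]]

    def xor_all_rows(self):
        """Column-wise XOR of the logical values of every row."""
        return [reduce(xor, col) for col in zip(*(self.read(r) for r in range(len(self.cells))))]


arr = ToggledArray([[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0]])
before = arr.xor_all_rows()
arr.toggle()
print(before, arr.xor_all_rows(), arr.cells[0], arr.read(0))
# prints: [0, 0, 0, 1] [0, 0, 0, 1] [0, 1, 0, 0] [1, 0, 1, 1]
```

Tracking polarity matters for the XOR: with an odd number of rows, XORing the raw inverted cells would return the complement of the true result.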
Submitted 11 August, 2023;
originally announced September 2023.
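As a purely functional (bit-level) model of the two operations described above, the sketch below computes the XOR of every stored row and the bitwise toggle of a word. It says nothing about the 9T circuit itself: in software the reduction is sequential, whereas the proposed cell is claimed to produce it in a single cycle. The function names are hypothetical.

```python
from functools import reduce
from operator import xor


def array_xor(rows: list[int]) -> int:
    """Bitwise XOR of all stored rows, i.e. the result an array-level XOR yields."""
    return reduce(xor, rows, 0)


def toggle(row: int, width: int) -> int:
    """Invert every bit of a `width`-bit word, as a data-toggling step would."""
    return row ^ ((1 << width) - 1)


if __name__ == "__main__":
    rows = [0b1010, 0b0110, 0b0001]
    print(bin(array_xor(rows)))      # 0b1101
    print(bin(toggle(0b1010, 4)))    # 0b101
```

Toggling twice restores the original word, so a cell can hold the complement for part of the time without losing data.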
-
Learning From Peers: A Survey of Perception and Utilization of Online Peer Support Among Informal Dementia Caregivers
Authors:
Zhijun Yin,
Lauren Stratton,
Qingyuan Song,
Congning Ni,
Lijun Song,
Patricia A. Commiskey,
Qingxia Chen,
Monica Moreno,
Sam Fazio,
Bradley A. Malin
Abstract:
Informal dementia caregivers are those who care for a person living with dementia (PLWD) without receiving payment (e.g., family members, friends, or other unpaid caregivers). These informal caregivers are subject to substantial mental, physical, and financial burdens. Online communities enable these caregivers to exchange caregiving strategies and share experiences with other caregivers whom they generally do not know in real life. Research has demonstrated the benefits of peer support in online communities, but such studies are limited to caregivers who are already online users. In this paper, we designed and administered a survey to investigate the perception and utilization of online peer support among 140 informal dementia caregivers (100 of whom were online-community caregivers). Our findings show that accessing any online community is significantly associated only with caregivers' belief in the value of online peer support (p = 0.006). Moreover, 33 (83%) of the 40 non-online-community caregivers had a belief score above 24, the score obtained when the neutral option is selected for every belief question. The most frequently cited reasons for not accessing any online community were lack of time (14; 10%) and insufficient online information-searching skills (9; 6%). Our findings suggest that online peer support is valuable, but practical strategies are needed to assist informal dementia caregivers who have limited time or searching skills.
Submitted 31 August, 2023;
originally announced September 2023.
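The percentages above use different denominators: 83% is relative to the 40 non-online-community caregivers, while the 10% and 6% figures appear to be relative to all 140 respondents. The small check below reproduces the reported values with half-up rounding; the helper name is hypothetical.

```python
def pct_half_up(k: int, n: int) -> int:
    """k / n as a whole percentage, rounded half up, using exact integer arithmetic."""
    return (200 * k + n) // (2 * n)


TOTAL = 140
ONLINE = 100
NON_ONLINE = TOTAL - ONLINE  # 40

print(pct_half_up(33, NON_ONLINE))  # 83: belief score above 24
print(pct_half_up(14, TOTAL))       # 10: lack of time
print(pct_half_up(9, TOTAL))        # 6: insufficient searching skills
print(pct_half_up(14, NON_ONLINE))  # 35: the same count over the 40 non-users
```

If the 14 caregivers citing lack of time are all among the 40 non-users, they make up about a third of that group, not a tenth.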
-
Spectrum of Laplacian matrices associated with large random elliptic matrices
Authors:
Sean O'Rourke,
Zhi Yin,
Ping Zhong
Abstract:
A Laplacian matrix is a square matrix whose row sums are zero. We study the limiting eigenvalue distribution of a Laplacian matrix formed by taking a random elliptic matrix and subtracting the diagonal matrix containing its row sums. Under some mild assumptions, we show that the empirical spectral distribution of the Laplacian matrix converges to a deterministic probability distribution as the size of the matrix tends to infinity. The limiting measure can be interpreted as the Brown measure of the sum of an elliptic operator and a freely independent normal operator with a Gaussian distribution.
Submitted 15 December, 2023; v1 submitted 30 August, 2023;
originally announced August 2023.
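The construction described above can be sketched numerically as follows. The sketch assumes real Gaussian entries (the paper only requires mild assumptions) built so that each entry has variance 1/n and corr(X_ij, X_ji) = rho; the Laplacian is then L = X - D with D = diag(row sums of X), so every row of L sums to zero and the all-ones vector is an eigenvector with eigenvalue 0.

```python
import numpy as np


def elliptic_matrix(n: int, rho: float, rng: np.random.Generator) -> np.ndarray:
    """Real Gaussian elliptic matrix: corr(X_ij, X_ji) = rho, entries scaled by 1/sqrt(n)."""
    g1 = rng.standard_normal((n, n))
    g2 = rng.standard_normal((n, n))
    sym = (g1 + g1.T) / np.sqrt(2)   # off-diagonal: variance 1, X_ij = X_ji
    skew = (g2 - g2.T) / np.sqrt(2)  # off-diagonal: variance 1, X_ij = -X_ji
    x = np.sqrt((1 + rho) / 2) * sym + np.sqrt((1 - rho) / 2) * skew
    return x / np.sqrt(n)


def laplacian(x: np.ndarray) -> np.ndarray:
    """Subtract the diagonal matrix of row sums, so every row of the result sums to 0."""
    return x - np.diag(x.sum(axis=1))


rng = np.random.default_rng(0)
X = elliptic_matrix(500, rho=0.5, rng=rng)
L = laplacian(X)
eigs = np.linalg.eigvals(L)  # the empirical spectral distribution puts mass 1/n on each
print(np.abs(L.sum(axis=1)).max())
```

Plotting `eigs` in the complex plane for growing n is a quick way to see the empirical spectral distribution settle toward a deterministic limit.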
-
Label-free Deep Learning Driven Secure Access Selection in Space-Air-Ground Integrated Networks
Authors:
Zhaowei Wang,
Zhisheng Yin,
Xiucheng Wang,
Nan Cheng,
Yuan Zhang,
Tom H. Luan
Abstract:
In Space-air-ground integrated networks (SAGIN), the inherent openness and extensive broadcast coverage expose these networks to significant eavesdropping threats. The co-channel interference inherent in spectrum sharing among multi-tier access networks in SAGIN can be leveraged to assist physical-layer security among heterogeneous transmissions. However, designing a secrecy-oriented access strategy is challenging due to both heterogeneous resources and different eavesdropping models. In this paper, we explore secure access selection for a scenario involving multi-mode users capable of accessing satellites, unmanned aerial vehicles, or base stations in the presence of eavesdroppers. In particular, we propose a deep learning approach based on Q-network approximation for selecting the access strategy that maximizes the sum secrecy rate. Meanwhile, power optimization is carried out by an unsupervised learning approach to further improve secrecy performance. Notably, the two neural networks are trained by unsupervised learning and Q-network approximation, respectively, both of which are label-free methods that do not require the optimal solution as labels. Numerical results verify the efficiency of the proposed power optimization approach and access strategy, which enhance secure transmission performance.
Submitted 28 August, 2023;
originally announced August 2023.
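To make the objective concrete, the sketch below computes the sum secrecy rate, where each user's secrecy rate is [log2(1 + SINR_legit) - log2(1 + SINR_eve)]^+, and finds the best access assignment by exhaustive search. The interference model is a toy assumption (users on the same tier share spectrum and interfere both at that tier's receiver and at the eavesdropper), and the brute-force search is a baseline standing in for, not reproducing, the paper's Q-network approach. All names and gains are hypothetical.

```python
import itertools
import math

import numpy as np

TIERS = ("satellite", "UAV", "BS")


def secrecy_rate(sinr_legit: float, sinr_eve: float) -> float:
    """[log2(1 + SINR_legit) - log2(1 + SINR_eve)]^+ in bit/s/Hz."""
    return max(0.0, math.log2(1 + sinr_legit) - math.log2(1 + sinr_eve))


def sum_secrecy_rate(assign, p, g, h_eve, noise):
    """Toy model: co-tier users interfere at the tier receiver and at the eavesdropper.

    g[u, t]: gain from user u to tier t's receiver; h_eve[u]: gain from user u to the eavesdropper.
    """
    total = 0.0
    for u, t in enumerate(assign):
        co = [v for v, tv in enumerate(assign) if tv == t and v != u]
        sinr_b = p[u] * g[u, t] / (noise + sum(p[v] * g[v, t] for v in co))
        sinr_e = p[u] * h_eve[u] / (noise + sum(p[v] * h_eve[v] for v in co))
        total += secrecy_rate(sinr_b, sinr_e)
    return total


def best_access(p, g, h_eve, noise):
    """Exhaustive search over all len(TIERS) ** U assignments (baseline, not a Q-network)."""
    n_users = g.shape[0]
    return max(
        itertools.product(range(len(TIERS)), repeat=n_users),
        key=lambda a: sum_secrecy_rate(a, p, g, h_eve, noise),
    )


rng = np.random.default_rng(7)
U = 3
p = np.ones(U)
g = rng.uniform(0.1, 1.0, (U, len(TIERS)))
h_eve = rng.uniform(0.05, 0.5, U)
best = best_access(p, g, h_eve, noise=0.1)
print([TIERS[t] for t in best], round(sum_secrecy_rate(best, p, g, h_eve, 0.1), 3))
```

Exhaustive search grows as 3^U, which is one motivation for learning the access policy instead.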
-
DARWIN Series: Domain Specific Large Language Models for Natural Science
Authors:
Tong Xie,
Yuwei Wan,
Wei Huang,
Zhenyu Yin,
Yixuan Liu,
Shaozhou Wang,
Qingyuan Linghu,
Chunyu Kit,
Clara Grazian,
Wenjie Zhang,
Imran Razzak,
Bram Hoex
Abstract:
Emerging tools bring forth fresh approaches to work, and the field of natural science is no different. In natural science, traditional manual, serial, and labour-intensive work is being augmented by automated, parallel, and iterative processes driven by artificial-intelligence-based experimental automation and more. To add new capabilities in natural science, enabling faster and richer automation of the discovery process, we present DARWIN, a series of tailored LLMs for natural science, mainly in physics, chemistry, and material science. This series builds on open-source LLMs, incorporating structured and unstructured scientific knowledge from public datasets and literature. We fine-tuned the models using over 60,000 instruction data points, emphasizing factual correctness. During fine-tuning, we introduce the Scientific Instruction Generation (SIG) model, which automates instruction generation from scientific texts. This eliminates the need for manual extraction or domain-specific knowledge graphs and efficiently injects scientific knowledge into the model. We also explore multi-task training strategies, revealing interconnections between scientific tasks. The DARWIN series not only achieves state-of-the-art results on various scientific tasks but also reduces reliance on closed-source AI models. Our research showcases the capability of LLMs in the scientific domain, with the overarching goal of fostering prosperity within the broader AI for science community.
Submitted 24 August, 2023;
originally announced August 2023.
-
Predicting Drug Solubility Using Different Machine Learning Methods -- Linear Regression Model with Extracted Chemical Features vs Graph Convolutional Neural Network
Authors:
John Ho,
Zhao-Heng Yin,
Colin Zhang,
Nicole Guo,
Yang Ha
Abstract:
Predicting the solubility of given molecules remains crucial in the pharmaceutical industry. In this study, we revisited this extensively studied topic, leveraging the capabilities of contemporary computing resources. We employed two machine learning models, a linear regression model and a graph convolutional neural network (GCNN) model, using various experimental datasets. Both methods yielded reasonable predictions, with the GCNN model exhibiting the highest level of performance. However, the present GCNN model has limited interpretability, whereas the linear regression model allows scientists a more in-depth analysis of the underlying factors through feature importance analysis, although it requires more human input and evaluation of the overall dataset. From the perspective of chemistry, using the linear regression model, we elucidated the impact of individual atom species and functional groups on overall solubility, highlighting the significance of comprehending how chemical structure influences chemical properties in the drug development process. We find that introducing oxygen atoms can increase the solubility of organic molecules, while almost all heteroatoms other than oxygen and nitrogen tend to decrease solubility.
Submitted 4 January, 2024; v1 submitted 23 August, 2023;
originally announced August 2023.
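The interpretability argument rests on reading the fitted coefficients of a linear model. A minimal ordinary-least-squares sketch is shown below; the atom-count feature names are hypothetical, the numbers are synthetic (not data or results from the paper), and extracting such features from real molecules would require a cheminformatics toolkit that is not shown here.

```python
import numpy as np


def fit_linear(features: np.ndarray, log_s: np.ndarray, names: list[str]) -> dict[str, float]:
    """Ordinary least squares log_s ~ b0 + sum_k b_k * feature_k; returns named coefficients."""
    X = np.column_stack([np.ones(len(features)), features])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(X, log_s, rcond=None)
    return dict(zip(["intercept", *names], coef.tolist()))


# Synthetic sanity check with arbitrary, made-up weights (not values from the paper).
names = ["n_O", "n_N", "n_Cl"]  # hypothetical per-molecule atom counts
rng = np.random.default_rng(1)
F = rng.integers(0, 6, size=(50, 3)).astype(float)
true = np.array([-1.0, 0.4, 0.1, -0.6])  # intercept, then one weight per feature
y = true[0] + F @ true[1:]
print(fit_linear(F, y, names))
```

A coefficient's sign gives the direction of a feature's effect on predicted solubility; comparing magnitudes across features is only meaningful when the features are on comparable scales.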
-
Adaptive White-Box Watermarking with Self-Mutual Check Parameters in Deep Neural Networks
Authors:
Zhenzhe Gao,
Zhaoxia Yin,
Hongjian Zhan,
Heng Yin,
Yue Lu
Abstract:
Artificial Intelligence (AI) has found wide application, but also poses risks due to unintentional or malicious tampering during deployment. Regular checks are therefore necessary to detect and prevent such risks. Fragile watermarking is a technique used to identify tampering in AI models. However, previous methods have faced challenges, including risks of omission, additional information transmission, and the inability to locate tampering precisely. In this paper, we propose a method for detecting tampered parameters and bits, which can be used to detect, locate, and restore parameters that have been tampered with. We also propose an adaptive embedding method that maximizes information capacity while maintaining model accuracy. Our approach was tested on multiple neural networks subjected to attacks that modified weight parameters, and the results demonstrate that our method achieved strong recovery performance when the modification rate was below 20%. Furthermore, for models where watermarking significantly affected accuracy, we utilized an adaptive bit technique to recover more than 15% of the model's accuracy loss.
Submitted 22 August, 2023;
originally announced August 2023.
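For readers unfamiliar with fragile watermarking of weights, the sketch below shows a generic parity-based variant: each float32 weight stores, in its mantissa LSB, the parity of its remaining 31 bits, so an odd number of flipped bits in that weight is detected and localized. This is a textbook illustration, not the paper's self-mutual check or adaptive-bit scheme, and unlike the paper's method it cannot restore tampered values.

```python
import numpy as np

LSB = np.uint32(1)


def _upper_parity(bits: np.ndarray) -> np.ndarray:
    """Parity (0/1) of bits 1..31 of each uint32 word, i.e. everything except the LSB."""
    v = bits >> LSB
    parity = np.zeros_like(bits)
    for _ in range(31):
        parity ^= v & LSB
        v = v >> LSB
    return parity


def embed(weights: np.ndarray) -> np.ndarray:
    """Write each float32 weight's parity into its mantissa LSB (changes it by at most 1 ulp)."""
    bits = weights.astype(np.float32).view(np.uint32)
    marked = (bits & np.uint32(0xFFFFFFFE)) | _upper_parity(bits)
    return marked.view(np.float32)


def detect(weights: np.ndarray) -> np.ndarray:
    """Flat indices of float32 weights whose LSB no longer matches their parity."""
    bits = weights.view(np.uint32)
    return np.flatnonzero((bits & LSB) != _upper_parity(bits))


w = np.linspace(-1, 1, 10, dtype=np.float32)
wm = embed(w)
tampered = wm.copy()
tampered.view(np.uint32)[3] ^= np.uint32(1 << 20)  # flip one mantissa bit of weight 3
print(detect(wm), detect(tampered))
```

Because only the least significant mantissa bit changes, the watermark perturbs each weight by at most one unit in the last place, which is why LSB embedding usually has a negligible effect on accuracy.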
-
Split Learning for Distributed Collaborative Training of Deep Learning Models in Health Informatics
Authors:
Zhuohang Li,
Chao Yan,
Xinmeng Zhang,
Gharib Gharibi,
Zhijun Yin,
Xiaoqian Jiang,
Bradley A. Malin
Abstract:
Deep learning continues to evolve rapidly and is now demonstrating remarkable potential for numerous medical prediction tasks. However, realizing deep learning models that generalize across healthcare organizations is challenging. This is due, in part, to the inherently siloed nature of these organizations and to patient privacy requirements. To address this problem, we illustrate how split learning can enable collaborative training of deep learning models across disparate and privately maintained health datasets, while keeping the original records and model parameters private. We introduce a new privacy-preserving distributed learning framework that offers a higher level of privacy than conventional federated learning. We use several biomedical imaging and electronic health record (EHR) datasets to show that deep learning models trained via split learning can achieve performance highly similar to their centralized and federated counterparts while greatly improving computational efficiency and reducing privacy risks.
Submitted 21 August, 2023;
originally announced August 2023.
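The basic split-learning exchange can be sketched with NumPy as follows: the client keeps the raw records and the layers below the cut, sends only cut-layer activations, and receives only the gradient at the cut. This is vanilla split learning with labels held by the server, on synthetic data; it does not reproduce the additional privacy mechanisms of the framework described above, and the class and method names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


class Client:
    """Data holder: keeps raw records and the layer below the cut."""

    def __init__(self, d_in: int, d_cut: int, lr: float):
        self.W = rng.normal(0.0, 0.5, (d_in, d_cut))
        self.lr = lr

    def forward(self, x: np.ndarray) -> np.ndarray:
        self.x = x
        self.h = np.tanh(x @ self.W)  # cut-layer activations: the only thing sent out
        return self.h

    def backward(self, grad_h: np.ndarray) -> None:
        grad_pre = grad_h * (1.0 - self.h ** 2)  # derivative of tanh
        self.W -= self.lr * self.x.T @ grad_pre


class Server:
    """Holds the layer above the cut; in this variant it also sees the labels."""

    def __init__(self, d_cut: int, lr: float):
        self.V = rng.normal(0.0, 0.5, (d_cut, 1))
        self.lr = lr

    def step(self, h: np.ndarray, y: np.ndarray) -> tuple[float, np.ndarray]:
        p = 1.0 / (1.0 + np.exp(-(h @ self.V)))  # sigmoid output
        eps = 1e-12
        loss = float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))
        grad_logits = (p - y) / len(y)   # d(mean BCE) / d(logits)
        grad_h = grad_logits @ self.V.T  # gradient returned to the client
        self.V -= self.lr * h.T @ grad_logits
        return loss, grad_h


x = rng.normal(size=(200, 4))
y = (x[:, :1] + x[:, 1:2] > 0).astype(float)  # synthetic labels, shape (200, 1)
client, server = Client(4, 8, lr=0.5), Server(8, lr=0.5)
losses = []
for _ in range(300):
    h = client.forward(x)             # client -> server: activations only
    loss, grad_h = server.step(h, y)
    client.backward(grad_h)           # server -> client: gradient at the cut
    losses.append(loss)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Neither party ever sees the other's weights, and the raw inputs never leave the client; what does leave it (the activations) is exactly what privacy-enhanced variants try to protect further.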
-
The degree threshold for covering with all the connected $3$-graphs with $3$ edges
Authors:
Yue Ma,
Xinmin Hou,
Zhi Yin
Abstract:
Given two $r$-uniform hypergraphs $F$ and $H$, we say that $H$ has an $F$-covering if every vertex in $H$ is contained in a copy of $F$. Let $c_{i}(n,F)$ be the least integer such that every $n$-vertex $r$-graph $H$ with $\delta_{i}(H)>c_i(n,F)$ has an $F$-covering. Falgas-Ravry, Markström and Zhao (Combin. Probab. Comput., 2021) asymptotically determined $c_1(n,K_{4}^{(3)-})$, where $K_{4}^{(3)-}$ is obtained by deleting an edge from the complete $3$-graph on $4$ vertices. Later, Tang, Ma and Hou (arXiv, 2022) asymptotically determined $c_1(n,C_{6}^{(3)})$, where $C_{6}^{(3)}$ is the linear triangle, i.e., $C_{6}^{(3)}=([6],\{123,345,561\})$. In this paper, we determine $c_1(n,F_5)$ asymptotically, where $F_5$ is the generalized triangle, i.e., $F_5=([5],\{123,124,345\})$. We also determine the exact value of $c_1(n,F)$ for every connected $3$-graph $F$ with $3$ edges such that $F\notin\{K_4^{(3)-}, C_{6}^{(3)}, F_5\}$.
Submitted 19 August, 2023;
originally announced August 2023.
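The two definitions above, $\delta_1(H)$ and an $F$-covering, can be checked by brute force on tiny hypergraphs. The sketch below enumerates injective maps from $V(F)$ into $V(H)$ and records every vertex that lies in some copy of $F$; it is exponential and only meant to make the definitions concrete, not to compute $c_1(n,F)$. Function names are hypothetical.

```python
from itertools import combinations, permutations


def min_vertex_degree(vertices, edges) -> int:
    """delta_1(H): the minimum number of edges containing a single vertex."""
    return min(sum(v in e for e in edges) for v in vertices)


def has_f_covering(h_vertices, h_edges, f_vertices, f_edges) -> bool:
    """True iff every vertex of H lies in some copy of F (brute force, tiny graphs only)."""
    h_edge_set = {frozenset(e) for e in h_edges}
    f_vertices = list(f_vertices)
    covered = set()
    for image in permutations(h_vertices, len(f_vertices)):
        phi = dict(zip(f_vertices, image))  # candidate injective map V(F) -> V(H)
        if all(frozenset(phi[x] for x in e) in h_edge_set for e in f_edges):
            covered.update(image)
            if covered == set(h_vertices):
                return True
    return covered == set(h_vertices)


F5 = (range(1, 6), [(1, 2, 3), (1, 2, 4), (3, 4, 5)])      # generalized triangle
C6 = (range(1, 7), [(1, 2, 3), (3, 4, 5), (5, 6, 1)])      # linear triangle
K5 = (range(5), list(combinations(range(5), 3)))           # complete 3-graph on 5 vertices
print(has_f_covering(*K5, *F5), min_vertex_degree(*K5))
```

On $K_5^{(3)}$ every vertex has degree $\binom{4}{2}=6$ and every injective map is a copy of $F_5$, whereas $C_6^{(3)}$ needs six vertices and cannot appear at all.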
-
Persistence property and the local well-posedness of the modified Camassa-Holm equation in critical Besov spaces
Authors:
Zhen He,
Zhaoyang Yin
Abstract:
In this paper, we first establish the local well-posedness for the Cauchy problem of a modified Camassa-Holm (MOCH) equation in the critical Besov spaces $B^{\frac 1 p}_{p,1}$ with $1\leq p<+\infty$. The obtained results considerably improve the recent result in \cite{Luo1}. We then show the persistence property of the MOCH equation.
Submitted 18 August, 2023;
originally announced August 2023.
-
Advancements in Repetitive Action Counting: Joint-Based PoseRAC Model With Improved Performance
Authors:
Haodong Chen,
Ming C. Leu,
Md Moniruzzaman,
Zhaozheng Yin,
Solmaz Hajmohammadi
Abstract:
Repetitive counting (RepCount) is critical in various applications, such as fitness tracking and rehabilitation. Previous methods have relied on the estimation of red-green-and-blue (RGB) frames and body pose landmarks to identify the number of action repetitions, but these methods suffer from a number of issues, including the inability to stably handle changes in camera viewpoints, over-counting, under-counting, difficulty in distinguishing between sub-actions, inaccuracy in recognizing salient poses, etc. In this paper, based on the work done by [1], we integrate joint angles with body pose landmarks to address these challenges and achieve better results than the state-of-the-art RepCount methods, with a Mean Absolute Error (MAE) of 0.211 and an Off-By-One (OBO) counting accuracy of 0.599 on the RepCount data set [2]. Comprehensive experimental results demonstrate the effectiveness and robustness of our method.
Submitted 24 February, 2024; v1 submitted 15 August, 2023;
originally announced August 2023.
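The two ingredients mentioned above can be sketched as follows: a joint angle computed from three pose landmarks, and the evaluation metrics. Which joints the method uses is not stated here, so the hip-knee-ankle example is an assumption; the metric definitions follow the convention we understand to be standard for the RepCount benchmark (MAE as the mean normalized error |pred - gt| / gt, OBO as the fraction of videos counted within one repetition).

```python
import math


def joint_angle(a, b, c) -> float:
    """Angle in degrees at landmark b formed by landmarks a-b-c (2-D or 3-D points)."""
    v1 = [ai - bi for ai, bi in zip(a, b)]
    v2 = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error


def repcount_metrics(pred, gt):
    """Return (MAE, OBO): mean |pred - gt| / gt and the share of videos with |pred - gt| <= 1."""
    n = len(gt)
    mae = sum(abs(p - g) / g for p, g in zip(pred, gt)) / n
    obo = sum(abs(p - g) <= 1 for p, g in zip(pred, gt)) / n
    return mae, obo


print(joint_angle((1, 0), (0, 0), (0, 1)))   # e.g. a hypothetical hip-knee-ankle triple
print(repcount_metrics([5, 10], [4, 10]))
```

Joint angles are invariant to where the camera sits along the limb, which is one reason they can help with the viewpoint changes listed above.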
-
Distilling Knowledge from Resource Management Algorithms to Neural Networks: A Unified Training Assistance Approach
Authors:
Longfei Ma,
Nan Cheng,
Xiucheng Wang,
Zhisheng Yin,
Haibo Zhou,
Wei Quan
Abstract:
Optimizing the signal-to-interference-plus-noise ratio (SINR) in multi-user settings is a fundamental problem addressed by numerous methods. Although traditional model-based optimization methods achieve strong performance, their high complexity has motivated research on neural network (NN)-based approaches that trade off performance against complexity. To fully leverage the high performance of traditional model-based methods and the low complexity of NN-based methods, this paper proposes a knowledge distillation (KD)-based algorithm distillation (AD) method that improves the performance and convergence speed of NN-based methods: traditional SINR optimization methods serve as "teachers" that assist the training of NNs, the "students", thus enhancing the performance of unsupervised and reinforcement learning techniques. This approach aims to alleviate common issues encountered in each of these training paradigms, including the infeasibility of obtaining optimal solutions as labels and overfitting in supervised learning, insufficient convergence performance in unsupervised learning, and low training efficiency in reinforcement learning. Simulation results demonstrate the enhanced performance of the proposed AD-based methods compared to traditional learning methods. This research paves the way for integrating traditional optimization insights with emerging NN techniques in wireless communication system optimization.
Submitted 14 August, 2023;
originally announced August 2023.
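A minimal sketch of the kind of objective a teacher-student setup could use is shown below: the multi-user sum rate computed from per-link SINRs, and a loss that mixes imitation of a teacher's power allocation (for example from a classic model-based optimizer such as WMMSE) with the unsupervised goal of maximizing sum rate. The weighting `alpha` and the mean-squared imitation term are assumptions for illustration, not the paper's exact formulation; in practice the student powers would come from an NN and gradients from an autodiff framework.

```python
import numpy as np


def sum_rate(p: np.ndarray, G: np.ndarray, noise: float) -> float:
    """Sum of log2(1 + SINR_k), where G[k, j] is the gain from transmitter j to receiver k."""
    signal = np.diag(G) * p
    interference = G @ p - signal
    return float(np.sum(np.log2(1.0 + signal / (noise + interference))))


def distillation_loss(p_student, p_teacher, G, noise, alpha=0.5):
    """alpha * imitation (match teacher powers) + (1 - alpha) * unsupervised term (-sum rate)."""
    imitation = float(np.mean((p_student - p_teacher) ** 2))
    return alpha * imitation - (1 - alpha) * sum_rate(p_student, G, noise)


G = np.array([[1.0, 0.2], [0.3, 0.8]])   # hypothetical channel gains
p_teacher = np.array([1.0, 0.6])         # e.g. output of a model-based optimizer
p_student = np.array([0.9, 0.7])         # e.g. output of a neural network
print(sum_rate(p_teacher, G, 0.1), distillation_loss(p_student, p_teacher, G, 0.1))
```

Early in training a larger `alpha` lets the teacher guide the student; shrinking it later lets the student optimize the sum rate directly and possibly exceed the teacher.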
-
Large-Scale Learning on Overlapped Speech Detection: New Benchmark and New General System
Authors:
Zhaohui Yin,
**gguang Tian,
Xinhui Hu,
Xinkang Xu,
Yang Xiang
Abstract:
Overlapped Speech Detection (OSD) is an important part of speech applications involving analysis of multi-party conversations. However, most existing OSD systems are trained and evaluated on small datasets with limited application domains, so their robustness lacks an evaluation benchmark and their accuracy remains inadequate in realistic acoustic environments. To solve these problems, we conduct a study of large-scale learning (LSL) in OSD tasks and propose a new general OSD system, named CF-OSD with LSL, based on the Conformer network and LSL. In our study, a large-scale test set consisting of 151 h of labeled speech in different styles, languages and sound-source distances is produced and used as a new benchmark for evaluating the generality of OSD systems. Rigorous comparative experiments are designed to evaluate the effectiveness of LSL in OSD tasks and to define the OSD model of our general OSD system. The experimental results show that LSL can significantly improve the accuracy and robustness of OSD systems, and that the CF-OSD with LSL system significantly outperforms other OSD systems on our proposed benchmark. Moreover, our system also achieves state-of-the-art performance on existing small-dataset benchmarks, reaching 81.6% and 53.8% on the AliMeeting test set and the DIHARD II evaluation set, respectively.
Submitted 7 September, 2023; v1 submitted 11 August, 2023;
originally announced August 2023.
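OSD is usually framed as frame-level binary classification: a frame is "overlapped" when two or more speakers are active. The sketch below derives such targets from speaker turns given as (start, end, speaker) tuples, similar in spirit to RTTM annotations; the tuple format, the frame-centre rule and the 10 ms default hop are assumptions, not details taken from the paper.

```python
def overlap_labels(segments, duration: float, hop: float = 0.01) -> list[int]:
    """Frame-level OSD targets: 1 where at least two distinct speakers are active at the frame centre."""
    n_frames = int(round(duration / hop))
    labels = []
    for i in range(n_frames):
        t = (i + 0.5) * hop  # frame centre in seconds
        active = {spk for start, end, spk in segments if start <= t < end}
        labels.append(int(len(active) >= 2))
    return labels


turns = [(0.0, 2.0, "A"), (1.0, 3.0, "B")]  # hypothetical annotation
print(overlap_labels(turns, duration=3.0, hop=0.5))
```

Counting distinct speakers (a set) rather than segments keeps two overlapping turns by the same speaker from being labelled as overlap.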
-
Generic singularity behavior of conservative solutions to the Novikov equation
Authors:
Zhen He,
Wei Luo,
Zhaoyang Yin
Abstract:
In this paper, we concentrate on the Novikov equation. We provide a description of the solution in a neighborhood of each singular point.
Submitted 11 January, 2024; v1 submitted 8 August, 2023;
originally announced August 2023.
-
AdvFAS: A robust face anti-spoofing framework against adversarial examples
Authors:
Jiawei Chen,
Xiao Yang,
Heng Yin,
Mingzhi Ma,
Bihui Chen,
Jianteng Peng,
Yandong Guo,
Zhaoxia Yin,
Hang Su
Abstract:
Ensuring the reliability of face recognition systems against presentation attacks necessitates the deployment of face anti-spoofing techniques. Despite considerable advancements in this domain, robust defense against adversarial examples remains elusive even for the most state-of-the-art methods. While several adversarial defense strategies have been proposed, they typically suffer from limited practicability due to inevitable trade-offs between universality, effectiveness, and efficiency. To overcome these challenges, we thoroughly investigate the coupled relationship between adversarial detection and face anti-spoofing. Based on this, we propose a robust face anti-spoofing framework, namely AdvFAS, that leverages two coupled scores to accurately distinguish between correctly detected and wrongly detected face images. Extensive experiments demonstrate the effectiveness of our framework in a variety of settings, including different attacks, datasets, and backbones, while maintaining high accuracy on clean examples. Moreover, we successfully apply the proposed method to detect real-world adversarial examples.
Submitted 3 August, 2023;
originally announced August 2023.
-
Learning-based Control for PMSM Using Distributed Gaussian Processes with Optimal Aggregation Strategy
Authors:
Zhenxiao Yin,
Xiaobing Dai,
Zewen Yang,
Yang Shen,
Georges Hattab,
Hang Zhao
Abstract:
The growing demand for accurate control in varying and unknown environments has sparked a corresponding increase in the requirements for power supply components, including permanent magnet synchronous motors (PMSMs). To infer the unknown part of the system, machine learning techniques are widely employed, especially Gaussian process regression (GPR) due to its flexibility in continuous system modeling and its performance guarantees. For practical implementation, distributed GPR is adopted to alleviate the high computational complexity. However, the study of distributed GPR from a control perspective remains an open problem. In this paper, a control-aware optimal aggregation strategy of distributed GPR for PMSMs is proposed based on Lyapunov stability theory. This strategy leverages only the posterior mean, thereby avoiding the computationally intensive posterior-variance calculations required by alternative approaches. Moreover, the simple calculation process of the proposed strategy lends itself to seamless implementation in high-frequency PMSM control. The effectiveness of the proposed strategy is demonstrated in simulations.
Submitted 25 July, 2023;
originally announced July 2023.
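The sketch below illustrates the mean-only structure described above: several local GP experts, each trained on its own data partition, compute only posterior means, which are then combined as a weighted average. The paper derives the aggregation weights from a Lyapunov-based, control-aware analysis that is not reproduced here; uniform weights, the RBF kernel, and the toy sine target are placeholders and assumptions.

```python
import numpy as np


def rbf(a: np.ndarray, b: np.ndarray, length: float = 1.0) -> np.ndarray:
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * length ** 2))


def local_posterior_mean(x_train, y_train, x_query, noise=1e-4, length=1.0):
    """GP posterior mean k(x*, X) (K + noise * I)^{-1} y; no posterior variance is computed."""
    K = rbf(x_train, x_train, length) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf(x_query, x_train, length) @ alpha


def aggregate(means: np.ndarray, weights) -> np.ndarray:
    """Convex combination of the experts' posterior means (one row per expert)."""
    weights = np.asarray(weights, dtype=float)
    return (weights / weights.sum()) @ means


x = np.linspace(0.0, 6.0, 30)
y = np.sin(x)                                  # toy stand-in for the unknown dynamics
parts = np.array_split(np.arange(30), 3)       # one data partition per expert
xq = np.array([1.0, 3.0, 5.0])
means = np.stack([local_posterior_mean(x[idx], y[idx], xq) for idx in parts])
print(aggregate(means, [1.0, 1.0, 1.0]))       # placeholder uniform weights
```

With uniform weights, experts queried far from their own data pull the estimate toward the prior mean of zero, which is exactly why the choice of aggregation weights matters.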
-
Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation from Image Sequence
Authors:
Yang Tian,
Jiyao Zhang,
Zekai Yin,
Hao Dong
Abstract:
In this work, we tackle the problem of online camera-to-robot pose estimation from single-view successive frames of an image sequence, a crucial task for robots to interact with the world.
Submitted 22 July, 2023;
originally announced July 2023.