Search | arXiv e-print repository

doi 10.1109/JSTSP.2024.3416841

When Vision Meets Touch: A Contemporary Review for Visuotactile Sensors from the Signal Processing Perspective

Authors: Shoujie Li, Zihan Wang, Changsheng Wu, Xiang Li, Shan Luo, Bin Fang, Fuchun Sun, Xiao-** Zhang, Wenbo Ding

Abstract: Tactile sensors, which provide information about the physical properties of objects, are an essential component of robotic systems. The visuotactile sensing technology with the merits of high resolution and low cost has facilitated the development of robotics from environment exploration to dexterous operation. Over the years, several reviews on visuotactile sensors for robots have been presented,… ▽ More Tactile sensors, which provide information about the physical properties of objects, are an essential component of robotic systems. The visuotactile sensing technology with the merits of high resolution and low cost has facilitated the development of robotics from environment exploration to dexterous operation. Over the years, several reviews on visuotactile sensors for robots have been presented, but few of them discussed the significance of signal processing methods to visuotactile sensors. Apart from ingenious hardware design, the full potential of the sensory system toward designated tasks can only be released with the appropriate signal processing methods. Therefore, this paper provides a comprehensive review of visuotactile sensors from the perspective of signal processing methods and outlooks possible future research directions for visuotactile sensors. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted by IEEE Journal of Selected Topics in Signal Processing

arXiv:2406.03813 [pdf, other]

Touch100k: A Large-Scale Touch-Language-Vision Dataset for Touch-Centric Multimodal Representation

Authors: Ning Cheng, Changhao Guan, **g Gao, Weihao Wang, You Li, Fandong Meng, Jie Zhou, Bin Fang, **an Xu, Wenjuan Han

Abstract: Touch holds a pivotal position in enhancing the perceptual and interactive capabilities of both humans and robots. Despite its significance, current tactile research mainly focuses on visual and tactile modalities, overlooking the language domain. Inspired by this, we construct Touch100k, a paired touch-language-vision dataset at the scale of 100k, featuring tactile sensation descriptions in multi… ▽ More Touch holds a pivotal position in enhancing the perceptual and interactive capabilities of both humans and robots. Despite its significance, current tactile research mainly focuses on visual and tactile modalities, overlooking the language domain. Inspired by this, we construct Touch100k, a paired touch-language-vision dataset at the scale of 100k, featuring tactile sensation descriptions in multiple granularities (i.e., sentence-level natural expressions with rich semantics, including contextual and dynamic relationships, and phrase-level descriptions capturing the key features of tactile sensations). Based on the dataset, we propose a pre-training method, Touch-Language-Vision Representation Learning through Curriculum Linking (TLV-Link, for short), inspired by the concept of curriculum learning. TLV-Link aims to learn a tactile representation for the GelSight sensor and capture the relationship between tactile, language, and visual modalities. We evaluate our representation's performance across two task categories (namely, material property identification and robot gras** prediction), focusing on tactile representation and zero-shot touch understanding. The experimental evaluation showcases the effectiveness of our representation. By enabling TLV-Link to achieve substantial improvements and establish a new state-of-the-art in touch-centric multimodal representation learning, Touch100k demonstrates its value as a valuable resource for research. Project page: https://cocacola-lab.github.io/Touch100k/. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2405.12779 [pdf]

Transformer in Touch: A Survey

Authors: **g Gao, Ning Cheng, Bin Fang, Wenjuan Han

Abstract: The Transformer model, initially achieving significant success in the field of natural language processing, has recently shown great potential in the application of tactile perception. This review aims to comprehensively outline the application and development of Transformers in tactile technology. We first introduce the two fundamental concepts behind the success of the Transformer: the self-atte… ▽ More The Transformer model, initially achieving significant success in the field of natural language processing, has recently shown great potential in the application of tactile perception. This review aims to comprehensively outline the application and development of Transformers in tactile technology. We first introduce the two fundamental concepts behind the success of the Transformer: the self-attention mechanism and large-scale pre-training. Then, we delve into the application of Transformers in various tactile tasks, including but not limited to object recognition, cross-modal generation, and object manipulation, offering a concise summary of the core methodologies, performance benchmarks, and design highlights. Finally, we suggest potential areas for further research and future work, aiming to generate more interest within the community, tackle existing challenges, and encourage the use of Transformer models in the tactile field. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 27 pages, 2 tables, 5 figures, accepted by ICIC 2024

arXiv:2405.10063 [pdf, other]

Low-latency Symbol-Synchronous Communication for Multi-hop Sensor Networks

Authors: Xinlei Liu, Andrey Belogaev, Jonathan Oostvogels, Bingwu Fang, Danny Hughes, Maarten Weyn, Jeroen Famaey

Abstract: Wireless sensor networks (WSNs) have received great interest due to their scalability, energy efficiency, and low-cost deployment. By utilizing multi-hop communication, WSNs can cover a wide area using low transmission power without the need for any communication infrastructure. Traditionally, WSNs rely on store-and-forward routing protocols and Time Division Multiple Access (TDMA)-based schedules… ▽ More Wireless sensor networks (WSNs) have received great interest due to their scalability, energy efficiency, and low-cost deployment. By utilizing multi-hop communication, WSNs can cover a wide area using low transmission power without the need for any communication infrastructure. Traditionally, WSNs rely on store-and-forward routing protocols and Time Division Multiple Access (TDMA)-based schedules that avoid interference between different wireless nodes. However, emerging challenging scenarios, such as the industrial Internet of Things (IoT) and robotic swarms, impose strict latency and reliability requirements, which traditional approaches cannot fulfill. In this paper, we propose a novel symbol-synchronous transmission design that provides reliable low-latency communication with a reasonable data rate on classical sub-6GHz RF frequency bands (e.g., the 2.4GHz ISM band). Instead of avoiding overlap** transmissions, the proposed scheme benefits from concurrent transmissions. Using simulation in MATLAB, we prove that the proposed design allows achieving a wire-like delay of 5ms for a 512-bit packet over multiple hops with only a 0.3% latency increase per extra hop and a low bit error rate (BER) of 0.04%. Compared to similar state-of-the-art approaches it can achieve a significantly higher data rate of 100kbps, which is expected to increase further with future improvements of the system. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: EuCNC 2024

arXiv:2405.07237 [pdf, other]

Soft Contact Simulation and Manipulation Learning of Deformable Objects with Vision-based Tactile Sensor

Authors: Jianhua Shan, Yuhao Sun, Shixin Zhang, Fuchun Sun, Zixi Chen, Zirong Shen, Cesare Stefanini, Yiyong Yang, Shan Luo, Bin Fang

Abstract: Deformable object manipulation is a classical and challenging research area in robotics. Compared with rigid object manipulation, this problem is more complex due to the deformation properties including elastic, plastic, and elastoplastic deformation. In this paper, we describe a new deformable object manipulation method including soft contact simulation, manipulation learning, and sim-to-real tra… ▽ More Deformable object manipulation is a classical and challenging research area in robotics. Compared with rigid object manipulation, this problem is more complex due to the deformation properties including elastic, plastic, and elastoplastic deformation. In this paper, we describe a new deformable object manipulation method including soft contact simulation, manipulation learning, and sim-to-real transfer. We propose a novel approach utilizing Vision-Based Tactile Sensors (VBTSs) as the end-effector in simulation to produce observations like relative position, squeezed area, and object contour, which are transferable to real robots. For a more realistic contact simulation, a new simulation environment including elastic, plastic, and elastoplastic deformations is created. We utilize RL strategies to train agents in the simulation, and expert demonstrations are applied for challenging tasks. Finally, we build a real experimental platform to complete the sim-to-real transfer and achieve a 90% success rate on difficult tasks such as cylinder and sphere. To test the robustness of our method, we use plasticine of different hardness and sizes to repeat the tasks including cylinder and sphere. The experimental results show superior performances of deformable object manipulation with the proposed method. △ Less

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2405.02914 [pdf, other]

Simulation of Optical Tactile Sensors Supporting Slip and Rotation using Path Tracing and IMPM

Authors: Zirong Shen, Yuhao Sun, Shixin Zhang, Zixi Chen, Heyi Sun, Fuchun Sun, Bin Fang

Abstract: Optical tactile sensors are extensively utilized in intelligent robot manipulation due to their ability to acquire high-resolution tactile information at a lower cost. However, achieving adequate reality and versatility in simulating optical tactile sensors is challenging. In this paper, we propose a simulation method and validate its effectiveness through experiments. We utilize path tracing for… ▽ More Optical tactile sensors are extensively utilized in intelligent robot manipulation due to their ability to acquire high-resolution tactile information at a lower cost. However, achieving adequate reality and versatility in simulating optical tactile sensors is challenging. In this paper, we propose a simulation method and validate its effectiveness through experiments. We utilize path tracing for image rendering, achieving higher similarity to real data than the baseline method in simulating pressing scenarios. Additionally, we apply the improved Material Point Method(IMPM) algorithm to simulate the relative rest between the object and the elastomer surface when the object is in motion, enabling more accurate simulation of complex manipulations such as slip and rotation. △ Less

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2404.18201 [pdf, other]

What Foundation Models can Bring for Robot Learning in Manipulation : A Survey

Authors: Dingzhe Li, Yixiang **, Yong A, Hongze Yu, Jun Shi, Xiaoshuai Hao, Peng Hao, Hua** Liu, Fuchun Sun, Bin Fang

Abstract: The realization of universal robots is an ultimate goal of researchers. However, a key hurdle in achieving this goal lies in the robots' ability to manipulate objects in their unstructured surrounding environments according to different tasks. The learning-based approach is considered an effective way to address generalization. The impressive performance of foundation models in the fields of compu… ▽ More The realization of universal robots is an ultimate goal of researchers. However, a key hurdle in achieving this goal lies in the robots' ability to manipulate objects in their unstructured surrounding environments according to different tasks. The learning-based approach is considered an effective way to address generalization. The impressive performance of foundation models in the fields of computer vision and natural language suggests the potential of embedding foundation models into manipulation tasks as a viable path toward achieving general manipulation capability. However, we believe achieving general manipulation capability requires an overarching framework akin to auto driving. This framework should encompass multiple functional modules, with different foundation models assuming distinct roles in facilitating general manipulation capability. This survey focuses on the contributions of foundation models to robot learning for manipulation. We propose a comprehensive framework and detail how foundation models can address challenges in each module of the framework. What's more, we examine current approaches, outline challenges, suggest future research directions, and identify potential risks associated with integrating foundation models into this domain. △ Less

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.11458 [pdf, other]

Learn to Tour: Operator Design For Solution Feasibility Map** in Pickup-and-delivery Traveling Salesman Problem

Authors: Bowen Fang, Xu Chen, Xuan Di

Abstract: This paper aims to develop a learning method for a special class of traveling salesman problems (TSP), namely, the pickup-and-delivery TSP (PDTSP), which finds the shortest tour along a sequence of one-to-one pickup-and-delivery nodes. One-to-one here means that the transported people or goods are associated with designated pairs of pickup and delivery nodes, in contrast to that indistinguishable… ▽ More This paper aims to develop a learning method for a special class of traveling salesman problems (TSP), namely, the pickup-and-delivery TSP (PDTSP), which finds the shortest tour along a sequence of one-to-one pickup-and-delivery nodes. One-to-one here means that the transported people or goods are associated with designated pairs of pickup and delivery nodes, in contrast to that indistinguishable goods can be delivered to any nodes. In PDTSP, precedence constraints need to be satisfied that each pickup node must be visited before its corresponding delivery node. Classic operations research (OR) algorithms for PDTSP are difficult to scale to large-sized problems. Recently, reinforcement learning (RL) has been applied to TSPs. The basic idea is to explore and evaluate visiting sequences in a solution space. However, this approach could be less computationally efficient, as it has to potentially evaluate many infeasible solutions of which precedence constraints are violated. To restrict solution search within a feasible space, we utilize operators that always map one feasible solution to another, without spending time exploring the infeasible solution space. Such operators are evaluated and selected as policies to solve PDTSPs in an RL framework. We make a comparison of our method and baselines, including classic OR algorithms and existing learning methods. Results show that our approach can find tours shorter than baselines. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2403.09813 [pdf, other]

Towards Comprehensive Multimodal Perception: Introducing the Touch-Language-Vision Dataset

Authors: Ning Cheng, You Li, **g Gao, Bin Fang, **an Xu, Wenjuan Han

Abstract: Tactility provides crucial support and enhancement for the perception and interaction capabilities of both humans and robots. Nevertheless, the multimodal research related to touch primarily focuses on visual and tactile modalities, with limited exploration in the domain of language. Beyond vocabulary, sentence-level descriptions contain richer semantics. Based on this, we construct a touch-langua… ▽ More Tactility provides crucial support and enhancement for the perception and interaction capabilities of both humans and robots. Nevertheless, the multimodal research related to touch primarily focuses on visual and tactile modalities, with limited exploration in the domain of language. Beyond vocabulary, sentence-level descriptions contain richer semantics. Based on this, we construct a touch-language-vision dataset named TLV (Touch-Language-Vision) by human-machine cascade collaboration, featuring sentence-level descriptions for multimode alignment. The new dataset is used to fine-tune our proposed lightweight training framework, STLV-Align (Synergistic Touch-Language-Vision Alignment), achieving effective semantic alignment with minimal parameter adjustments (1%). Project Page: https://xiaoen0.github.io/touch.page/. △ Less

Submitted 17 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: Accepted by ICIC 2024

arXiv:2403.09284 [pdf, other]

DA-PFL: Dynamic Affinity Aggregation for Personalized Federated Learning

Authors: Xu Yang, Jiyuan Feng, Songyue Guo, Ye Wang, Ye Ding, Binxing Fang, Qing Liao

Abstract: Personalized federated learning becomes a hot research topic that can learn a personalized learning model for each client. Existing personalized federated learning models prefer to aggregate similar clients with similar data distribution to improve the performance of learning models. However, similaritybased personalized federated learning methods may exacerbate the class imbalanced problem. In th… ▽ More Personalized federated learning becomes a hot research topic that can learn a personalized learning model for each client. Existing personalized federated learning models prefer to aggregate similar clients with similar data distribution to improve the performance of learning models. However, similaritybased personalized federated learning methods may exacerbate the class imbalanced problem. In this paper, we propose a novel Dynamic Affinity-based Personalized Federated Learning model (DA-PFL) to alleviate the class imbalanced problem during federated learning. Specifically, we build an affinity metric from a complementary perspective to guide which clients should be aggregated. Then we design a dynamic aggregation strategy to dynamically aggregate clients based on the affinity metric in each round to reduce the class imbalanced risk. Extensive experiments show that the proposed DA-PFL model can significantly improve the accuracy of each client in three real-world datasets with state-of-the-art comparison methods. △ Less

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.00232 [pdf, other]

FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators

Authors: Xinyi Li, Ang Li, Bo Fang, Katarzyna Swirydowicz, Ignacio Laguna, Ganesh Gopalakrishnan

Abstract: NVIDIA Tensor Cores and AMD Matrix Cores (together called Matrix Accelerators) are of growing interest in high-performance computing and machine learning owing to their high performance. Unfortunately, their numerical behaviors are not publicly documented, including the number of extra precision bits maintained, the accumulation order of addition, and predictable subnormal number handling during c… ▽ More NVIDIA Tensor Cores and AMD Matrix Cores (together called Matrix Accelerators) are of growing interest in high-performance computing and machine learning owing to their high performance. Unfortunately, their numerical behaviors are not publicly documented, including the number of extra precision bits maintained, the accumulation order of addition, and predictable subnormal number handling during computations. This makes it impossible to reliably port codes across these differing accelerators. This paper contributes a collection of {\em Feature Targeted Tests for Numerical Properties} that that help determine these features across five floating-point formats, four rounding modes and additional that highlight the rounding behaviors and preservation of extra precision bits. To show the practical relevance of FTTN, we design a simple matrix-multiplication test designed with insights gathered from our feature-tests. We executed this very simple test on five platforms, producing different answers: V100, A100, and MI250X produced 0, MI100 produced 255.875, and Hopper H100 produced 191.875. Our matrix multiplication tests employ patterns found in iterative refinement-based algorithms, highlighting the need to check for significant result variability when porting code across GPUs. △ Less

Submitted 29 February, 2024; originally announced March 2024.

arXiv:2402.12043 [pdf, other]

A Lightweight Parallel Framework for Blind Image Quality Assessment

Authors: Qunyue Huang, Bin Fang

Abstract: Existing blind image quality assessment (BIQA) methods focus on designing complicated networks based on convolutional neural networks (CNNs) or transformer. In addition, some BIQA methods enhance the performance of the model in a two-stage training manner. Despite the significant advancements, these methods remarkably raise the parameter count of the model, thus requiring more training time and co… ▽ More Existing blind image quality assessment (BIQA) methods focus on designing complicated networks based on convolutional neural networks (CNNs) or transformer. In addition, some BIQA methods enhance the performance of the model in a two-stage training manner. Despite the significant advancements, these methods remarkably raise the parameter count of the model, thus requiring more training time and computational resources. To tackle the above issues, we propose a lightweight parallel framework (LPF) for BIQA. First, we extract the visual features using a pre-trained feature extraction network. Furthermore, we construct a simple yet effective feature embedding network (FEN) to transform the visual features, aiming to generate the latent representations that contain salient distortion information. To improve the robustness of the latent representations, we present two novel self-supervised subtasks, including a sample-level category prediction task and a batch-level quality comparison task. The sample-level category prediction task is presented to help the model with coarse-grained distortion perception. The batch-level quality comparison task is formulated to enhance the training data and thus improve the robustness of the latent representations. Finally, the latent representations are fed into a distortion-aware quality regression network (DaQRN), which simulates the human vision system (HVS) and thus generates accurate quality scores. Experimental results on multiple benchmark datasets demonstrate that the proposed method achieves superior performance over state-of-the-art approaches. Moreover, extensive analyses prove that the proposed method has lower computational complexity and faster convergence speed. △ Less

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2312.02697 [pdf, other]

Hierarchical Visual Policy Learning for Long-Horizon Robot Manipulation in Densely Cluttered Scenes

Authors: Hecheng Wang, Lizhe Qi, Bin Fang, Yunquan Sun

Abstract: In this work, we focus on addressing the long-horizon manipulation tasks in densely cluttered scenes. Such tasks require policies to effectively manage severe occlusions among objects and continually produce actions based on visual observations. We propose a vision-based Hierarchical policy for Cluttered-scene Long-horizon Manipulation (HCLM). It employs a high-level policy and three options to se… ▽ More In this work, we focus on addressing the long-horizon manipulation tasks in densely cluttered scenes. Such tasks require policies to effectively manage severe occlusions among objects and continually produce actions based on visual observations. We propose a vision-based Hierarchical policy for Cluttered-scene Long-horizon Manipulation (HCLM). It employs a high-level policy and three options to select and instantiate three parameterized action primitives: push, pick, and place. We first train the pick and place options by behavior cloning (BC). Subsequently, we use hierarchical reinforcement learning (HRL) to train the high-level policy and push option. During HRL, we propose a Spatially Extended Q-update (SEQ) to augment the updates for the push option and a Two-Stage Update Scheme (TSUS) to alleviate the non-stationary transition problem in updating the high-level policy. We demonstrate that HCLM significantly outperforms baseline methods in terms of success rate and efficiency in diverse tasks. We also highlight our method's ability to generalize to more cluttered environments with more additional blocks. △ Less

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2312.01421 [pdf, other]

RobotGPT: Robot Manipulation Learning from ChatGPT

Authors: Yixiang **, Dingzhe Li, Yong A, Jun Shi, Peng Hao, Fuchun Sun, Jianwei Zhang, Bin Fang

Abstract: We present RobotGPT, an innovative decision framework for robotic manipulation that prioritizes stability and safety. The execution code generated by ChatGPT cannot guarantee the stability and safety of the system. ChatGPT may provide different answers for the same task, leading to unpredictability. This instability prevents the direct integration of ChatGPT into the robot manipulation loop. Altho… ▽ More We present RobotGPT, an innovative decision framework for robotic manipulation that prioritizes stability and safety. The execution code generated by ChatGPT cannot guarantee the stability and safety of the system. ChatGPT may provide different answers for the same task, leading to unpredictability. This instability prevents the direct integration of ChatGPT into the robot manipulation loop. Although setting the temperature to 0 can generate more consistent outputs, it may cause ChatGPT to lose diversity and creativity. Our objective is to leverage ChatGPT's problem-solving capabilities in robot manipulation and train a reliable agent. The framework includes an effective prompt structure and a robust learning model. Additionally, we introduce a metric for measuring task difficulty to evaluate ChatGPT's performance in robot manipulation. Furthermore, we evaluate RobotGPT in both simulation and real-world environments. Compared to directly using ChatGPT to generate code, our framework significantly improves task success rates, with an average increase from 38.5% to 91.5%. Therefore, training a RobotGPT by utilizing ChatGPT as an expert is a more stable approach compared to directly using ChatGPT as a task planner. △ Less

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2311.05782 [pdf, other]

MPGemmFI: A Fault Injection Technique for Mixed Precision GEMM in ML Applications

Authors: Bo Fang, Xinyi Li, Harvey Dam, Cheng Tan, Siva Kumar Sastry Hari, Timothy Tsai, Ignacio Laguna, Dingwen Tao, Ganesh Gopalakrishnan, Prashant Nair, Kevin Barker, Ang Li

Abstract: Emerging deep learning workloads urgently need fast general matrix multiplication (GEMM). To meet such demand, one of the critical features of machine-learning-specific accelerators such as NVIDIA Tensor Cores, AMD Matrix Cores, and Google TPUs is the support of mixed-precision enabled GEMM. For DNN models, lower-precision FP data formats and computation offer acceptable correctness but significan… ▽ More Emerging deep learning workloads urgently need fast general matrix multiplication (GEMM). To meet such demand, one of the critical features of machine-learning-specific accelerators such as NVIDIA Tensor Cores, AMD Matrix Cores, and Google TPUs is the support of mixed-precision enabled GEMM. For DNN models, lower-precision FP data formats and computation offer acceptable correctness but significant performance, area, and memory footprint improvement. While promising, the mixed-precision computation on error resilience remains unexplored. To this end, we develop a fault injection framework that systematically injects fault into the mixed-precision computation results. We investigate how the faults affect the accuracy of machine learning applications. Based on the error resilience characteristics, we offer lightweight error detection and correction solutions that significantly improve the overall model accuracy if the models experience hardware faults. The solutions can be efficiently integrated into the accelerator's pipelines. △ Less

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.12290 [pdf, other]

Fact-based Agent modeling for Multi-Agent Reinforcement Learning

Authors: Baofu Fang, Caiming Zheng, Hao Wang

Abstract: In multi-agent systems, agents need to interact and collaborate with other agents in environments. Agent modeling is crucial to facilitate agent interactions and make adaptive cooperation strategies. However, it is challenging for agents to model the beliefs, behaviors, and intentions of other agents in non-stationary environment where all agent policies are learned simultaneously. In addition, th… ▽ More In multi-agent systems, agents need to interact and collaborate with other agents in environments. Agent modeling is crucial to facilitate agent interactions and make adaptive cooperation strategies. However, it is challenging for agents to model the beliefs, behaviors, and intentions of other agents in non-stationary environment where all agent policies are learned simultaneously. In addition, the existing methods realize agent modeling through behavior cloning which assume that the local information of other agents can be accessed during execution or training. However, this assumption is infeasible in unknown scenarios characterized by unknown agents, such as competition teams, unreliable communication and federated learning due to privacy concerns. To eliminate this assumption and achieve agent modeling in unknown scenarios, Fact-based Agent modeling (FAM) method is proposed in which fact-based belief inference (FBI) network models other agents in partially observable environment only based on its local information. The reward and observation obtained by agents after taking actions are called facts, and FAM uses facts as reconstruction target to learn the policy representation of other agents through a variational autoencoder. We evaluate FAM on various Multiagent Particle Environment (MPE) and compare the results with several state-of-the-art MARL algorithms. Experimental results show that compared with baseline methods, FAM can effectively improve the efficiency of agent policy learning by making adaptive cooperation strategies in multi-agent reinforcement learning tasks, while achieving higher returns in complex competitive-cooperative mixed scenarios. △ Less

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2309.16979 [pdf, other]

MEMQSim: Highly Memory-Efficient and Modularized Quantum State-Vector Simulation

Authors: Boyuan Zhang, Bo Fang, Qiang Guan, Ang Li, Dingwen Tao

Abstract: In this extended abstract, we have introduced a highly memory-efficient state vector simulation of quantum circuits premised on data compression, harnessing the capabilities of both CPUs and GPUs. We have elucidated the inherent challenges in architecting this system, while concurrently proposing our tailored solutions. Moreover, we have delineated our preliminary implementation and deliberated up… ▽ More In this extended abstract, we have introduced a highly memory-efficient state vector simulation of quantum circuits premised on data compression, harnessing the capabilities of both CPUs and GPUs. We have elucidated the inherent challenges in architecting this system, while concurrently proposing our tailored solutions. Moreover, we have delineated our preliminary implementation and deliberated upon the potential for integration with other GPU-oriented simulators. In forthcoming research, we aim to present a more comprehensive set of results, bolstering the assertion of the efficacy and performance of our approach. △ Less

Submitted 29 September, 2023; originally announced September 2023.

arXiv:2309.12994 [pdf, other]

Smart Fuzzing of 5G Wireless Software Implementation

Authors: Huan Wu, Brian Fang, Fei Xie

Abstract: In this paper, we introduce a comprehensive approach to bolstering the security, reliability, and comprehensibility of OpenAirInterface5G (OAI5G), an open-source software framework for the exploration, development, and testing of 5G wireless communication systems. Firstly, we employ AFL++, a powerful fuzzing tool, to fuzzy-test OAI5G with respect to its configuration files rigorously. This extensi… ▽ More In this paper, we introduce a comprehensive approach to bolstering the security, reliability, and comprehensibility of OpenAirInterface5G (OAI5G), an open-source software framework for the exploration, development, and testing of 5G wireless communication systems. Firstly, we employ AFL++, a powerful fuzzing tool, to fuzzy-test OAI5G with respect to its configuration files rigorously. This extensive testing process helps identify errors, defects, and security vulnerabilities that may evade conventional testing methods. Secondly, we harness the capabilities of Large Language Models such as Google Bard to automatically decipher and document the meanings of parameters within the OAI5G codebase that are used in fuzzing. This automated parameter interpretation streamlines subsequent analyses and facilitates more informed decision-making. Together, these two techniques contribute to fortifying the OAI5G system, making it more robust, secure, and understandable for developers and analysts alike. △ Less

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.07284 [pdf]

Toward Lossless Homomorphic Encryption for Scientific Computation

Authors: Muhammad Jahanzeb Khan, Bo Fang, Dongfang Zhao

Abstract: This paper presents a comprehensive investigation into encrypted computations using the CKKS (Cheon-Kim-Kim-Song) scheme, with a focus on multi-dimensional vector operations and real-world applications. Through two meticulously designed experiments, the study explores the potential of the CKKS scheme in Super Computing and its implications for data privacy and computational efficiency. The first e… ▽ More This paper presents a comprehensive investigation into encrypted computations using the CKKS (Cheon-Kim-Kim-Song) scheme, with a focus on multi-dimensional vector operations and real-world applications. Through two meticulously designed experiments, the study explores the potential of the CKKS scheme in Super Computing and its implications for data privacy and computational efficiency. The first experiment reveals the promising applicability of CKKS to matrix multiplication, indicating marginal differences in Euclidean distance and near-to-zero mean square error across various matrix sizes. The second experiment, applied to a wildfire dataset, illustrates the feasibility of using encrypted machine learning models without significant loss in accuracy. The insights gleaned from the research set a robust foundation for future innovations, including the potential for GPU acceleration in CKKS computations within TenSEAL. Challenges such as noise budget computation, accuracy loss in multiplication, and the distinct characteristics of arithmetic operations in the context of CKKS are also discussed. The paper serves as a vital step towards understanding the complexities and potentials of encrypted computations, with broad implications for secure data processing and privacy preservation in various scientific domains. △ Less

Submitted 13 September, 2023; originally announced September 2023.

arXiv:2308.14034 [pdf, other]

Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum

Authors: Shen Gao, Zhengliang Shi, Minghang Zhu, Bowen Fang, Xin Xin, Pengjie Ren, Zhumin Chen, Jun Ma, Zhaochun Ren

Abstract: Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extending the capability of LLMs. Although some works employ open-source LLMs for the tool learning task, most of them are trained in a controlled environment in which LLMs only learn to execute the human-provided tools. However, selecting proper tools from the large toolset is also a crucial ability… ▽ More Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extending the capability of LLMs. Although some works employ open-source LLMs for the tool learning task, most of them are trained in a controlled environment in which LLMs only learn to execute the human-provided tools. However, selecting proper tools from the large toolset is also a crucial ability for the tool learning model to be applied in real-world applications. Existing methods usually directly employ self-instruction methods to train the model, which ignores differences in tool complexity. In this paper, we propose the Confucius, a novel tool learning framework to train LLM to use complicated tools in real-world scenarios, which contains two main phases: (1) We first propose a multi-stage learning method to teach the LLM to use various tools from an easy-to-difficult curriculum; (2) thenceforth, we propose the Iterative Self-instruct from Introspective Feedback (ISIF) to dynamically construct the dataset to improve the ability to use the complicated tool. Extensive experiments conducted on both controlled and real-world settings demonstrate the superiority of our tool learning framework in the real-world application scenarios compared to both tuning-free (e.g. ChatGPT, Claude) and tuning-based baselines (e.g. GPT4Tools). △ Less

Submitted 21 December, 2023; v1 submitted 27 August, 2023; originally announced August 2023.

Comments: Accepted by AAAI 2024

arXiv:2307.09609 [pdf, other]

AMRIC: A Novel In Situ Lossy Compression Framework for Efficient I/O in Adaptive Mesh Refinement Applications

Authors: Daoce Wang, Jesus Pulido, Pascal Grosset, Jiannan Tian, Sian **, Houjun Tang, Jean Sexton, Sheng Di, Zarija Lukić, Kai Zhao, Bo Fang, Franck Cappello, James Ahrens, Dingwen Tao

Abstract: As supercomputers advance towards exascale capabilities, computational intensity increases significantly, and the volume of data requiring storage and transmission experiences exponential growth. Adaptive Mesh Refinement (AMR) has emerged as an effective solution to address these two challenges. Concurrently, error-bounded lossy compression is recognized as one of the most efficient approaches to… ▽ More As supercomputers advance towards exascale capabilities, computational intensity increases significantly, and the volume of data requiring storage and transmission experiences exponential growth. Adaptive Mesh Refinement (AMR) has emerged as an effective solution to address these two challenges. Concurrently, error-bounded lossy compression is recognized as one of the most efficient approaches to tackle the latter issue. Despite their respective advantages, few attempts have been made to investigate how AMR and error-bounded lossy compression can function together. To this end, this study presents a novel in-situ lossy compression framework that employs the HDF5 filter to improve both I/O costs and boost compression quality for AMR applications. We implement our solution into the AMReX framework and evaluate on two real-world AMR applications, Nyx and WarpX, on the Summit supercomputer. Experiments with 4096 CPU cores demonstrate that AMRIC improves the compression ratio by up to 81X and the I/O performance by up to 39X over AMReX's original compression solution. △ Less

Submitted 13 July, 2023; originally announced July 2023.

Comments: 12 pages, 18 figures, 3 tables, accepted by ACM/IEEE SC '23

arXiv:2306.01123 [pdf, other]

A Neural RDE-based model for solving path-dependent PDEs

Authors: Bowen Fang, Hao Ni, Yue Wu

Abstract: The concept of the path-dependent partial differential equation (PPDE) was first introduced in the context of path-dependent derivatives in financial markets. Its semilinear form was later identified as a non-Markovian backward stochastic differential equation (BSDE). Compared to the classical PDE, the solution of a PPDE involves an infinite-dimensional spatial variable, making it challenging to a… ▽ More The concept of the path-dependent partial differential equation (PPDE) was first introduced in the context of path-dependent derivatives in financial markets. Its semilinear form was later identified as a non-Markovian backward stochastic differential equation (BSDE). Compared to the classical PDE, the solution of a PPDE involves an infinite-dimensional spatial variable, making it challenging to approximate, if not impossible. In this paper, we propose a neural rough differential equation (NRDE)-based model to learn PPDEs, which effectively encodes the path information through the log-signature feature while capturing the fundamental dynamics. The proposed continuous-time model for the PPDE solution offers the benefits of efficient memory usage and the ability to scale with dimensionality. Several numerical experiments, provided to validate the performance of the proposed model in comparison to the strong baseline in the literature, are used to demonstrate its effectiveness. △ Less

Submitted 1 June, 2023; originally announced June 2023.

MSC Class: 68T07; 60L90; 60H30

arXiv:2305.11191 [pdf, other]

Towards Generalizable Data Protection With Transferable Unlearnable Examples

Authors: Bin Fang, Bo Li, Shuang Wu, Tianyi Zheng, Shouhong Ding, Ran Yi, Lizhuang Ma

Abstract: Artificial Intelligence (AI) is making a profound impact in almost every domain. One of the crucial factors contributing to this success has been the access to an abundance of high-quality data for constructing machine learning models. Lately, as the role of data in artificial intelligence has been significantly magnified, concerns have arisen regarding the secure utilization of data, particularly… ▽ More Artificial Intelligence (AI) is making a profound impact in almost every domain. One of the crucial factors contributing to this success has been the access to an abundance of high-quality data for constructing machine learning models. Lately, as the role of data in artificial intelligence has been significantly magnified, concerns have arisen regarding the secure utilization of data, particularly in the context of unauthorized data usage. To mitigate data exploitation, data unlearning have been introduced to render data unexploitable. However, current unlearnable examples lack the generalization required for wide applicability. In this paper, we present a novel, generalizable data protection method by generating transferable unlearnable examples. To the best of our knowledge, this is the first solution that examines data privacy from the perspective of data distribution. Through extensive experimentation, we substantiate the enhanced generalizable protection capabilities of our proposed method. △ Less

Submitted 18 May, 2023; originally announced May 2023.

Comments: arXiv admin note: text overlap with arXiv:2305.10691

arXiv:2305.10691 [pdf, other]

Re-thinking Data Availablity Attacks Against Deep Neural Networks

Authors: Bin Fang, Bo Li, Shuang Wu, Ran Yi, Shouhong Ding, Lizhuang Ma

Abstract: The unauthorized use of personal data for commercial purposes and the clandestine acquisition of private data for training machine learning models continue to raise concerns. In response to these issues, researchers have proposed availability attacks that aim to render data unexploitable. However, many current attack methods are rendered ineffective by adversarial training. In this paper, we re-ex… ▽ More The unauthorized use of personal data for commercial purposes and the clandestine acquisition of private data for training machine learning models continue to raise concerns. In response to these issues, researchers have proposed availability attacks that aim to render data unexploitable. However, many current attack methods are rendered ineffective by adversarial training. In this paper, we re-examine the concept of unlearnable examples and discern that the existing robust error-minimizing noise presents an inaccurate optimization objective. Building on these observations, we introduce a novel optimization paradigm that yields improved protection results with reduced computational time requirements. We have conducted extensive experiments to substantiate the soundness of our approach. Moreover, our method establishes a robust foundation for future research in this area. △ Less

Submitted 18 May, 2023; originally announced May 2023.

arXiv:2303.09100 [pdf, other]

Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models

Authors: Xinyang Liu, Dongsheng Wang, Bowei Fang, Miaoge Li, Zhibin Duan, Yishi Xu, Bo Chen, Mingyuan Zhou

Abstract: For downstream applications of vision-language pre-trained models, there has been significant interest in constructing effective prompts. Existing works on prompt engineering, which either require laborious manual designs or optimize the prompt tuning as a point estimation problem, may fail to describe diverse characteristics of categories and limit their applications. We introduce a Bayesian prob… ▽ More For downstream applications of vision-language pre-trained models, there has been significant interest in constructing effective prompts. Existing works on prompt engineering, which either require laborious manual designs or optimize the prompt tuning as a point estimation problem, may fail to describe diverse characteristics of categories and limit their applications. We introduce a Bayesian probabilistic resolution to prompt tuning, where the label-specific stochastic prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then employing a lightweight generative model. Importantly, we semantically regularize the tuning process by minimizing the statistical distance between the visual patches and linguistic prompts, which pushes the stochastic label representations to faithfully capture diverse visual concepts, instead of overfitting the training categories. We evaluate the effectiveness of our approach on four tasks: few-shot image recognition, base-to-new generalization, dataset transfer learning, and domain shifts. Extensive results over 15 datasets show promising transferability and generalization performance of our proposed model, both quantitatively and qualitatively. △ Less

Submitted 1 July, 2024; v1 submitted 16 March, 2023; originally announced March 2023.

Comments: Accepted by UAI 2024

arXiv:2302.10717 [pdf, other]

doi 10.1109/IROS40897.2019.8967899

Deep Reinforcement Learning for Robotic Pushing and Picking in Cluttered Environment

Authors: Yuhong Deng, Xiaofeng Guo, Yixuan Wei, Kai Lu, Bin Fang, Di Guo, Hua** Liu, Fuchun Sun

Abstract: In this paper, a novel robotic gras** system is established to automatically pick up objects in cluttered scenes. A composite robotic hand composed of a suction cup and a gripper is designed for gras** the object stably. The suction cup is used for lifting the object from the clutter first and the gripper for gras** the object accordingly. We utilize the affordance map to provide pixel-wise… ▽ More In this paper, a novel robotic gras** system is established to automatically pick up objects in cluttered scenes. A composite robotic hand composed of a suction cup and a gripper is designed for gras** the object stably. The suction cup is used for lifting the object from the clutter first and the gripper for gras** the object accordingly. We utilize the affordance map to provide pixel-wise lifting point candidates for the suction cup. To obtain a good affordance map, the active exploration mechanism is introduced to the system. An effective metric is designed to calculate the reward for the current affordance map, and a deep Q-Network (DQN) is employed to guide the robotic hand to actively explore the environment until the generated affordance map is suitable for gras**. Experimental results have demonstrated that the proposed robotic gras** system is able to greatly increase the success rate of the robotic gras** in cluttered scenes. △ Less

Submitted 21 February, 2023; originally announced February 2023.

Comments: has been accepted by IEEE/RSJ International Conference on Intelligent Robots and Systems 2019

Journal ref: IEEE/RSJ International Conference on Intelligent Robots and Systems 2019 (IROS 2019)

arXiv:2301.08343 [pdf, other]

doi 10.1109/LRA.2023.3237042

Tacchi: A Pluggable and Low Computational Cost Elastomer Deformation Simulator for Optical Tactile Sensors

Authors: Zixi Chen, Shixin Zhang, Shan Luo, Fuchun Sun, Bin Fang

Abstract: Simulation is widely applied in robotics research to save time and resources. There have been several works to simulate optical tactile sensors that leverage either a smoothing method or Finite Element Method (FEM). However, elastomer deformation physics is not considered in the former method, whereas the latter requires a massive amount of computational resources like a computer cluster. In this… ▽ More Simulation is widely applied in robotics research to save time and resources. There have been several works to simulate optical tactile sensors that leverage either a smoothing method or Finite Element Method (FEM). However, elastomer deformation physics is not considered in the former method, whereas the latter requires a massive amount of computational resources like a computer cluster. In this work, we propose a pluggable and low computational cost simulator using the Taichi programming language for simulating optical tactile sensors, named as Tacchi . It reconstructs elastomer deformation using particles, which allows deformed elastomer surfaces to be rendered into tactile images and reveals contact information without suffering from high computational costs. Tacchi facilitates creating realistic tactile images in simulation, e.g., ones that capture wear-and-tear defects on object surfaces. In addition, the proposed Tacchi can be integrated with robotics simulators for a robot system simulation. Experiment results showed that Tacchi can produce images with better similarity to real images and achieved higher Sim2Real accuracy compared to the existing methods. Moreover, it can be connected with MuJoCo and Gazebo with only the requirement of 1G memory space in GPU compared to a computer cluster applied for FEM. With Tacchi, physical robot simulation with optical tactile sensors becomes possible. All the materials in this paper are available at https://github.com/zixichen007115/Tacchi . △ Less

Submitted 19 January, 2023; originally announced January 2023.

Comments: 8 pages, 6 figures, accepted by IEEE Robotics and Automation Letters

arXiv:2301.06309 [pdf, other]

UATVR: Uncertainty-Adaptive Text-Video Retrieval

Authors: Bo Fang, Wenhao Wu, Chang Liu, Yu Zhou, Yuxin Song, Wei** Wang, Xiangbo Shu, Xiangyang Ji, **gdong Wang

Abstract: With the explosive growth of web videos and emerging large-scale vision-language pre-training models, e.g., CLIP, retrieving videos of interest with text instructions has attracted increasing attention. A common practice is to transfer text-video pairs to the same embedding space and craft cross-modal interactions with certain entities in specific granularities for semantic correspondence. Unfortu… ▽ More With the explosive growth of web videos and emerging large-scale vision-language pre-training models, e.g., CLIP, retrieving videos of interest with text instructions has attracted increasing attention. A common practice is to transfer text-video pairs to the same embedding space and craft cross-modal interactions with certain entities in specific granularities for semantic correspondence. Unfortunately, the intrinsic uncertainties of optimal entity combinations in appropriate granularities for cross-modal queries are understudied, which is especially critical for modalities with hierarchical semantics, e.g., video, text, etc. In this paper, we propose an Uncertainty-Adaptive Text-Video Retrieval approach, termed UATVR, which models each look-up as a distribution matching procedure. Concretely, we add additional learnable tokens in the encoders to adaptively aggregate multi-grained semantics for flexible high-level reasoning. In the refined embedding space, we represent text-video pairs as probabilistic distributions where prototypes are sampled for matching evaluation. Comprehensive experiments on four benchmarks justify the superiority of our UATVR, which achieves new state-of-the-art results on MSR-VTT (50.8%), VATEX (64.5%), MSVD (49.7%), and DiDeMo (45.8%). The code is available at https://github.com/bofang98/UATVR. △ Less

Submitted 18 August, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

Comments: To appear at ICCV2023

arXiv:2301.00184 [pdf, other]

Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

Authors: Wenhao Wu, Haipeng Luo, Bo Fang, **gdong Wang, Wanli Ouyang

Abstract: Most existing text-video retrieval methods focus on cross-modal matching between the visual content of videos and textual query sentences. However, in real-world scenarios, online videos are often accompanied by relevant text information such as titles, tags, and even subtitles, which can be utilized to match textual queries. This insight has motivated us to propose a novel approach to text-video… ▽ More Most existing text-video retrieval methods focus on cross-modal matching between the visual content of videos and textual query sentences. However, in real-world scenarios, online videos are often accompanied by relevant text information such as titles, tags, and even subtitles, which can be utilized to match textual queries. This insight has motivated us to propose a novel approach to text-video retrieval, where we directly generate associated captions from videos using zero-shot video captioning with knowledge from web-scale pre-trained models (e.g., CLIP and GPT-2). Given the generated captions, a natural question arises: what benefits do they bring to text-video retrieval? To answer this, we introduce Cap4Video, a new framework that leverages captions in three ways: i) Input data: video-caption pairs can augment the training data. ii) Intermediate feature interaction: we perform cross-modal feature interaction between the video and caption to produce enhanced video representations. iii) Output score: the Query-Caption matching branch can complement the original Query-Video matching branch for text-video retrieval. We conduct comprehensive ablation studies to demonstrate the effectiveness of our approach. Without any post-processing, Cap4Video achieves state-of-the-art performance on four standard text-video retrieval benchmarks: MSR-VTT (51.4%), VATEX (66.6%), MSVD (51.8%), and DiDeMo (52.0%). The code is available at https://github.com/whwu95/Cap4Video . △ Less

Submitted 28 March, 2023; v1 submitted 31 December, 2022; originally announced January 2023.

Comments: Accepted by CVPR 2023. Selected as a Highlight (Top 2.5% of ALL submissions)

arXiv:2212.14162 [pdf, other]

OrthoGAN:High-Precision Image Generation for Teeth Orthodontic Visualization

Authors: Feihong Shen, JIng**g Liu, Haizhen Li, Bing Fang, Chenglong Ma, ** Hao, Yang Feng, Youyi Zheng

Abstract: Patients take care of what their teeth will be like after the orthodontics. Orthodontists usually describe the expectation movement based on the original smile images, which is unconvincing. The growth of deep-learning generative models change this situation. It can visualize the outcome of orthodontic treatment and help patients foresee their future teeth and facial appearance. While previous stu… ▽ More Patients take care of what their teeth will be like after the orthodontics. Orthodontists usually describe the expectation movement based on the original smile images, which is unconvincing. The growth of deep-learning generative models change this situation. It can visualize the outcome of orthodontic treatment and help patients foresee their future teeth and facial appearance. While previous studies mainly focus on 2D or 3D virtual treatment outcome (VTO) at a profile level, the problem of simulating treatment outcome at a frontal facial image is poorly explored. In this paper, we build an efficient and accurate system for simulating virtual teeth alignment effects in a frontal facial image. Our system takes a frontal face image of a patient with visible malpositioned teeth and the patient's 3D scanned teeth model as input, and progressively generates the visual results of the patient's teeth given the specific orthodontics planning steps from the doctor (i.e., the specification of translations and rotations of individual tooth). We design a multi-modal encoder-decoder based generative model to synthesize identity-preserving frontal facial images with aligned teeth. In addition, the original image color information is used to optimize the orthodontic outcomes, making the results more natural. We conduct extensive qualitative and clinical experiments and also a pilot study to validate our method. △ Less

Submitted 28 December, 2022; originally announced December 2022.

arXiv:2211.04454 [pdf, other]

SLATE: A Sequence Labeling Approach for Task Extraction from Free-form Inked Content

Authors: Apurva Gandhi, Ryan Serrao, Biyi Fang, Gilbert Antonius, Jenna Hong, Tra My Nguyen, Sheng Yi, Ehi Nosakhare, Irene Shaffer, Soundararajan Srinivasan, Vivek Gupta

Abstract: We present SLATE, a sequence labeling approach for extracting tasks from free-form content such as digitally handwritten (or "inked") notes on a virtual whiteboard. Our approach allows us to create a single, low-latency model to simultaneously perform sentence segmentation and classification of these sentences into task/non-task sentences. SLATE greatly outperforms a baseline two-model (sentence s… ▽ More We present SLATE, a sequence labeling approach for extracting tasks from free-form content such as digitally handwritten (or "inked") notes on a virtual whiteboard. Our approach allows us to create a single, low-latency model to simultaneously perform sentence segmentation and classification of these sentences into task/non-task sentences. SLATE greatly outperforms a baseline two-model (sentence segmentation followed by classification model) approach, achieving a task F1 score of 84.4%, a sentence segmentation (boundary similarity) score of 88.4% and three times lower latency compared to the baseline. Furthermore, we provide insights into tackling challenges of performing NLP on the inking domain. We release both our code and dataset for this novel task. △ Less

Submitted 17 November, 2022; v1 submitted 8 November, 2022; originally announced November 2022.

Comments: Accepted at EMNLP 2022 as an Industry Track paper

arXiv:2210.08288 [pdf, other]

Transformer-based dimensionality reduction

Authors: Ruisheng Ran, Tianyu Gao, Bin Fang

Abstract: Recently, Transformer is much popular and plays an important role in the fields of Machine Learning (ML), Natural Language Processing (NLP), and Computer Vision (CV), etc. In this paper, based on the Vision Transformer (ViT) model, a new dimensionality reduction (DR) model is proposed, named Transformer-DR. From data visualization, image reconstruction and face recognition, the representation abil… ▽ More Recently, Transformer is much popular and plays an important role in the fields of Machine Learning (ML), Natural Language Processing (NLP), and Computer Vision (CV), etc. In this paper, based on the Vision Transformer (ViT) model, a new dimensionality reduction (DR) model is proposed, named Transformer-DR. From data visualization, image reconstruction and face recognition, the representation ability of Transformer-DR after dimensionality reduction is studied, and it is compared with some representative DR methods to understand the difference between Transformer-DR and existing DR methods. The experimental results show that Transformer-DR is an effective dimensionality reduction method. △ Less

Submitted 15 October, 2022; originally announced October 2022.

arXiv:2210.01346 [pdf, other]

doi 10.1109/ICRA48891.2023.10161428

ImmFusion: Robust mmWave-RGB Fusion for 3D Human Body Reconstruction in All Weather Conditions

Authors: Anjun Chen, Xiangyu Wang, Kun Shi, Shaohao Zhu, Bin Fang, Yingfeng Chen, Jiming Chen, Yuchi Huo, Qi Ye

Abstract: 3D human reconstruction from RGB images achieves decent results in good weather conditions but degrades dramatically in rough weather. Complementary, mmWave radars have been employed to reconstruct 3D human joints and meshes in rough weather. However, combining RGB and mmWave signals for robust all-weather 3D human reconstruction is still an open challenge, given the sparse nature of mmWave and th… ▽ More 3D human reconstruction from RGB images achieves decent results in good weather conditions but degrades dramatically in rough weather. Complementary, mmWave radars have been employed to reconstruct 3D human joints and meshes in rough weather. However, combining RGB and mmWave signals for robust all-weather 3D human reconstruction is still an open challenge, given the sparse nature of mmWave and the vulnerability of RGB images. In this paper, we present ImmFusion, the first mmWave-RGB fusion solution to reconstruct 3D human bodies in all weather conditions robustly. Specifically, our ImmFusion consists of image and point backbones for token feature extraction and a Transformer module for token fusion. The image and point backbones refine global and local features from original data, and the Fusion Transformer Module aims for effective information fusion of two modalities by dynamically selecting informative tokens. Extensive experiments on a large-scale dataset, mmBody, captured in various environments demonstrate that ImmFusion can efficiently utilize the information of two modalities to achieve a robust 3D human body reconstruction in all weather conditions. In addition, our method's accuracy is significantly superior to that of state-of-the-art Transformer-based LiDAR-camera fusion methods. △ Less

Submitted 20 September, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: Accepted to ICRA2023, Project Page: https://chen3110.github.io/ImmFusion/index.html

arXiv:2205.06973 [pdf, other]

Efficient Hierarchical State Vector Simulation of Quantum Circuits via Acyclic Graph Partitioning

Authors: Bo Fang, M. Yusuf Özkaya, Ang Li, Ümit V. Çatalyürek, Sriram Krishnamoorthy

Abstract: Early but promising results in quantum computing have been enabled by the concurrent development of quantum algorithms, devices, and materials. Classical simulation of quantum programs has enabled the design and analysis of algorithms and implementation strategies targeting current and anticipated quantum device architectures. In this paper, we present a graph-based approach to achieve efficient q… ▽ More Early but promising results in quantum computing have been enabled by the concurrent development of quantum algorithms, devices, and materials. Classical simulation of quantum programs has enabled the design and analysis of algorithms and implementation strategies targeting current and anticipated quantum device architectures. In this paper, we present a graph-based approach to achieve efficient quantum circuit simulation. Our approach involves partitioning the graph representation of a given quantum circuit into acyclic sub-graphs/circuits that exhibit better data locality. Simulation of each sub-circuit is organized hierarchically, with the iterative construction and simulation of smaller state vectors, improving overall performance. Also, this partitioning reduces the number of passes through data, improving the total computation time. We present three partitioning strategies and observe that acyclic graph partitioning typically results in the best time-to-solution. In contrast, other strategies reduce the partitioning time at the expense of potentially increased simulation times. Experimental evaluation demonstrates the effectiveness of our approach. △ Less

Submitted 14 May, 2022; originally announced May 2022.

arXiv:2205.05413 [pdf, other]

Compact and Efficient KEMs over NTRU Lattices

Authors: Zhichuang Liang, Boyue Fang, Jieyu Zheng, Yunlei Zhao

Abstract: The NTRU lattice is a promising candidate to construct practical cryptosystems, in particular key encapsulation mechanism (KEM), resistant to quantum computing attacks. Nevertheless, there are still some inherent obstacles to NTRU-based KEM schemes in having integrated performance, taking security, bandwidth, error probability, and computational efficiency \emph{as a whole}, that is as good as and… ▽ More The NTRU lattice is a promising candidate to construct practical cryptosystems, in particular key encapsulation mechanism (KEM), resistant to quantum computing attacks. Nevertheless, there are still some inherent obstacles to NTRU-based KEM schemes in having integrated performance, taking security, bandwidth, error probability, and computational efficiency \emph{as a whole}, that is as good as and even better than their \{R,M\}LWE-based counterparts. In this work, we solve this problem by presenting a new family of NTRU-based KEM schemes, referred to as CTRU and CNTR. By bridging low-dimensional lattice codes and high-dimensional NTRU-lattice-based cryptography with careful design and analysis, to the best of our knowledge CTRU and CNTR are the first NTRU-based KEM schemes with scalable ciphertext compression via only one \emph{single} ciphertext polynomial, and are the first that could outperform \{R,M\}LWE-based KEM schemes in integrated performance. For instance, compared to Kyber that is currently the only standardized KEM by NIST, on the recommended parameter set CNTR-768 has about $12\%$ smaller ciphertext size while encapsulating 384-bit keys compared to the fixed 256-bit key size of Kyber, security strengthened by $(8,7)$ bits for classical and quantum security respectively, and significantly lower error probability ($2^{-230}$ of CNTR-768 vs. $2^{-164}$ of Kyber-768). In comparison with the state-of-the-art AVX2 implementation of Kyber-768, CNTR-768 is faster by 1.9X in KeyGen, 2.6X in Encaps, and 1.2X in Decaps, respectively. When compared to the NIST Round 3 finalist NTRU-HRSS, our CNTR-768 has about $15\%$ smaller ciphertext size, and the security is strengthened by $(55,49)$ bits for classical and quantum security respectively. As for the AVX2 implementation, CNTR-768 is faster than NTRU-HRSS by 19X in KeyGen, 2.3X in Encaps, and 1.6X in Decaps, respectively. △ Less

Submitted 9 November, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

arXiv:2203.05784 [pdf]

AI-enabled Automatic Multimodal Fusion of Cone-Beam CT and Intraoral Scans for Intelligent 3D Tooth-Bone Reconstruction and Clinical Applications

Authors: ** Hao, Jiaxiang Liu, ** Li, Wei Pan, Ruizhe Chen, Huimin Xiong, Kaiwei Sun, Hangzheng Lin, Wanlu Liu, Wanghui Ding, Jianfei Yang, Haoji Hu, Yueling Zhang, Yang Feng, Zeyu Zhao, Huikai Wu, Youyi Zheng, Bing Fang, Zuozhu Liu, Zhihe Zhao

Abstract: A critical step in virtual dental treatment planning is to accurately delineate all tooth-bone structures from CBCT with high fidelity and accurate anatomical information. Previous studies have established several methods for CBCT segmentation using deep learning. However, the inherent resolution discrepancy of CBCT and the loss of occlusal and dentition information largely limited its clinical ap… ▽ More A critical step in virtual dental treatment planning is to accurately delineate all tooth-bone structures from CBCT with high fidelity and accurate anatomical information. Previous studies have established several methods for CBCT segmentation using deep learning. However, the inherent resolution discrepancy of CBCT and the loss of occlusal and dentition information largely limited its clinical applicability. Here, we present a Deep Dental Multimodal Analysis (DDMA) framework consisting of a CBCT segmentation model, an intraoral scan (IOS) segmentation model (the most accurate digital dental model), and a fusion model to generate 3D fused crown-root-bone structures with high fidelity and accurate occlusal and dentition information. Our model was trained with a large-scale dataset with 503 CBCT and 28,559 IOS meshes manually annotated by experienced human experts. For CBCT segmentation, we use a five-fold cross validation test, each with 50 CBCT, and our model achieves an average Dice coefficient and IoU of 93.99% and 88.68%, respectively, significantly outperforming the baselines. For IOS segmentations, our model achieves an mIoU of 93.07% and 95.70% on the maxillary and mandible on a test set of 200 IOS meshes, which are 1.77% and 3.52% higher than the state-of-art method. Our DDMA framework takes about 20 to 25 minutes to generate the fused 3D mesh model following the sequential processing order, compared to over 5 hours by human experts. Notably, our framework has been incorporated into a software by a clear aligner manufacturer, and real-world clinical cases demonstrate that our model can visualize crown-root-bone structures during the entire orthodontic treatment and can predict risks like dehiscence and fenestration. These findings demonstrate the potential of multi-modal deep learning to improve the quality of digital dental models and help dentists make better clinical decisions. △ Less

Submitted 11 March, 2022; originally announced March 2022.

Comments: 30 pages, 6 figures, 3 tables

arXiv:2203.00762 [pdf, other]

Topic Analysis for Text with Side Data

Authors: Biyi Fang, Kripa Rajshekhar, Diego Klabjan

Abstract: Although latent factor models (e.g., matrix factorization) obtain good performance in predictions, they suffer from several problems including cold-start, non-transparency, and suboptimal recommendations. In this paper, we employ text with side data to tackle these limitations. We introduce a hybrid generative probabilistic model that combines a neural network with a latent topic model, which is a… ▽ More Although latent factor models (e.g., matrix factorization) obtain good performance in predictions, they suffer from several problems including cold-start, non-transparency, and suboptimal recommendations. In this paper, we employ text with side data to tackle these limitations. We introduce a hybrid generative probabilistic model that combines a neural network with a latent topic model, which is a four-level hierarchical Bayesian model. In the model, each document is modeled as a finite mixture over an underlying set of topics and each topic is modeled as an infinite mixture over an underlying set of topic probabilities. Furthermore, each topic probability is modeled as a finite mixture over side data. In the context of text, the neural network provides an overview distribution about side data for the corresponding text, which is the prior distribution in LDA to help perform topic grou**. The approach is evaluated on several different datasets, where the model is shown to outperform standard LDA and Dirichlet-multinomial regression (DMR) in terms of topic grou**, model perplexity, classification and comment generation. △ Less

Submitted 1 March, 2022; originally announced March 2022.

arXiv:2203.00761 [pdf, other]

Tricks and Plugins to GBM on Images and Sequences

Authors: Biyi Fang, Jean Utke, Diego Klabjan

Abstract: Convolutional neural networks (CNNs) and transformers, which are composed of multiple processing layers and blocks to learn the representations of data with multiple abstract levels, are the most successful machine learning models in recent years. However, millions of parameters and many blocks make them difficult to be trained, and sometimes several days or weeks are required to find an ideal arc… ▽ More Convolutional neural networks (CNNs) and transformers, which are composed of multiple processing layers and blocks to learn the representations of data with multiple abstract levels, are the most successful machine learning models in recent years. However, millions of parameters and many blocks make them difficult to be trained, and sometimes several days or weeks are required to find an ideal architecture or tune the parameters. Within this paper, we propose a new algorithm for boosting Deep Convolutional Neural Networks (BoostCNN) to combine the merits of dynamic feature selection and BoostCNN, and another new family of algorithms combining boosting and transformers. To learn these new models, we introduce subgrid selection and importance sampling strategies and propose a set of algorithms to incorporate boosting weights into a deep learning architecture based on a least squares objective function. These algorithms not only reduce the required manual effort for finding an appropriate network architecture but also result in superior performance and lower running time. Experiments show that the proposed methods outperform benchmarks on several fine-grained classification tasks. △ Less

Submitted 1 March, 2022; originally announced March 2022.

arXiv:2202.12796 [pdf, other]

doi 10.1109/TRO.2023.3238910

Hybrid Robotic Gras** with a Soft Multimodal Gripper and a Deep Multistage Learning Scheme

Authors: Fukang Liu, Fuchun Sun, Bin Fang, Xiang Li, Songyu Sun, Hua** Liu

Abstract: Gras** has long been considered an important and practical task in robotic manipulation. Yet achieving robust and efficient grasps of diverse objects is challenging, since it involves gripper design, perception, control and learning, etc. Recent learning-based approaches have shown excellent performance in gras** a variety of novel objects. However, these methods either are typically limited t… ▽ More Gras** has long been considered an important and practical task in robotic manipulation. Yet achieving robust and efficient grasps of diverse objects is challenging, since it involves gripper design, perception, control and learning, etc. Recent learning-based approaches have shown excellent performance in gras** a variety of novel objects. However, these methods either are typically limited to one single gras** mode, or else more end effectors are needed to grasp various objects. In addition, gripper design and learning methods are commonly developed separately, which may not adequately explore the ability of a multimodal gripper. In this paper, we present a deep reinforcement learning (DRL) framework to achieve multistage hybrid robotic gras** with a new soft multimodal gripper. A soft gripper with three gras** modes (i.e., envelo**, sucking, and envelo**_then_sucking) can both deal with objects of different shapes and grasp more than one object simultaneously. We propose a novel hybrid gras** method integrated with the multimodal gripper to optimize the number of gras** actions. We evaluate the DRL framework under different scenarios (i.e., with different ratios of objects of two grasp types). The proposed algorithm is shown to reduce the number of gras** actions (i.e., enlarge the gras** efficiency, with maximum values of 161% in simulations and 154% in real-world experiments) compared to single gras** modes. △ Less

Submitted 5 March, 2023; v1 submitted 25 February, 2022; originally announced February 2022.

arXiv:2110.11084 [pdf, other]

doi 10.1109/TGRS.2022.3180685

Grafting Transformer on Automatically Designed Convolutional Neural Network for Hyperspectral Image Classification

Authors: Xizhe Xue, Haokui Zhang, Bei Fang, Zongwen Bai, Ying Li

Abstract: Hyperspectral image (HSI) classification has been a hot topic for decides, as hyperspectral images have rich spatial and spectral information and provide strong basis for distinguishing different land-cover objects. Benefiting from the development of deep learning technologies, deep learning based HSI classification methods have achieved promising performance. Recently, several neural architecture… ▽ More Hyperspectral image (HSI) classification has been a hot topic for decides, as hyperspectral images have rich spatial and spectral information and provide strong basis for distinguishing different land-cover objects. Benefiting from the development of deep learning technologies, deep learning based HSI classification methods have achieved promising performance. Recently, several neural architecture search (NAS) algorithms have been proposed for HSI classification, which further improve the accuracy of HSI classification to a new level. In this paper, NAS and Transformer are combined for handling HSI classification task for the first time. Compared with previous work, the proposed method has two main differences. First, we revisit the search spaces designed in previous HSI classification NAS methods and propose a novel hybrid search space, consisting of the space dominated cell and the spectrum dominated cell. Compared with search spaces proposed in previous works, the proposed hybrid search space is more aligned with the characteristic of HSI data, that is, HSIs have a relatively low spatial resolution and an extremely high spectral resolution. Second, to further improve the classification accuracy, we attempt to graft the emerging transformer module on the automatically designed convolutional neural network (CNN) to add global information to local region focused features learned by CNN. Experimental results on three public HSI datasets show that the proposed method achieves much better performance than comparison approaches, including manually designed network and NAS based HSI classification methods. Especially on the most recently captured dataset Houston University, overall accuracy is improved by nearly 6 percentage points. Code is available at: https://github.com/Cecilia-xue/HyT-NAS. △ Less

Submitted 6 April, 2023; v1 submitted 21 October, 2021; originally announced October 2021.

Comments: 15 pages, 10 figures

arXiv:2108.05013 [pdf, other]

Elastic Tactile Simulation Towards Tactile-Visual Perception

Authors: Yikai Wang, Wenbing Huang, Bin Fang, Fuchun Sun, Chang Li

Abstract: Tactile sensing plays an important role in robotic perception and manipulation tasks. To overcome the real-world limitations of data collection, simulating tactile response in a virtual environment comes as a desirable direction of robotic research. In this paper, we propose Elastic Interaction of Particles (EIP) for tactile simulation. Most existing works model the tactile sensor as a rigid multi… ▽ More Tactile sensing plays an important role in robotic perception and manipulation tasks. To overcome the real-world limitations of data collection, simulating tactile response in a virtual environment comes as a desirable direction of robotic research. In this paper, we propose Elastic Interaction of Particles (EIP) for tactile simulation. Most existing works model the tactile sensor as a rigid multi-body, which is incapable of reflecting the elastic property of the tactile sensor as well as characterizing the fine-grained physical interaction between the two objects. By contrast, EIP models the tactile sensor as a group of coordinated particles, and the elastic property is applied to regulate the deformation of particles during contact. With the tactile simulation by EIP, we further propose a tactile-visual perception network that enables information fusion between tactile data and visual images. The perception network is based on a global-to-local fusion mechanism where multi-scale tactile features are aggregated to the corresponding local region of the visual modality with the guidance of tactile positions and directions. The fusion method exhibits superiority regarding the 3D geometric reconstruction task. △ Less

Submitted 12 August, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

Comments: ACMMM 2021 (Oral). Code available at https://github.com/yikaiw/EIP. arXiv admin note: substantial text overlap with arXiv:2011.11528

arXiv:2107.07467 [pdf, other]

Only Train Once: A One-Shot Neural Network Training And Pruning Framework

Authors: Tianyi Chen, Bo Ji, Tianyu Ding, Biyi Fang, Guanyi Wang, Zhihui Zhu, Luming Liang, Yixin Shi, Sheng Yi, Xiao Tu

Abstract: Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices. However, the existing pruning methods are usually heuristic, task-specified, and require an extra fine-tuning procedure. To overcome these limitations, we propose a framework that compresses DNNs into slimmer architectures with competitive performances and significant FLOPs r… ▽ More Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices. However, the existing pruning methods are usually heuristic, task-specified, and require an extra fine-tuning procedure. To overcome these limitations, we propose a framework that compresses DNNs into slimmer architectures with competitive performances and significant FLOPs reductions by Only-Train-Once (OTO). OTO contains two keys: (i) we partition the parameters of DNNs into zero-invariant groups, enabling us to prune zero groups without affecting the output; and (ii) to promote zero groups, we then formulate a structured-sparsity optimization problem and propose a novel optimization algorithm, Half-Space Stochastic Projected Gradient (HSPG), to solve it, which outperforms the standard proximal methods on group sparsity exploration and maintains comparable convergence. To demonstrate the effectiveness of OTO, we train and compress full models simultaneously from scratch without fine-tuning for inference speedup and parameter reduction, and achieve state-of-the-art results on VGG16 for CIFAR10, ResNet50 for CIFAR10 and Bert for SQuAD and competitive result on ResNet50 for ImageNet. The source code is available at https://github.com/tianyic/only_train_once. △ Less

Submitted 11 November, 2021; v1 submitted 15 July, 2021; originally announced July 2021.

Comments: Accepted by NeurIPS 2021

arXiv:2107.03578 [pdf, other]

Video 3D Sampling for Self-supervised Representation Learning

Authors: Wei Li, Dezhao Luo, Bo Fang, Yu Zhou, Wei** Wang

Abstract: Most of the existing video self-supervised methods mainly leverage temporal signals of videos, ignoring that the semantics of moving objects and environmental information are all critical for video-related tasks. In this paper, we propose a novel self-supervised method for video representation learning, referred to as Video 3D Sampling (V3S). In order to sufficiently utilize the information (spati… ▽ More Most of the existing video self-supervised methods mainly leverage temporal signals of videos, ignoring that the semantics of moving objects and environmental information are all critical for video-related tasks. In this paper, we propose a novel self-supervised method for video representation learning, referred to as Video 3D Sampling (V3S). In order to sufficiently utilize the information (spatial and temporal) provided in videos, we pre-process a video from three dimensions (width, height, time). As a result, we can leverage the spatial information (the size of objects), temporal information (the direction and magnitude of motions) as our learning target. In our implementation, we combine the sampling of the three dimensions and propose the scale and projection transformations in space and time respectively. The experimental results show that, when applied to action recognition, video retrieval and action similarity labeling, our approach improves the state-of-the-arts with significant margins. △ Less

Submitted 7 July, 2021; originally announced July 2021.

Comments: 9 pages, 5 figures, 6 tables

arXiv:2106.13033 [pdf]

A Transformer-based Cross-modal Fusion Model with Adversarial Training for VQA Challenge 2021

Authors: Ke-Han Lu, Bo-Han Fang, Kuan-Yu Chen

Abstract: In this paper, inspired by the successes of visionlanguage pre-trained models and the benefits from training with adversarial attacks, we present a novel transformerbased cross-modal fusion modeling by incorporating the both notions for VQA challenge 2021. Specifically, the proposed model is on top of the architecture of VinVL model [19], and the adversarial training strategy [4] is applied to mak… ▽ More In this paper, inspired by the successes of visionlanguage pre-trained models and the benefits from training with adversarial attacks, we present a novel transformerbased cross-modal fusion modeling by incorporating the both notions for VQA challenge 2021. Specifically, the proposed model is on top of the architecture of VinVL model [19], and the adversarial training strategy [4] is applied to make the model robust and generalized. Moreover, two implementation tricks are also used in our system to obtain better results. The experiments demonstrate that the novel framework can achieve 76.72% on VQAv2 test-std set. △ Less

Submitted 26 October, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

Comments: CVPR 2021 Workshop: Visual Question Answering (VQA) Challenge

arXiv:2105.12929 [pdf, other]

Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights

Authors: Bo Fang, Daoce Wang, Sian **, Quincey Koziol, Zhao Zhang, Qiang Guan, Suren Byna, Sriram Krishnamoorthy, Dingwen Tao

Abstract: In recent years, the increasing complexity in scientific simulations and emerging demands for training heavy artificial intelligence models require massive and fast data accesses, which urges high-performance computing (HPC) platforms to equip with more advanced storage infrastructures such as solid-state disks (SSDs). While SSDs offer high-performance I/O, the reliability challenges faced by the… ▽ More In recent years, the increasing complexity in scientific simulations and emerging demands for training heavy artificial intelligence models require massive and fast data accesses, which urges high-performance computing (HPC) platforms to equip with more advanced storage infrastructures such as solid-state disks (SSDs). While SSDs offer high-performance I/O, the reliability challenges faced by the HPC applications under the SSD-related failures remains unclear, in particular for failures resulting in data corruptions. The goal of this paper is to understand the impact of SSD-related faults on the behaviors of complex HPC applications. To this end, we propose FFIS, a FUSE-based fault injection framework that systematically introduces storage faults into the application layer to model the errors originated from SSDs. FFIS is able to plant different I/O related faults into the data returned from underlying file systems, which enables the investigation on the error resilience characteristics of the scientific file format. We demonstrate the use of FFIS with three representative real HPC applications, showing how each application reacts to the data corruptions, and provide insights on the error resilience of the widely adopted HDF5 file format for the HPC applications. △ Less

Submitted 2 August, 2021; v1 submitted 26 May, 2021; originally announced May 2021.

Comments: 12 pages, 9 figures, 4 tables, accepted by IEEE Cluster'21

arXiv:2104.10553 [pdf]

Rethinking Annotation Granularity for Overcoming Shortcuts in Deep Learning-based Radiograph Diagnosis: A Multicenter Study

Authors: Luyang Luo, Hao Chen, Yongjie Xiao, Yanning Zhou, Xi Wang, Varut Vardhanabhuti, Mingxiang Wu, Chu Han, Zaiyi Liu, Xin Hao Benjamin Fang, Efstratios Tsougenis, Huang**g Lin, Pheng-Ann Heng

Abstract: Two DL models were developed using radiograph-level annotations (yes or no disease) and fine-grained lesion-level annotations (lesion bounding boxes), respectively named CheXNet and CheXDet. The models' internal classification performance and lesion localization performance were compared on a testing set (n=2,922), external classification performance was compared on NIH-Google (n=4,376) and PadChe… ▽ More Two DL models were developed using radiograph-level annotations (yes or no disease) and fine-grained lesion-level annotations (lesion bounding boxes), respectively named CheXNet and CheXDet. The models' internal classification performance and lesion localization performance were compared on a testing set (n=2,922), external classification performance was compared on NIH-Google (n=4,376) and PadChest (n=24,536) datasets, and external lesion localization performance was compared on NIH-ChestX-ray14 dataset (n=880). The models were also compared to radiologists on a subset of the internal testing set (n=496). Given sufficient training data, both models performed comparably to radiologists. CheXDet achieved significant improvement for external classification, such as in classifying fracture on NIH-Google (CheXDet area under the ROC curve [AUC]: 0.67, CheXNet AUC: 0.51; p<.001) and PadChest (CheXDet AUC: 0.78, CheXNet AUC: 0.55; p<.001). CheXDet achieved higher lesion detection performance than CheXNet for most abnormalities on all datasets, such as in detecting pneumothorax on the internal set (CheXDet jacknife alternative free-response ROC-figure of merit [JAFROC-FOM]: 0.87, CheXNet JAFROC-FOM: 0.13; p<.001) and NIH-ChestX-ray14 (CheXDet JAFROC-FOM: 0.55, CheXNet JAFROC-FOM: 0.04; p<.001). To summarize, fine-grained annotations overcame shortcut learning and enabled DL models to identify correct lesion patterns, improving the models' generalizability. △ Less

Submitted 8 November, 2022; v1 submitted 21 April, 2021; originally announced April 2021.

Comments: Radiology: Artificial Intelligence

arXiv:2102.11482 [pdf, other]

Structural Similarity of Boundary Conditions and an Efficient Local Search Algorithm for Goal Conflict Identification

Authors: Hongzhen Zhong, Hai Wan, Weilin Luo, Zhanhao Xiao, Jia Li, Biqing Fang

Abstract: In goal-oriented requirements engineering, goal conflict identification is of fundamental importance for requirements analysis. The task aims to find the feasible situations which make the goals diverge within the domain, called boundary conditions (BCs). However, the existing approaches for goal conflict identification fail to find sufficient BCs and general BCs which cover more combinations of c… ▽ More In goal-oriented requirements engineering, goal conflict identification is of fundamental importance for requirements analysis. The task aims to find the feasible situations which make the goals diverge within the domain, called boundary conditions (BCs). However, the existing approaches for goal conflict identification fail to find sufficient BCs and general BCs which cover more combinations of circumstances. From the BCs found by these existing approaches, we have observed an interesting phenomenon that there are some pairs of BCs are similar in formula structure, which occurs frequently in the experimental cases. In other words, once a BC is found, a new BC may be discovered quickly by slightly changing the former. It inspires us to develop a local search algorithm named LOGION to find BCs, in which the structural similarity is captured by the neighborhood relation of formulae. Based on structural similarity, LOGION can find a lot of BCs in a short time. Moreover, due to the large number of BCs identified, it potentially selects more general BCs from them. By taking experiments on a set of cases, we show that LOGION effectively exploits the structural similarity of BCs. We also compare our algorithm against the two state-of-the-art approaches. The experimental results show that LOGION produces one order of magnitude more BCs than the state-of-the-art approaches and confirm that LOGION finds out more general BCs thanks to a large number of BCs. △ Less

Submitted 22 February, 2021; originally announced February 2021.

arXiv:2011.11528 [pdf, other]

Elastic Interaction of Particles for Robotic Tactile Simulation

Authors: Yikai Wang, Wenbing Huang, Bin Fang, Fuchun Sun

Abstract: Tactile sensing plays an important role in robotic perception and manipulation. To overcome the real-world limitations of data collection, simulating tactile response in virtual environment comes as a desire direction of robotic research. Most existing works model the tactile sensor as a rigid multi-body, which is incapable of reflecting the elastic property of the tactile sensor as well as charac… ▽ More Tactile sensing plays an important role in robotic perception and manipulation. To overcome the real-world limitations of data collection, simulating tactile response in virtual environment comes as a desire direction of robotic research. Most existing works model the tactile sensor as a rigid multi-body, which is incapable of reflecting the elastic property of the tactile sensor as well as characterizing the fine-grained physical interaction between two objects. In this paper, we propose Elastic Interaction of Particles (EIP), a novel framework for tactile emulation. At its core, EIP models the tactile sensor as a group of coordinated particles, and the elastic theory is applied to regulate the deformation of particles during the contact process. The implementation of EIP is conducted from scratch, without resorting to any existing physics engine. Experiments to verify the effectiveness of our method have been carried out on two applications: robotic perception with tactile data and 3D geometric reconstruction by tactile-visual fusion. It is possible to open up a new vein for robotic tactile simulation, and contribute to various downstream robotic tasks. △ Less

Submitted 23 November, 2020; originally announced November 2020.

arXiv:2011.04868 [pdf, other]

Neural Network Compression Via Sparse Optimization

Authors: Tianyi Chen, Bo Ji, Yixin Shi, Tianyu Ding, Biyi Fang, Sheng Yi, Xiao Tu

Abstract: The compression of deep neural networks (DNNs) to reduce inference cost becomes increasingly important to meet realistic deployment requirements of various applications. There have been a significant amount of work regarding network compression, while most of them are heuristic rule-based or typically not friendly to be incorporated into varying scenarios. On the other hand, sparse optimization yi… ▽ More The compression of deep neural networks (DNNs) to reduce inference cost becomes increasingly important to meet realistic deployment requirements of various applications. There have been a significant amount of work regarding network compression, while most of them are heuristic rule-based or typically not friendly to be incorporated into varying scenarios. On the other hand, sparse optimization yielding sparse solutions naturally fits the compression requirement, but due to the limited study of sparse optimization in stochastic learning, its extension and application onto model compression is rarely well explored. In this work, we propose a model compression framework based on the recent progress on sparse stochastic optimization. Compared to existing model compression techniques, our method is effective and requires fewer extra engineering efforts to incorporate with varying applications, and has been numerically demonstrated on benchmark compression tasks. Particularly, we achieve up to 7.2 and 2.9 times FLOPs reduction with the same level of evaluation accuracy on VGG16 for CIFAR10 and ResNet50 for ImageNet compared to the baseline heavy models, respectively. △ Less

Submitted 11 November, 2020; v1 submitted 9 November, 2020; originally announced November 2020.

arXiv:2010.08861 [pdf, other]

Deep Learning in the Era of Edge Computing: Challenges and Opportunities

Authors: Mi Zhang, Faen Zhang, Nicholas D. Lane, Yuanchao Shu, Xiao Zeng, Biyi Fang, Shen Yan, Hui Xu

Abstract: The era of edge computing has arrived. Although the Internet is the backbone of edge computing, its true value lies at the intersection of gathering data from sensors and extracting meaningful information from the sensor data. We envision that in the near future, majority of edge devices will be equipped with machine intelligence powered by deep learning. However, deep learning-based approaches re… ▽ More The era of edge computing has arrived. Although the Internet is the backbone of edge computing, its true value lies at the intersection of gathering data from sensors and extracting meaningful information from the sensor data. We envision that in the near future, majority of edge devices will be equipped with machine intelligence powered by deep learning. However, deep learning-based approaches require a large volume of high-quality data to train and are very expensive in terms of computation, memory, and power consumption. In this chapter, we describe eight research challenges and promising opportunities at the intersection of computer systems, networking, and machine learning. Solving those challenges will enable resource-limited edge devices to leverage the amazing capability of deep learning. We hope this chapter could inspire new research that will eventually lead to the realization of the vision of intelligent edge. △ Less

Submitted 17 October, 2020; originally announced October 2020.

Showing 1–50 of 88 results for author: Fang, B