-
Model Predictive Simulation Using Structured Graphical Models and Transformers
Authors:
Xinghua Lou,
Meet Dave,
Shrinu Kushagra,
Miguel Lazaro-Gredilla,
Kevin Murphy
Abstract:
We propose an approach to simulating trajectories of multiple interacting agents (road users) based on transformers and probabilistic graphical models (PGMs), and apply it to the Waymo SimAgents challenge. The transformer baseline is based on the MTR model, which predicts multiple future trajectories conditioned on the past trajectories and static road layout features. We then improve upon these generated trajectories using a PGM, which contains factors which encode prior knowledge, such as a preference for smooth trajectories, and avoidance of collisions with static obstacles and other moving agents. We perform (approximate) MAP inference in this PGM using the Gauss-Newton method. Finally we sample $K=32$ trajectories for each of the $N \sim 100$ agents for the next $T=8 Δ$ time steps, where $Δ=10$ is the sampling rate per second. Following the Model Predictive Control (MPC) paradigm, we only return the first element of our forecasted trajectories at each step, and then we replan, so that the simulation can constantly adapt to its changing environment. We therefore call our approach "Model Predictive Simulation" or MPS. We show that MPS improves upon the MTR baseline, especially in safety critical metrics such as collision rate. Furthermore, our approach is compatible with any underlying forecasting model, and does not require extra training, so we believe it is a valuable contribution to the community.
Submitted 27 June, 2024;
originally announced June 2024.
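The replanning scheme described in the abstract above (forecast a full horizon, commit only the first step, then replan) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `forecast` is a hypothetical random-walk stand-in for the MTR forecaster, the PGM refinement step is omitted, the state is a single 1-D position, and picking a candidate uniformly at random is a placeholder selection rule. Only `DELTA`, `HORIZON`, and `K` come from the abstract.

```python
import random

DELTA = 10            # sampling rate in steps per second (from the abstract)
HORIZON = 8 * DELTA   # T = 8 * Delta = 80 forecast steps
K = 32                # candidate trajectories per agent (from the abstract)


def forecast(state, horizon, k, rng):
    """Hypothetical forecaster: k random-walk rollouts of a 1-D position."""
    rollouts = []
    for _ in range(k):
        pos, traj = state, []
        for _ in range(horizon):
            pos += rng.gauss(0.0, 0.1)
            traj.append(pos)
        rollouts.append(traj)
    return rollouts


def simulate(initial_state, num_steps, seed=0):
    """Receding-horizon loop: plan HORIZON steps, keep the first, replan."""
    rng = random.Random(seed)
    state, executed = initial_state, []
    for _ in range(num_steps):
        candidates = forecast(state, HORIZON, K, rng)
        chosen = rng.choice(candidates)  # placeholder selection rule (assumption)
        state = chosen[0]                # commit only the first forecast step
        executed.append(state)
    return executed


if __name__ == "__main__":
    print(len(simulate(0.0, num_steps=5)))
```

Because the remaining steps of each forecast are discarded, every executed step is based on a forecast made from the most recent state, which is the property the abstract attributes to the MPC paradigm.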
-
Position: Foundation Agents as the Paradigm Shift for Decision Making
Authors:
Xiaoqian Liu,
Xingzhou Lou,
Jianbin Jiao,
Junge Zhang
Abstract:
Decision making demands intricate interplay between perception, memory, and reasoning to discern optimal policies. Conventional approaches to decision making face challenges related to low sample efficiency and poor generalization. In contrast, foundation models in language and vision have showcased rapid adaptation to diverse new tasks. Therefore, we advocate for the construction of foundation agents as a transformative shift in the learning paradigm of agents. This proposal is underpinned by the formulation of foundation agents with their fundamental characteristics and challenges motivated by the success of large language models (LLMs). Moreover, we specify the roadmap of foundation agents from large interactive data collection or generation, to self-supervised pretraining and adaptation, and knowledge and value alignment with LLMs. Lastly, we pinpoint critical research questions derived from the formulation and delineate trends for foundation agents supported by real-world use cases, addressing both technical and theoretical aspects to propel the field towards a more comprehensive and impactful future.
Submitted 29 May, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
EEG-DBNet: A Dual-Branch Network for Temporal-Spectral Decoding in Motor-Imagery Brain-Computer Interfaces
Authors:
Xicheng Lou,
Xinwei Li,
Hongying Meng,
Jun Hu,
Meili Xu,
Yue Zhao,
Jiazhang Yang,
Zhangyong Li
Abstract:
Motor imagery electroencephalogram (EEG)-based brain-computer interfaces (BCIs) offer significant advantages for individuals with restricted limb mobility. However, challenges such as low signal-to-noise ratio and limited spatial resolution impede accurate feature extraction from EEG signals, thereby affecting the classification accuracy of different actions. To address these challenges, this study proposes an end-to-end dual-branch network (EEG-DBNet) that decodes the temporal and spectral sequences of EEG signals in parallel through two distinct network branches. Each branch comprises a local convolutional block and a global convolutional block. The local convolutional block transforms the source signal from the temporal-spatial domain to the temporal-spectral domain. By varying the number of filters and convolution kernel sizes, the local convolutional blocks in different branches adjust the length of their respective dimension sequences. Different types of pooling layers are then employed to emphasize the features of various dimension sequences, setting the stage for subsequent global feature extraction. The global convolution block splits and reconstructs the feature of the signal sequence processed by the local convolution block in the same branch and further extracts features through the dilated causal convolutional neural networks. Finally, the outputs from the two branches are concatenated, and signal classification is completed via a fully connected layer. Our proposed method achieves classification accuracies of 85.84% and 91.60% on the BCI Competition 4-2a and BCI Competition 4-2b datasets, respectively, surpassing existing state-of-the-art models. The source code is available at https://github.com/xicheng105/EEG-DBNet.
Submitted 19 June, 2024; v1 submitted 25 May, 2024;
originally announced May 2024.
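The abstract above mentions dilated causal convolutions in the global convolutional block. As a generic illustration of that operation only (not the EEG-DBNet code, whose layer sizes and framework are not given here), a single-channel version can be written in plain Python. The function name and the zero-padding convention are assumptions made for this sketch.

```python
def dilated_causal_conv1d(x, weights, dilation=1):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zero, so the output has the same
    length as the input and y[t] never depends on x[t + 1], x[t + 2], ...
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # causal: skip taps that would fall before the start
                acc += w * x[idx]
        y.append(acc)
    return y


if __name__ == "__main__":
    print(dilated_causal_conv1d([1, 2, 3, 4], [1, 1], dilation=2))
```

With a kernel of length `len(weights)`, each output sees `1 + (len(weights) - 1) * dilation` input samples, so stacking layers with growing dilation widens the receptive field without adding weights.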
-
Unleashing the Power of Unlabeled Data: A Self-supervised Learning Framework for Cyber Attack Detection in Smart Grids
Authors:
Hanyu Zeng,
Pengfei Zhou,
Xin Lou,
Zhen Wei Ng,
David K. Y. Yau,
Marianne Winslett
Abstract:
Modern power grids are undergoing significant changes driven by information and communication technologies (ICTs), and evolving into smart grids with higher efficiency and lower operation cost. Using ICTs, however, comes with an inevitable side effect that makes the power system more vulnerable to cyber attacks. In this paper, we propose a self-supervised learning-based framework to detect and identify various types of cyber attacks. Different from existing approaches, the proposed framework does not rely on large amounts of well-curated labeled data but makes use of the massive unlabeled data in the wild which are easily accessible. Specifically, the proposed framework adopts the BERT model from the natural language processing domain and learns generalizable and effective representations from the unlabeled sensing data, which capture the distinctive patterns of different attacks. Using the learned representations, together with a very small amount of labeled data, we can train a task-specific classifier to detect various types of cyber attacks. Meanwhile, real-world training datasets are usually imbalanced, i.e., there are only a limited number of data samples containing attacks. In order to cope with such data imbalance, we propose a new loss function, separate mean error (SME), which pays equal attention to the large and small categories to better train the model. Experiment results in a 5-area power grid system with 37 buses demonstrate the superior performance of our framework over existing approaches, especially when a very limited portion of labeled data are available, e.g., as low as 0.002\%. We believe such a framework can be easily adopted to detect a variety of cyber attacks in other power grid scenarios.
Submitted 22 May, 2024;
originally announced May 2024.
-
SPO: Multi-Dimensional Preference Sequential Alignment With Implicit Reward Modeling
Authors:
Xingzhou Lou,
Junge Zhang,
Jian Xie,
Lifeng Liu,
Dong Yan,
Kaiqi Huang
Abstract:
Human preference alignment is critical in building powerful and reliable large language models (LLMs). However, current methods either ignore the multi-dimensionality of human preferences (e.g. helpfulness and harmlessness) or struggle with the complexity of managing multiple reward models. To address these issues, we propose Sequential Preference Optimization (SPO), a method that sequentially fine-tunes LLMs to align with multiple dimensions of human preferences. SPO avoids explicit reward modeling, directly optimizing the models to align with nuanced human preferences. We theoretically derive closed-form optimal SPO policy and loss function. Gradient analysis is conducted to show how SPO manages to fine-tune the LLMs while maintaining alignment on previously optimized dimensions. Empirical results on LLMs of different size and multiple evaluation datasets demonstrate that SPO successfully aligns LLMs across multiple dimensions of human preferences and significantly outperforms the baselines.
Submitted 21 May, 2024;
originally announced May 2024.
-
Knowledge-aware Dual-side Attribute-enhanced Recommendation
Authors:
Taotian Pang,
Xingyu Lou,
Fei Zhao,
Zhen Wu,
Kuiyao Dong,
Qiuying Peng,
Yue Qi,
Xinyu Dai
Abstract:
Knowledge-aware recommendation methods (KGR) based on graph neural networks (GNNs) and contrastive learning (CL) have achieved promising performance. However, they fall short in modeling fine-grained user preferences and further fail to leverage the preference-attribute connection to make predictions, leading to sub-optimal performance. To address the issue, we propose a method named Knowledge-aware Dual-side Attribute-enhanced Recommendation (KDAR). Specifically, we build user preference representations and attribute fusion representations upon the attribute information in knowledge graphs, which are utilized to enhance collaborative filtering (CF) based user and item representations, respectively. To discriminate the contribution of each attribute in these two types of attribute-based representations, a multi-level collaborative alignment contrasting mechanism is proposed to align the importance of attributes with CF signals. Experimental results on four benchmark datasets demonstrate the superiority of KDAR over several state-of-the-art baselines. Further analyses verify the effectiveness of our method. The code of KDAR is released at: https://github.com/TJTP/KDAR.
Submitted 24 March, 2024;
originally announced March 2024.
-
Power Optimization for Integrated Active and Passive Sensing in DFRC Systems
Authors:
Xingliang Lou,
Wenchao Xia,
Kai-Kit Wong,
Haitao Zhao,
Tony Q. S. Quek,
Hongbo Zhu
Abstract:
Most existing works on dual-function radar-communication (DFRC) systems mainly focus on active sensing, but ignore passive sensing. To leverage multi-static sensing capability, we explore integrated active and passive sensing (IAPS) in DFRC systems to improve sensing performance. The multi-antenna base station (BS) is responsible for communication and active sensing by transmitting signals to user equipments while detecting a target according to echo signals. In contrast, passive sensing is performed at the receive access points (RAPs). We consider both the cases where the capacity of the backhaul links between the RAPs and BS is unlimited or limited and adopt different fusion strategies. Specifically, when the backhaul capacity is unlimited, the BS and RAPs transfer the sensing signals they have received to the central controller (CC) for signal fusion. The CC processes the signals and leverages the generalized likelihood ratio test detector to determine the presence of a target. However, when the backhaul capacity is limited, each RAP, as well as the BS, makes decisions independently and sends its binary inference results to the CC for result fusion via voting aggregation. Then, aiming to maximize the target detection probability under communication quality of service constraints, two power optimization algorithms are proposed. Finally, numerical simulations demonstrate that the sensing performance in the case of unlimited backhaul capacity is much better than that in the case of limited backhaul capacity. Moreover, they show that the proposed IAPS scheme outperforms only-passive and only-active sensing schemes, especially in the unlimited-capacity case.
Submitted 17 February, 2024;
originally announced February 2024.
-
Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models
Authors:
Xingzhou Lou,
Junge Zhang,
Ziyan Wang,
Kaiqi Huang,
Yali Du
Abstract:
Safe reinforcement learning (RL) agents accomplish given tasks while adhering to specific constraints. Employing constraints expressed via easily-understandable human language offers considerable potential for real-world applications due to its accessibility and non-reliance on domain expertise. Previous safe RL methods with natural language constraints typically adopt a recurrent neural network, which leads to limited capabilities when dealing with various forms of human language input. Furthermore, these methods often require a ground-truth cost function, necessitating domain expertise for the conversion of language constraints into a well-defined cost function that determines constraint violation. To address these issues, we propose to use pre-trained language models (LMs) to facilitate RL agents' comprehension of natural language constraints and allow them to infer costs for safe policy learning. Through the use of pre-trained LMs and the elimination of the need for a ground-truth cost, our method enhances safe policy learning under a diverse set of human-derived free-form natural language constraints. Experiments on grid-world navigation and robot control show that the proposed method can achieve strong performance while adhering to given constraints. The usage of pre-trained LMs allows our method to comprehend complicated constraints and learn safe policies without the need for ground-truth cost at any stage of training or evaluation. Extensive ablation studies are conducted to demonstrate the efficacy of each part of our method.
Submitted 15 May, 2024; v1 submitted 15 January, 2024;
originally announced January 2024.
-
TransportationGames: Benchmarking Transportation Knowledge of (Multimodal) Large Language Models
Authors:
Xue Zhang,
Xiangyu Shi,
Xinyue Lou,
Rui Qi,
Yufeng Chen,
**an Xu,
Wenjuan Han
Abstract:
Large language models (LLMs) and multimodal large language models (MLLMs) have shown excellent general capabilities, even exhibiting adaptability in many professional domains such as law, economics, transportation, and medicine. Currently, many domain-specific benchmarks have been proposed to verify the performance of (M)LLMs in specific fields. Among various domains, transportation plays a crucial role in modern society as it impacts the economy, the environment, and the quality of life for billions of people. However, it is unclear how much traffic knowledge (M)LLMs possess and whether they can reliably perform transportation-related tasks. To address this gap, we propose TransportationGames, a carefully designed and thorough evaluation benchmark for assessing (M)LLMs in the transportation domain. By comprehensively considering the applications in real-world scenarios and referring to the first three levels in Bloom's Taxonomy, we test the performance of various (M)LLMs in memorizing, understanding, and applying transportation knowledge by the selected tasks. The experimental results show that although some models perform well in some tasks, there is still much room for improvement overall. We hope the release of TransportationGames can serve as a foundation for future research, thereby accelerating the implementation and application of (M)LLMs in the transportation domain.
Submitted 9 January, 2024;
originally announced January 2024.
-
TAPE: Leveraging Agent Topology for Cooperative Multi-Agent Policy Gradient
Authors:
Xingzhou Lou,
Junge Zhang,
Timothy J. Norman,
Kaiqi Huang,
Yali Du
Abstract:
Multi-Agent Policy Gradient (MAPG) has made significant progress in recent years. However, centralized critics in state-of-the-art MAPG methods still face the centralized-decentralized mismatch (CDM) issue, which means sub-optimal actions by some agents will affect other agents' policy learning. While using individual critics for policy updates can avoid this issue, they severely limit cooperation among agents. To address this issue, we propose an agent topology framework, which decides whether other agents should be considered in policy gradient and achieves a compromise between facilitating cooperation and alleviating the CDM issue. The agent topology allows agents to use coalition utility as the learning objective instead of global utility by centralized critics or local utility by individual critics. To constitute the agent topology, various models are studied. We propose Topology-based multi-Agent Policy gradiEnt (TAPE) for both stochastic and deterministic MAPG methods. We prove the policy improvement theorem for stochastic TAPE and give a theoretical explanation for the improved cooperation among agents. Experiment results on several benchmarks show that the agent topology facilitates agent cooperation and alleviates the CDM issue, improving the performance of TAPE. Finally, multiple ablation studies and a heuristic graph search algorithm are devised to show the efficacy of the agent topology.
Submitted 15 January, 2024; v1 submitted 25 December, 2023;
originally announced December 2023.
-
Adversarial Object Rearrangement in Constrained Environments with Heterogeneous Graph Neural Networks
Authors:
Xibai Lou,
Houjian Yu,
Ross Worobel,
Yang Yang,
Changhyun Choi
Abstract:
Adversarial object rearrangement in the real world (e.g., previously unseen or oversized items in kitchens and stores) could benefit from understanding task scenes, which inherently entail heterogeneous components such as current objects, goal objects, and environmental constraints. The semantic relationships among these components are distinct from each other and crucial for multi-skilled robots to perform efficiently in everyday scenarios. We propose a hierarchical robotic manipulation system that learns the underlying relationships and maximizes the collaborative power of its diverse skills (e.g., pick-place, push) for rearranging adversarial objects in constrained environments. The high-level coordinator employs a heterogeneous graph neural network (HetGNN), which reasons about the current objects, goal objects, and environmental constraints; the low-level 3D Convolutional Neural Network-based actors execute the action primitives. Our approach is trained entirely in simulation, and achieved an average success rate of 87.88% and a planning cost of 12.82 in real-world experiments, surpassing all baseline methods. Supplementary material is available at https://sites.google.com/umn.edu/versatile-rearrangement.
Submitted 26 September, 2023;
originally announced September 2023.
-
Neural Network-Based Histologic Remission Prediction In Ulcerative Colitis
Authors:
Yemin Li,
Zhongcheng Liu,
Xiaoying Lou,
Mirigual Kurban,
Miao Li,
Jie Yang,
Kaiwei Che,
Jiankun Wang,
Max Q.-H. Meng,
Yan Huang,
Qin Guo,
Pin** Hu
Abstract:
BACKGROUND & AIMS: Histological remission (HR) is advocated and considered as a new therapeutic target in ulcerative colitis (UC). Diagnosis of histologic remission currently relies on biopsy; during this process, patients are at risk for bleeding, infection, and post-biopsy fibrosis. In addition, histologic response scoring is complex and time-consuming, and there is heterogeneity among pathologists. Endocytoscopy (EC) is a novel ultra-high magnification endoscopic technique that can provide excellent in vivo assessment of glands. Based on the EC technique, we propose a neural network model that can assess histological disease activity in UC using EC images to address the above issues. The experiment results demonstrate that the proposed method can assist patients in precise treatment and prognostic assessment.
METHODS: We construct a neural network model for UC evaluation. A total of 5105 images of 154 intestinal segments from 87 patients undergoing EC treatment at a center in China between March 2022 and March 2023 are scored according to the Geboes score. Subsequently, 103 intestinal segments are used as the training set, 16 intestinal segments are used as the validation set for neural network training, and the remaining 35 intestinal segments are used as the test set to measure the model performance together with the validation set.
RESULTS: By treating HR as a negative category and histologic activity as a positive category, the proposed neural network model can achieve an accuracy of 0.9, a specificity of 0.95, a sensitivity of 0.75, and an area under the curve (AUC) of 0.81.
CONCLUSION: We develop a specific neural network model that can distinguish histologic remission/activity in EC images of UC, which helps to accelerate clinical histological diagnosis.
keywords: ulcerative colitis; Endocytoscopy; Geboes score; neural network.
Submitted 28 August, 2023;
originally announced August 2023.
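The RESULTS paragraph above reports accuracy, specificity, and sensitivity with histologic activity as the positive class and HR as the negative class. How those three quantities follow from a confusion matrix can be sketched as below. The function name and the counts in the usage line are hypothetical and chosen only for illustration; they are not data from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, and specificity from confusion counts.

    tp/fn count positive cases (here: histologic activity) predicted
    correctly/incorrectly; tn/fp do the same for negative cases (here: HR).
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(binary_metrics(tp=3, fp=1, tn=19, fn=1))
```

When negatives greatly outnumber positives, accuracy is dominated by specificity, which is one way a model can reach high accuracy with a noticeably lower sensitivity.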
-
IOSG: Image-driven Object Searching and Grasping
Authors:
Houjian Yu,
Xibai Lou,
Yang Yang,
Changhyun Choi
Abstract:
When robots retrieve specific objects from cluttered scenes, such as home and warehouse environments, the target objects are often partially occluded or completely hidden. Robots are thus required to search, identify a target object, and successfully grasp it. Preceding works have relied on pre-trained object recognition or segmentation models to find the target object. However, such methods require laborious manual annotations to train the models and even fail to find novel target objects. In this paper, we propose an Image-driven Object Searching and Grasping (IOSG) approach where a robot is provided with the reference image of a novel target object and tasked to find and retrieve it. We design a Target Similarity Network that generates a probability map to infer the location of the novel target. IOSG learns a hierarchical policy; the high-level policy predicts the subtask type, whereas the low-level policies, explorer and coordinator, generate effective push and grasp actions. The explorer is responsible for searching the target object when it is hidden or occluded by other objects. Once the target object is found, the coordinator conducts target-oriented pushing and grasping to retrieve the target from the clutter. The proposed pipeline is trained with full self-supervision in simulation and applied to a real environment. Our model achieves a 96.0% and 94.5% task success rate on coordination and exploration tasks in simulation respectively, and 85.0% success rate on a real robot for the search-and-grasp task.
Submitted 10 August, 2023;
originally announced August 2023.
-
Mercury: An Automated Remote Side-channel Attack to Nvidia Deep Learning Accelerator
Authors:
Xiaobei Yan,
Xiaoxuan Lou,
Guowen Xu,
Han Qiu,
Shangwei Guo,
Chip Hong Chang,
Tianwei Zhang
Abstract:
DNN accelerators have been widely deployed in many scenarios to speed up the inference process and reduce the energy consumption. One big concern about the usage of the accelerators is the confidentiality of the deployed models: model inference execution on the accelerators could leak side-channel information, which enables an adversary to precisely recover the model details. Such model extraction attacks can not only compromise the intellectual property of DNN models, but also facilitate some adversarial attacks.
Although previous works have demonstrated a number of side-channel techniques to extract models from DNN accelerators, they are not practical for two reasons. (1) They only target simplified accelerator implementations, which have limited practicality in the real world. (2) They require heavy human analysis and domain knowledge. To overcome these limitations, this paper presents Mercury, the first automated remote side-channel attack against the off-the-shelf Nvidia DNN accelerator. The key insight of Mercury is to model the side-channel extraction process as a sequence-to-sequence problem. The adversary can leverage a time-to-digital converter (TDC) to remotely collect the power trace of the target model's inference. Then he uses a learning model to automatically recover the architecture details of the victim model from the power trace without any prior knowledge. The adversary can further use the attention mechanism to localize the leakage points that contribute most to the attack. Evaluation results indicate that Mercury can keep the error rate of model extraction below 1%.
Submitted 2 August, 2023;
originally announced August 2023.
-
Identifying Symptoms of Delirium from Clinical Narratives Using Natural Language Processing
Authors:
Aokun Chen,
Daniel Paredes,
Zehao Yu,
Xiwei Lou,
Roberta Brunson,
Jamie N. Thomas,
Kimberly A. Martinez,
Robert J. Lucero,
Tanja Magoc,
Laurence M. Solberg,
Urszula A. Snigurska,
Sarah E. Ser,
Mattia Prosperi,
Jiang Bian,
Ragnhildur I. Bjarnadottir,
Yonghui Wu
Abstract:
Delirium is an acute decline or fluctuation in attention, awareness, or other cognitive function that can lead to serious adverse outcomes. Despite the severe outcomes, delirium is frequently unrecognized and uncoded in patients' electronic health records (EHRs) due to its transient and diverse nature. Natural language processing (NLP), a key technology that extracts medical concepts from clinical narratives, has shown great potential in studies of delirium outcomes and symptoms. To assist in the diagnosis and phenotyping of delirium, we formed an expert panel to categorize diverse delirium symptoms, composed annotation guidelines, created a delirium corpus with diverse delirium symptoms, and developed NLP methods to extract delirium symptoms from clinical notes. We compared 5 state-of-the-art transformer models including 2 models (BERT and RoBERTa) from the general domain and 3 models (BERT_MIMIC, RoBERTa_MIMIC, and GatorTron) from the clinical domain. GatorTron achieved the best strict and lenient F1 scores of 0.8055 and 0.8759, respectively. We conducted an error analysis to identify challenges in annotating delirium symptoms and developing NLP systems. To the best of our knowledge, this is the first large language model-based delirium symptom extraction system. Our study lays the foundation for the future development of computable phenotypes and diagnosis methods for delirium.
Submitted 31 March, 2023;
originally announced April 2023.
-
Structure Embedded Nucleus Classification for Histopathology Images
Authors:
Wei Lou,
Xiang Wan,
Guanbin Li,
Xiaoying Lou,
Chenghang Li,
Feng Gao,
Haofeng Li
Abstract:
Nuclei classification provides valuable information for histopathology image analysis. However, the large variations in the appearance of different nuclei types cause difficulties in identifying nuclei. Most neural network based methods are affected by the local receptive field of convolutions, and pay less attention to the spatial distribution of nuclei or the irregular contour shape of a nucleus. In this paper, we first propose a novel polygon-structure feature learning mechanism that transforms a nucleus contour into a sequence of points sampled in order, and employ a recurrent neural network that aggregates the sequential change in distance between key points to obtain learnable shape features. Next, we convert a histopathology image into a graph structure with nuclei as nodes, and build a graph neural network to embed the spatial distribution of nuclei into their representations. To capture the correlations between the categories of nuclei and their surrounding tissue patterns, we further introduce edge features that are defined as the background textures between adjacent nuclei. Lastly, we integrate both polygon and graph structure learning mechanisms into a whole framework that can extract intra and inter-nucleus structural characteristics for nuclei classification. Experimental results show that the proposed framework achieves significant improvements compared to the state-of-the-art methods.
Submitted 22 February, 2023;
originally announced February 2023.
-
PushWorld: A benchmark for manipulation planning with tools and movable obstacles
Authors:
Ken Kansky,
Skanda Vaidyanath,
Scott Swingle,
Xinghua Lou,
Miguel Lazaro-Gredilla,
Dileep George
Abstract:
While recent advances in artificial intelligence have achieved human-level performance in environments like Starcraft and Go, many physical reasoning tasks remain challenging for modern algorithms. To date, few algorithms have been evaluated on physical tasks that involve manipulating objects when movable obstacles are present and when tools must be used to perform the manipulation. To promote research on such tasks, we introduce PushWorld, an environment with simplistic physics that requires manipulation planning with both movable obstacles and tools. We provide a benchmark of more than 200 PushWorld puzzles in PDDL and in an OpenAI Gym environment. We evaluate state-of-the-art classical planning and reinforcement learning algorithms on this benchmark, and we find that these baseline results are below human-level performance. We then provide a new classical planning heuristic that solves the most puzzles among the baselines, and although it is 40 times faster than the best baseline planner, it remains below human-level performance.
Submitted 1 February, 2023; v1 submitted 24 January, 2023;
originally announced January 2023.
-
PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination
Authors:
Xingzhou Lou,
Jiaxian Guo,
Junge Zhang,
Jun Wang,
Kaiqi Huang,
Yali Du
Abstract:
Zero-shot human-AI coordination holds the promise of collaborating with humans without human data. Prevailing methods try to train the ego agent with a population of partners via self-play. However, these methods suffer from two problems: 1) The diversity of a population with finite partners is limited, thereby limiting the capacity of the trained ego agent to collaborate with a novel human; 2) Current methods only provide a common best response for every partner in the population, which may result in poor zero-shot coordination performance with a novel partner or humans. To address these issues, we first propose the policy ensemble method to increase the diversity of partners in the population, and then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives so that it can take different actions accordingly. In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners. We conduct experiments on the Overcooked environment, and evaluate the zero-shot human-AI coordination performance of our method with both behavior-cloned human proxies and real humans. The results demonstrate that our method significantly increases the diversity of partners and enables ego agents to learn more diverse behaviors than baselines, thus achieving state-of-the-art performance in all scenarios. We also open-source a human-AI coordination study framework on the Overcooked for the convenience of future studies.
Submitted 22 May, 2023; v1 submitted 16 January, 2023;
originally announced January 2023.
-
Learned Smartphone ISP on Mobile GPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report
Authors:
Andrey Ignatov,
Radu Timofte,
Shuai Liu,
Chaoyu Feng,
Furui Bai,
Xiaotao Wang,
Lei Lei,
Ziyao Yi,
Yan Xiang,
Zibin Liu,
Shaoqing Li,
Keming Shi,
Dehui Kong,
Ke Xu,
Minsu Kwon,
Yaqi Wu,
Jiesi Zheng,
Zhihao Fan,
Xun Wu,
Feng Zhang,
Albert No,
Minhyeok Cho,
Zewen Chen,
Xiaze Zhang,
Ran Li
, et al. (13 additional authors not shown)
Abstract:
The role of mobile cameras increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline replacing the standard mobile ISPs that can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format FujiFilm GFX100 camera. The runtime of the resulting models was evaluated on the Snapdragon's 8 Gen 1 GPU that provides excellent acceleration results for the majority of common deep learning ops. The proposed solutions are compatible with all recent mobile GPUs, being able to process Full HD photos in less than 20-50 milliseconds while achieving high fidelity results. A detailed description of all models developed in this challenge is provided in this paper.
Submitted 7 November, 2022;
originally announced November 2022.
-
ShiftNAS: Towards Automatic Generation of Advanced Multiplication-Less Neural Networks
Authors:
Xiaoxuan Lou,
Guowen Xu,
Kangjie Chen,
Guanlin Li,
Jiwei Li,
Tianwei Zhang
Abstract:
Multiplication-less neural networks significantly reduce the time and energy cost on the hardware platform, as the compute-intensive multiplications are replaced with lightweight bit-shift operations. However, existing bit-shift networks are all directly transferred from state-of-the-art convolutional neural networks (CNNs), which leads to a non-negligible accuracy drop or even failure of model convergence. To combat this, we propose ShiftNAS, the first framework tailoring Neural Architecture Search (NAS) to substantially reduce the accuracy gap between bit-shift neural networks and their real-valued counterparts. Specifically, we pioneer dragging NAS into a shift-oriented search space and endow it with a robust topology-related search strategy and custom regularization and stabilization. As a result, ShiftNAS overcomes the incompatibility of traditional NAS methods with bit-shift neural networks and achieves better accuracy and convergence. Extensive experiments demonstrate that ShiftNAS sets a new state of the art for bit-shift neural networks, where the accuracy increases by 1.69-8.07% on CIFAR10, 5.71-18.09% on CIFAR100 and 4.36-67.07% on ImageNet, especially when many conventional CNNs fail to converge on ImageNet with bit-shift weights.
Submitted 7 April, 2022;
originally announced April 2022.
-
Interactive Robotic Grasping with Attribute-Guided Disambiguation
Authors:
Yang Yang,
Xibai Lou,
Changhyun Choi
Abstract:
Interactive robotic grasping using natural language is one of the most fundamental tasks in human-robot interaction. However, language can be a source of ambiguity, particularly when there are ambiguous visual or linguistic contents. This paper investigates the use of object attributes in disambiguation and develops an interactive grasping system capable of effectively resolving ambiguities via dialogues. Our approach first predicts target scores and attribute scores through vision-and-language grounding. To handle ambiguous objects and commands, we propose an attribute-guided formulation of the partially observable Markov decision process (Attr-POMDP) for disambiguation. The Attr-POMDP utilizes target and attribute scores as the observation model to calculate the expected return of an attribute-based (e.g., "what is the color of the target, red or green?") or a pointing-based (e.g., "do you mean this one?") question. Our disambiguation module runs in real time on a real robot, and the interactive grasping system achieves a 91.43% selection accuracy in the real-robot experiments, outperforming several baselines by large margins.
Submitted 15 March, 2022;
originally announced March 2022.
-
ICARUS: A Specialized Architecture for Neural Radiance Fields Rendering
Authors:
Chaolin Rao,
Huangjie Yu,
Haochuan Wan,
Jindong Zhou,
Yueyang Zheng,
Yu Ma,
Anpei Chen,
Minye Wu,
Binzhe Yuan,
Pingqiang Zhou,
Xin Lou,
Jingyi Yu
Abstract:
The practical deployment of Neural Radiance Fields (NeRF) in rendering applications faces several challenges, with the most critical one being low rendering speed on even high-end graphic processing units (GPUs). In this paper, we present ICARUS, a specialized accelerator architecture tailored for NeRF rendering. Unlike GPUs using general purpose computing and memory architectures for NeRF, ICARUS executes the complete NeRF pipeline using dedicated plenoptic cores (PLCore) consisting of a positional encoding unit (PEU), a multi-layer perceptron (MLP) engine, and a volume rendering unit (VRU). A PLCore takes in positions and directions and renders the corresponding pixel colors without any intermediate data going off-chip for temporary storage and exchange, which can be time and power consuming. To implement the most expensive component of NeRF, i.e., the MLP, we transform the fully connected operations to approximated reconfigurable multiple constant multiplications (MCMs), where common subexpressions are shared across different multiplications to improve the computation efficiency. We build a prototype ICARUS using Synopsys HAPS-80 S104, a field programmable gate array (FPGA)-based prototyping system for large-scale integrated circuits and systems design. We evaluate the power-performance-area (PPA) of a PLCore using 40nm LP CMOS technology. Working at 400 MHz, a single PLCore occupies 16.5 mm$^2$ and consumes 282.8 mW, translating to 0.105 μJ/sample. The results are compared with those of GPU and tensor processing unit (TPU) implementations.
Submitted 26 September, 2022; v1 submitted 28 February, 2022;
originally announced March 2022.
-
Learning Object Relations with Graph Neural Networks for Target-Driven Grasping in Dense Clutter
Authors:
Xibai Lou,
Yang Yang,
Changhyun Choi
Abstract:
Robots in the real world frequently come across identical objects in dense clutter. When evaluating grasp poses in these scenarios, a target-driven grasping system requires knowledge of spatial relations between scene objects (e.g., proximity, adjacency, and occlusions). To efficiently complete this task, we propose a target-driven grasping system that simultaneously considers object relations and predicts 6-DoF grasp poses. A densely cluttered scene is first formulated as a grasp graph with nodes representing object geometries in the grasp coordinate frame and edges indicating spatial relations between the objects. We design a Grasp Graph Neural Network (G2N2) that evaluates the grasp graph and finds the most feasible 6-DoF grasp pose for a target object. Additionally, we develop a shape completion-assisted grasp pose sampling method that improves sample quality and consequently grasping efficiency. We compare our method against several baselines in both simulated and real settings. In real-world experiments with novel objects, our approach achieves a 77.78% grasping accuracy in densely cluttered scenarios, surpassing the best-performing baseline by more than 15%. Supplementary material is available at https://sites.google.com/umn.edu/graph-grasping.
Submitted 2 March, 2022;
originally announced March 2022.
-
Raw Bayer Pattern Image Synthesis for Computer Vision-oriented Image Signal Processing Pipeline Design
Authors:
Wei Zhou,
Xiangyu Zhang,
Hongyu Wang,
Shenghua Gao,
Xin Lou
Abstract:
In this paper, we propose a method to add constraints that are un-formulatable in generative adversarial networks (GAN)-based arbitrary size RAW Bayer image generation. It is shown theoretically that using the transformed data in GAN training improves the learning of the original data distribution, owing to the invariance of the Jensen-Shannon (JS) divergence between two distributions under invertible and differentiable transformation. Benefiting from the proposed method, RAW Bayer pattern images can be generated by configuring the transformation as demosaicing. It is shown that by adding another transformation, the proposed method is able to synthesize high-quality RAW Bayer images with arbitrary size. Experimental results show that images generated by the proposed method outperform the existing methods in the Fréchet inception distance (FID) score, peak signal to noise ratio (PSNR), and mean structural similarity (MSSIM), and the training process is more stable. To the best knowledge of the authors, there is no open-source, large-scale image dataset in the RAW Bayer domain, which is crucial for research works aiming to explore the image signal processing (ISP) pipeline design for computer vision tasks. By converting the existing commonly used color image datasets to their corresponding RAW Bayer versions, the proposed method can be a promising solution to the RAW image dataset problem. We also show in the experiments that, by training object detection frameworks using the synthesized RAW Bayer images, they can be used in an end-to-end manner (from RAW images to vision tasks) with negligible performance degradation.
Submitted 15 December, 2021; v1 submitted 25 October, 2021;
originally announced October 2021.
-
Learning Linear Polytree Structural Equation Models
Authors:
Xingmei Lou,
Yu Hu,
Xiaodong Li
Abstract:
We are interested in the problem of learning the directed acyclic graph (DAG) when data are generated from a linear structural equation model (SEM) and the causal structure can be characterized by a polytree. Under the Gaussian polytree models, we study sufficient conditions on the sample sizes for the well-known Chow-Liu algorithm to exactly recover both the skeleton and the equivalence class of the polytree, which is uniquely represented by a CPDAG. On the other hand, necessary conditions on the required sample sizes for both skeleton and CPDAG recovery are also derived in terms of information-theoretic lower bounds, which match the respective sufficient conditions and thereby give a sharp characterization of the difficulty of these tasks. We also consider the problem of inverse correlation matrix estimation under the linear polytree models, and establish the estimation error bound in terms of the dimension and the total number of v-structures. We also consider an extension of group linear polytree models, in which each node represents a group of variables. Our theoretical findings are illustrated by comprehensive numerical simulations, and experiments on benchmark data also demonstrate the robustness of polytree learning when the true graphical structures can only be approximated by polytrees.
Submitted 14 May, 2024; v1 submitted 22 July, 2021;
originally announced July 2021.
-
Attribute-Based Robotic Grasping with One-Grasp Adaptation
Authors:
Yang Yang,
Yuanhao Liu,
Hengyue Liang,
Xibai Lou,
Changhyun Choi
Abstract:
Robotic grasping is one of the most fundamental robotic manipulation tasks and has been actively studied. However, how to quickly teach a robot to grasp a novel target object in clutter remains challenging. This paper attempts to tackle the challenge by leveraging object attributes that facilitate recognition, grasping, and quick adaptation. In this work, we introduce an end-to-end learning method of attribute-based robotic grasping with one-grasp adaptation capability. Our approach fuses the embeddings of a workspace image and a query text using a gated-attention mechanism and learns to predict instance grasping affordances. Besides, we utilize object persistence before and after grasping to learn a joint metric space of visual and textual attributes. Our model is self-supervised in a simulation that only uses basic objects of various colors and shapes but generalizes to novel objects and real-world scenes. We further demonstrate that our model is capable of adapting to novel objects with only one grasp data and improving instance grasping performance significantly. Experimental results in both simulation and the real world demonstrate that our approach achieves over 80% instance grasping success rate on unknown objects, which outperforms several baselines by large margins.
Submitted 5 April, 2021;
originally announced April 2021.
-
Collision-Aware Target-Driven Object Grasping in Constrained Environments
Authors:
Xibai Lou,
Yang Yang,
Changhyun Choi
Abstract:
Grasping a novel target object in constrained environments (e.g., walls, bins, and shelves) requires intensive reasoning about grasp pose reachability to avoid collisions with the surrounding structures. Typical 6-DoF robotic grasping systems rely on the prior knowledge about the environment and intensive planning computation, which is ungeneralizable and inefficient. In contrast, we propose a novel Collision-Aware Reachability Predictor (CARP) for 6-DoF grasping systems. The CARP learns to estimate the collision-free probabilities for grasp poses and significantly improves grasping in challenging environments. The deep neural networks in our approach are trained fully by self-supervision in simulation. The experiments in both simulation and the real world show that our approach achieves more than 75% grasping rate on novel objects in various surrounding structures. The ablation study demonstrates the effectiveness of the CARP, which improves the 6-DoF grasping rate by 95.7%.
Submitted 1 April, 2021;
originally announced April 2021.
-
A Survey of Microarchitectural Side-channel Vulnerabilities, Attacks and Defenses in Cryptography
Authors:
Xiaoxuan Lou,
Tianwei Zhang,
Jun Jiang,
Yinqian Zhang
Abstract:
Side-channel attacks have become a severe threat to the confidentiality of computer applications and systems. One popular type of such attacks is the microarchitectural attack, where the adversary exploits the hardware features to break the protection enforced by the operating system and steal the secrets from the program. In this paper, we systematize microarchitectural side channels with a focus on attacks and defenses in cryptographic applications. We make three contributions. (1) We survey past research literature to categorize microarchitectural side-channel attacks. Since these are hardware attacks targeting software, we summarize the vulnerable implementations in software, as well as flawed designs in hardware. (2) We identify common strategies to mitigate microarchitectural attacks, from the application, OS and hardware levels. (3) We conduct a large-scale evaluation on popular cryptographic applications in the real world, and analyze the severity, practicality and impact of side-channel vulnerabilities. This survey is expected to inspire side-channel research community to discover new attacks, and more importantly, propose new defense solutions against them.
Submitted 25 March, 2021;
originally announced March 2021.
-
Ownership Verification of DNN Architectures via Hardware Cache Side Channels
Authors:
Xiaoxuan Lou,
Shangwei Guo,
Jiwei Li,
Tianwei Zhang
Abstract:
Deep Neural Networks (DNN) are gaining higher commercial values in computer vision applications, e.g., image classification, video analytics, etc. This calls for urgent demands of the intellectual property (IP) protection of DNN models. In this paper, we present a novel watermarking scheme to achieve the ownership verification of DNN architectures. Existing works all embedded watermarks into the model parameters while treating the architecture as public property. These solutions were proven to be vulnerable by an adversary to detect or remove the watermarks. In contrast, we claim the model architectures as an important IP for model owners, and propose to implant watermarks into the architectures. We design new algorithms based on Neural Architecture Search (NAS) to generate watermarked architectures, which are unique enough to represent the ownership, while maintaining high model usability. Such watermarks can be extracted via side-channel-based model extraction techniques with high fidelity. We conduct comprehensive experiments on watermarked CNN models for image classification tasks and the experimental results show our scheme has negligible impact on the model performance, and exhibits strong robustness against various model transformations and adaptive attacks.
Submitted 28 June, 2022; v1 submitted 6 February, 2021;
originally announced February 2021.
-
On Lightweight Privacy-Preserving Collaborative Learning for Internet of Things by Independent Random Projections
Authors:
Linshan Jiang,
Rui Tan,
Xin Lou,
Guosheng Lin
Abstract:
The Internet of Things (IoT) will be a main data generation infrastructure for achieving better system intelligence. This paper considers the design and implementation of a practical privacy-preserving collaborative learning scheme, in which a curious learning coordinator trains a better machine learning model based on the data samples contributed by a number of IoT objects, while the confidentiality of the raw forms of the training data is protected against the coordinator. Existing distributed machine learning and data encryption approaches incur significant computation and communication overhead, rendering them ill-suited for resource-constrained IoT objects. We study an approach that applies independent random projection at each IoT object to obfuscate data and trains a deep neural network at the coordinator based on the projected data from the IoT objects. This approach introduces light computation overhead to the IoT objects and moves most workload to the coordinator that can have sufficient computing resources. Although the independent projections performed by the IoT objects address the potential collusion between the curious coordinator and some compromised IoT objects, they significantly increase the complexity of the projected data. In this paper, we leverage the superior learning capability of deep learning in capturing sophisticated patterns to maintain good learning performance. The extensive comparative evaluation shows that this approach outperforms other lightweight approaches that apply additive noisification for differential privacy and/or support vector machines for learning in the applications with light to moderate data pattern complexities.
Submitted 11 December, 2020;
originally announced December 2020.
-
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Authors:
Prakhar Ganesh,
Yao Chen,
Xin Lou,
Mohammad Ali Khan,
Yin Yang,
Hassan Sajjad,
Preslav Nakov,
Deming Chen,
Marianne Winslett
Abstract:
Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks. However, these models often have billions of parameters, and, thus, are too resource-hungry and computation-intensive to suit low-capability devices or applications with strict latency requirements. One potential remedy for this is model compression, which has attracted a lot of research attention. Here, we summarize the research in compressing Transformers, focusing on the especially popular BERT model. In particular, we survey the state of the art in compression for BERT, we clarify the current best practices for compressing large-scale Transformer models, and we provide insights into the workings of various methods. Our categorization and analysis also shed light on promising future research directions for achieving lightweight, accurate, and generic NLP models.
Submitted 1 June, 2021; v1 submitted 27 February, 2020;
originally announced February 2020.
-
Learning to Generate 6-DoF Grasp Poses with Reachability Awareness
Authors:
Xibai Lou,
Yang Yang,
Changhyun Choi
Abstract:
Motivated by the stringent requirements of unstructured real-world where a plethora of unknown objects reside in arbitrary locations of the surface, we propose a voxel-based deep 3D Convolutional Neural Network (3D CNN) that generates feasible 6-DoF grasp poses in unrestricted workspace with reachability awareness. Unlike the majority of works that predict if a proposed grasp pose within the restricted workspace will be successful solely based on grasp pose stability, our approach further learns a reachability predictor that evaluates if the grasp pose is reachable or not from robot's own experience. To avoid the laborious real training data collection, we exploit the power of simulation to train our networks on a large-scale synthetic dataset. This work is an early attempt that simultaneously evaluates grasping reachability from learned knowledge while proposing feasible grasp poses with 3D CNN. Experimental results in both simulation and real-world demonstrate that our approach outperforms several other methods and achieves 82.5% grasping success rate on unknown objects.
Submitted 30 September, 2020; v1 submitted 14 October, 2019;
originally announced October 2019.
-
Link Prediction via Graph Attention Network
Authors:
Weiwei Gu,
Fei Gao,
Xiaodan Lou,
Jiang Zhang
Abstract:
Link prediction aims to infer missing links or predict future ones based on currently observed partial networks; it is a fundamental problem in network science with tremendous real-world applications. However, conventional link prediction approaches neither achieve high prediction accuracy nor are capable of revealing the hidden information behind links. To address this problem, we generalize the latest techniques in deep learning on graphs and present a new link prediction model - DeepLinker. Instead of learning node representation with the node label information, DeepLinker uses the links as supervised information. Experiments on five graphs show that DeepLinker can not only achieve the state-of-the-art link prediction accuracy, but also acquire the efficient node representations and node centrality ranking as the byproducts. Although the representations are obtained without any supervised node label information, they still perform well on node ranking and node classification tasks.
Submitted 29 October, 2019; v1 submitted 10 October, 2019;
originally announced October 2019.
-
Learning Visual Affordances with Target-Orientated Deep Q-Network to Grasp Objects by Harnessing Environmental Fixtures
Authors:
Hengyue Liang,
Xibai Lou,
Yang Yang,
Changhyun Choi
Abstract:
This paper introduces a challenging object grasping task and proposes a self-supervised learning approach. The goal of the task is to grasp an object which is not feasible with a single parallel gripper, but only with harnessing environment fixtures (e.g., walls, furniture, heavy objects). This Slide-to-Wall grasping task assumes no prior knowledge except the partial observation of a target object. Hence the robot should learn an effective policy given a scene observation that may include the target object, environmental fixtures, and any other disturbing objects. We formulate the problem as visual affordances learning for which Target-Oriented Deep Q-Network (TO-DQN) is proposed to efficiently learn visual affordance maps (i.e., Q-maps) to guide robot actions. Since the training necessitates robot's exploration and collision with the fixtures, TO-DQN is first trained safely with a simulated robot manipulator and then applied to a real robot. We empirically show that TO-DQN can learn to solve the task in different environment settings in simulation and outperforms a standard and a variant of Deep Q-Network (DQN) in terms of training efficiency and robustness. The testing performance in both simulation and real-robot experiments shows that the policy trained by TO-DQN achieves comparable performance to humans.
Submitted 2 April, 2021; v1 submitted 9 October, 2019;
originally announced October 2019.
-
Quantifying the Vulnerabilities of the Online Public Square to Adversarial Manipulation Tactics
Authors:
Bao Tran Truong,
Xiaodan Lou,
Alessandro Flammini,
Filippo Menczer
Abstract:
Social media, seen by some as the modern public square, is vulnerable to manipulation. By controlling inauthentic accounts impersonating humans, malicious actors can amplify disinformation within target communities. The consequences of such operations are difficult to evaluate due to the challenges posed by collecting data and carrying out ethical experiments that would influence online communities. Here we use a social media model that simulates information diffusion in an empirical network to quantify the impacts of several adversarial manipulation tactics on the quality of content. We find that the presence of influential accounts, a hallmark of social media, exacerbates the vulnerabilities of online communities to manipulation. Among the explored tactics that bad actors can employ, infiltrating a community is the most likely to make low-quality content go viral. Such harm can be further compounded by inauthentic agents flooding the network with low-quality, yet appealing content, but is mitigated when bad actors focus on specific targets, such as influential or vulnerable individuals. These insights suggest countermeasures that platforms could employ to increase the resilience of social media users to manipulation.
Submitted 11 June, 2024; v1 submitted 13 July, 2019;
originally announced July 2019.
-
Qualifying threshold of take off stage for successfully disseminated creative ideas
Authors:
Guoqiang Liang,
Xiaodan Lou,
Haiyan Hou,
Zhigang Hu
Abstract:
The creative process is essentially Darwinian and only a small proportion of creative ideas are selected for further development. However, the threshold that identifies this small fraction of successfully disseminated creative ideas at their early stage has not been thoroughly analyzed through the lens of Rogers innovation diffusion theory. Here, we take highly cited (top 1%) research papers as an example of the most successfully disseminated creative ideas and explore the time it takes and citations it receives at their take off stage, which play a crucial role in the dissemination of creativity. Results show the majority of highly cited papers will reach 10% and 25% of their total citations within two years and four years, respectively. Interestingly, our results also present a minimal number of articles that attract their first citation before publication. As for the discipline, number of references, and Price index, we find a significant difference exists: Clinical, Pre-Clinical & Health and Life Sciences are the first two disciplines to reach the C10% and C25% in a shorter amount of time. Highly cited papers with limited references usually take more time to reach 10% and 25% of their total citations. In addition, highly cited papers will attract citations rapidly when they cite more recent references. These results provide insights into the timespan and citations for a research paper to become highly cited at the take off stage in its diffusion process, as well as the factors that may influence it.
Submitted 10 June, 2019;
originally announced June 2019.
-
Residual Pyramid Learning for Single-Shot Semantic Segmentation
Authors:
Xiaoyu Chen,
Xiaotian Lou,
Lianfa Bai,
Jing Han
Abstract:
Pixel-level semantic segmentation is a challenging task with a huge amount of computation, especially if the size of input is large. In the segmentation model, apart from the feature extraction, the extra decoder structure is often employed to recover spatial information. In this paper, we put forward a method for single-shot segmentation in a feature residual pyramid network (RPNet), which learns the main and residuals of segmentation by decomposing the label at different levels of residual blocks. Specifically speaking, we use the residual features to learn the edges and details, and the identity features to learn the main part of targets. At testing time, the predicted residuals are used to enhance the details of the top-level prediction. Residual learning blocks split the network into several shallow sub-networks which facilitates the training of the RPNet. We then evaluate the proposed method and compare it with recent state-of-the-art methods on CamVid and Cityscapes. The proposed single-shot segmentation based on RPNet achieves impressive results with high efficiency on pixel-level segmentation.
Submitted 22 March, 2019;
originally announced March 2019.
-
On Lightweight Privacy-Preserving Collaborative Learning for IoT Objects
Authors:
Linshan Jiang,
Rui Tan,
Xin Lou,
Guosheng Lin
Abstract:
The Internet of Things (IoT) will be a main data generation infrastructure for achieving better system intelligence. This paper considers the design and implementation of a practical privacy-preserving collaborative learning scheme, in which a curious learning coordinator trains a better machine learning model based on the data samples contributed by a number of IoT objects, while the confidentiality of the raw forms of the training data is protected against the coordinator. Existing distributed machine learning and data encryption approaches incur significant computation and communication overhead, rendering them ill-suited for resource-constrained IoT objects. We study an approach that applies independent Gaussian random projection at each IoT object to obfuscate data and trains a deep neural network at the coordinator based on the projected data from the IoT objects. This approach introduces light computation overhead to the IoT objects and moves most workload to the coordinator that can have sufficient computing resources. Although the independent projections performed by the IoT objects address the potential collusion between the curious coordinator and some compromised IoT objects, they significantly increase the complexity of the projected data. In this paper, we leverage the superior learning capability of deep learning in capturing sophisticated patterns to maintain good learning performance. Extensive comparative evaluation shows that this approach outperforms other lightweight approaches that apply additive noisification for differential privacy and/or support vector machines for learning in the applications with light data pattern complexities.
Submitted 13 February, 2019;
originally announced February 2019.
-
Understanding Rowhammer Attacks through the Lens of a Unified Reference Framework
Authors:
Xiaoxuan Lou,
Fan Zhang,
Zheng Leong Chua,
Zhenkai Liang,
Yueqiang Cheng,
Yajin Zhou
Abstract:
Rowhammer is a hardware-based bug that allows the attacker to modify the data in the memory without accessing it, just repeatedly and frequently accessing (or hammering) physically adjacent memory rows. So that it can break the memory isolation between processes, which is seen as the cornerstone of modern system security, exposing the sensitive data to unauthorized and imperceptible corruption. A number of previous works have leveraged the rowhammer bug to achieve various critical attacks.
In this work, we propose a unified reference framework for analyzing rowhammer attacks, indicating three necessary factors in a practical rowhammer attack: the attack origin, the intended implication and the methodology. Each factor includes multiple primitives; the attacker can select primitives from the three factors to constitute an effective attack. In particular, the methodology further summarizes all existing attack techniques that are used to achieve its three primitives: Location Preparation (LP), Rapid Hammering (RH), and Exploit Verification (EV). Based on the reference framework, we analyze all previous rowhammer attacks and corresponding countermeasures. Our analysis shows how primitives in different factors are combined and used in previous attacks, and thus points out new possibilities for rowhammer attacks, enabling proactive prevention before they cause harm. Under the framework, we propose a novel expressive rowhammer attack that is capable of accumulating injected memory changes and achieving rich attack semantics. We conclude by outlining future research directions.
Submitted 11 January, 2019;
originally announced January 2019.
-
One-Hop Out-of-Band Control Planes for Low-Power Multi-Hop Wireless Networks
Authors:
Chaojie Gu,
Rui Tan,
Xin Lou,
Dusit Niyato
Abstract:
Separation of control and data planes (SCDP) is a desirable paradigm for low-power multi-hop wireless networks requiring high network performance and manageability. Existing SCDP networks generally adopt an in-band control plane scheme in that the control-plane messages are delivered by their data-plane networks. The physical coupling of the two planes may lead to undesirable consequences. To advance the network architecture design, we propose to leverage on the long-range communication capability of the increasingly available low-power wide-area network (LPWAN) radios to form one-hop out-of-band control planes. We choose LoRaWAN, an open, inexpensive, and ISM band based LPWAN radio to prototype our out-of-band control plane called LoRaCP. Several characteristics of LoRaWAN such as downlink-uplink asymmetry and primitive ALOHA media access control (MAC) present challenges to achieving reliability and efficiency. To address these challenges, we design a TDMA-based multi-channel MAC featuring an urgent channel and negative acknowledgment. On a testbed of 16 nodes, we demonstrate applying LoRaCP to physically separate the control-plane network of the Collection Tree Protocol (CTP) from its ZigBee-based data-plane network. Extensive experiments show that LoRaCP increases CTP's packet delivery ratio from 65% to 80% in the presence of external interference, while consuming a per-node average radio power of 2.97mW only.
Submitted 16 December, 2017;
originally announced December 2017.
-
Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework
Authors:
Shan Yang,
Lei Xie,
Xiao Chen,
Xiaoyan Lou,
Xuan Zhu,
Dongyan Huang,
Haizhou Li
Abstract:
In this paper, we aim at improving the performance of synthesized speech in statistical parametric speech synthesis (SPSS) based on a generative adversarial network (GAN). In particular, we propose a novel architecture combining the traditional acoustic loss function and the GAN's discriminative loss under a multi-task learning (MTL) framework. The mean squared error (MSE) is usually used to estimate the parameters of deep neural networks, which only considers the numerical difference between the raw audio and the synthesized one. To mitigate this problem, we introduce the GAN as a second task to determine if the input is a natural speech with specific conditions. In this MTL framework, the MSE optimization improves the stability of GAN, and at the same time GAN produces samples with a distribution closer to natural speech. Listening tests show that the multi-task architecture can generate more natural speech that satisfies human perception than the conventional methods.
Submitted 11 July, 2017; v1 submitted 6 July, 2017;
originally announced July 2017.
-
Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics
Authors:
Ken Kansky,
Tom Silver,
David A. Mély,
Mohamed Eldawy,
Miguel Lázaro-Gredilla,
Xinghua Lou,
Nimrod Dorfman,
Szymon Sidor,
Scott Phoenix,
Dileep George
Abstract:
The recent adaptation of deep neural network-based methods to reinforcement learning and planning domains has yielded remarkable progress on individual tasks. Nonetheless, progress on task-to-task transfer remains limited. In pursuit of efficient and robust generalization, we introduce the Schema Network, an object-oriented generative physics simulator capable of disentangling multiple causes of events and reasoning backward through causes to achieve goals. The richly structured architecture of the Schema Network can learn the dynamics of an environment directly from data. We compare Schema Networks with Asynchronous Advantage Actor-Critic and Progressive Networks on a suite of Breakout variations, reporting results on training efficiency and zero-shot generalization, consistently demonstrating faster, more robust learning and better transfer. We argue that generalizing from limited data and learning causal relationships are essential abilities on the path toward generally intelligent systems.
Submitted 17 August, 2017; v1 submitted 14 June, 2017;
originally announced June 2017.
-
Generative Shape Models: Joint Text Recognition and Segmentation with Very Little Training Data
Authors:
Xinghua Lou,
Ken Kansky,
Wolfgang Lehrach,
CC Laan,
Bhaskara Marthi,
D. Scott Phoenix,
Dileep George
Abstract:
We demonstrate that a generative model for object shapes can achieve state of the art results on challenging scene text recognition tasks, and with orders of magnitude fewer training images than required for competing discriminative methods. In addition to transcribing text from challenging images, our method performs fine-grained instance segmentation of characters. We show that our model is more robust to both affine transformations and non-affine deformations compared to previous approaches.
Submitted 8 November, 2016;
originally announced November 2016.
-
GRED: Graph-Regularized 3D Shape Reconstruction from Highly Anisotropic and Noisy Images
Authors:
Christian Widmer,
Philipp Drewe,
Xinghua Lou,
Shefali Umrania,
Stephanie Heinrich,
Gunnar Rätsch
Abstract:
Analysis of microscopy images can provide insight into many biological processes. One particularly challenging problem is cell nuclear segmentation in highly anisotropic and noisy 3D image data. Manually localizing and segmenting each and every cell nuclei is very time consuming, which remains a bottleneck in large scale biological experiments. In this work we present a tool for automated segmentation of cell nuclei from 3D fluorescent microscopic data. Our tool is based on state-of-the-art image processing and machine learning techniques and supports a friendly graphical user interface (GUI). We show that our tool is as accurate as manual annotation but greatly reduces the time for the registration.
Submitted 17 September, 2013;
originally announced September 2013.
-
Structured Learning from Partial Annotations
Authors:
Xinghua Lou,
Fred Hamprecht
Abstract:
Structured learning is appropriate when predicting structured outputs such as trees, graphs, or sequences. Most prior work requires the training set to consist of complete trees, graphs or sequences. Specifying such detailed ground truth can be tedious or infeasible for large outputs. Our main contribution is a large margin formulation that makes structured learning from only partially annotated data possible. The resulting optimization problem is non-convex, yet can be efficiently solved by the concave-convex procedure (CCCP) with novel speedup strategies. We apply our method to a challenging tracking-by-assignment problem of a variable number of divisible objects. On this benchmark, using only 25% of a full annotation we achieve a performance comparable to a model learned with a full annotation. Finally, we offer a unifying perspective of previous work using the hinge, ramp, or max loss for structured learning, followed by an empirical comparison of their practical performance.
Submitted 27 June, 2012;
originally announced June 2012.