-
Rate-Distortion-Perception Tradeoff for Gaussian Vector Sources
Authors:
**g**g Qian,
Sadaf Salehkalaibar,
Jun Chen,
Ashish Khisti,
Wei Yu,
Wuxian Shi,
Yiqun Ge,
Wen Tong
Abstract:
This paper studies the rate-distortion-perception (RDP) tradeoff for a Gaussian vector source coding problem where the goal is to compress the multi-component source subject to distortion and perception constraints. The purpose of imposing a perception constraint is to ensure visually pleasing reconstructions. This paper studies this RDP setting with either the Kullback-Leibler (KL) divergence or…
▽ More
This paper studies the rate-distortion-perception (RDP) tradeoff for a Gaussian vector source coding problem where the goal is to compress the multi-component source subject to distortion and perception constraints. The purpose of imposing a perception constraint is to ensure visually pleasing reconstructions. This paper studies this RDP setting with either the Kullback-Leibler (KL) divergence or Wasserstein-2 metric as the perception loss function, and shows that for Gaussian vector sources, jointly Gaussian reconstructions are optimal. We further demonstrate that the optimal tradeoff can be expressed as an optimization problem, which can be explicitly solved. An interesting property of the optimal solution is as follows. Without the perception constraint, the traditional reverse water-filling solution for characterizing the rate-distortion (RD) tradeoff of a Gaussian vector source states that the optimal rate allocated to each component depends on a constant, called the water-level. If the variance of a specific component is below the water-level, it is assigned a {zero} compression rate. However, with active distortion and perception constraints, we show that the optimal rates allocated to the different components are always {positive}. Moreover, the water-levels that determine the optimal rate allocation for different components are unequal. We further treat the special case of perceptually perfect reconstruction and study its RDP function in the high-distortion and low-distortion regimes to obtain insight to the structure of the optimal solution.
△ Less
Submitted 25 June, 2024;
originally announced June 2024.
-
Root-KGD: A Novel Framework for Root Cause Diagnosis Based on Knowledge Graph and Industrial Data
Authors:
Jiyu Chen,
**chuan Qian,
Xinmin Zhang,
Zhihuan Song
Abstract:
With the development of intelligent manufacturing and the increasing complexity of industrial production, root cause diagnosis has gradually become an important research direction in the field of industrial fault diagnosis. However, existing research methods struggle to effectively combine domain knowledge and industrial data, failing to provide accurate, online, and reliable root cause diagnosis…
▽ More
With the development of intelligent manufacturing and the increasing complexity of industrial production, root cause diagnosis has gradually become an important research direction in the field of industrial fault diagnosis. However, existing research methods struggle to effectively combine domain knowledge and industrial data, failing to provide accurate, online, and reliable root cause diagnosis results for industrial processes. To address these issues, a novel fault root cause diagnosis framework based on knowledge graph and industrial data, called Root-KGD, is proposed. Root-KGD uses the knowledge graph to represent domain knowledge and employs data-driven modeling to extract fault features from industrial data. It then combines the knowledge graph and data features to perform knowledge graph reasoning for root cause identification. The performance of the proposed method is validated using two industrial process cases, Tennessee Eastman Process (TEP) and Multiphase Flow Facility (MFF). Compared to existing methods, Root-KGD not only gives more accurate root cause variable diagnosis results but also provides interpretable fault-related information by locating faults to corresponding physical entities in knowledge graph (such as devices and streams). In addition, combined with its lightweight nature, Root-KGD is more effective in online industrial applications.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
StructuralSleight: Automated Jailbreak Attacks on Large Language Models Utilizing Uncommon Text-Encoded Structure
Authors:
Bangxin Li,
Hengrui Xing,
Chao Huang,
** Qian,
Huangqing Xiao,
Linfeng Feng,
Cong Tian
Abstract:
Large Language Models (LLMs) are widely used in natural language processing but face the risk of jailbreak attacks that maliciously induce them to generate harmful content. Existing jailbreak attacks, including character-level and context-level attacks, mainly focus on the prompt of the plain text without specifically exploring the significant influence of its structure. In this paper, we focus on…
▽ More
Large Language Models (LLMs) are widely used in natural language processing but face the risk of jailbreak attacks that maliciously induce them to generate harmful content. Existing jailbreak attacks, including character-level and context-level attacks, mainly focus on the prompt of the plain text without specifically exploring the significant influence of its structure. In this paper, we focus on studying how prompt structure contributes to the jailbreak attack. We introduce a novel structure-level attack method based on tail structures that are rarely used during LLM training, which we refer to as Uncommon Text-Encoded Structure (UTES). We extensively study 12 UTESs templates and 6 obfuscation methods to build an effective automated jailbreak tool named StructuralSleight that contains three escalating attack strategies: Structural Attack, Structural and Character/Context Obfuscation Attack, and Fully Obfuscated Structural Attack. Extensive experiments on existing LLMs show that StructuralSleight significantly outperforms baseline methods. In particular, the attack success rate reaches 94.62\% on GPT-4o, which has not been addressed by state-of-the-art techniques.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation
Authors:
Hanzhang Zhou,
Zijian Feng,
Zixiao Zhu,
Junlang Qian,
Kezhi Mao
Abstract:
Large language models (LLMs) have demonstrated impressive capabilities in various tasks using the in-context learning (ICL) paradigm. However, their effectiveness is often compromised by inherent bias, leading to prompt brittleness, i.e., sensitivity to design settings such as example selection, order, and prompt formatting. Previous studies have addressed LLM bias through external adjustment of m…
▽ More
Large language models (LLMs) have demonstrated impressive capabilities in various tasks using the in-context learning (ICL) paradigm. However, their effectiveness is often compromised by inherent bias, leading to prompt brittleness, i.e., sensitivity to design settings such as example selection, order, and prompt formatting. Previous studies have addressed LLM bias through external adjustment of model outputs, but the internal mechanisms that lead to such bias remain unexplored. Our work delves into these mechanisms, particularly investigating how feedforward neural networks (FFNs) and attention heads result in the bias of LLMs. By Interpreting the contribution of individual FFN vectors and attention heads, we identify the biased LLM components that skew LLMs' prediction toward specific labels. To mitigate these biases, we introduce UniBias, an inference-only method that effectively identifies and eliminates biased FFN vectors and attention heads. Extensive experiments across 12 NLP datasets demonstrate that UniBias significantly enhances ICL performance and alleviates prompt brittleness of LLMs.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Offline Oracle-Efficient Learning for Contextual MDPs via Layerwise Exploration-Exploitation Tradeoff
Authors:
Jian Qian,
Haichen Hu,
David Simchi-Levi
Abstract:
Motivated by the recent discovery of a statistical and computational reduction from contextual bandits to offline regression (Simchi-Levi and Xu, 2021), we address the general (stochastic) Contextual Markov Decision Process (CMDP) problem with horizon H (as known as CMDP with H layers). In this paper, we introduce a reduction from CMDPs to offline density estimation under the realizability assumpt…
▽ More
Motivated by the recent discovery of a statistical and computational reduction from contextual bandits to offline regression (Simchi-Levi and Xu, 2021), we address the general (stochastic) Contextual Markov Decision Process (CMDP) problem with horizon H (as known as CMDP with H layers). In this paper, we introduce a reduction from CMDPs to offline density estimation under the realizability assumption, i.e., a model class M containing the true underlying CMDP is provided in advance. We develop an efficient, statistically near-optimal algorithm requiring only O(HlogT) calls to an offline density estimation algorithm (or oracle) across all T rounds of interaction. This number can be further reduced to O(HloglogT) if T is known in advance. Our results mark the first efficient and near-optimal reduction from CMDPs to offline density estimation without imposing any structural assumptions on the model class. A notable feature of our algorithm is the design of a layerwise exploration-exploitation tradeoff tailored to address the layerwise structure of CMDPs. Additionally, our algorithm is versatile and applicable to pure exploration tasks in reward-free reinforcement learning.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
MambaLLIE: Implicit Retinex-Aware Low Light Enhancement with Global-then-Local State Space
Authors:
Jiangwei Weng,
Zhiqiang Yan,
Ying Tai,
Jianjun Qian,
Jian Yang,
Jun Li
Abstract:
Recent advances in low light image enhancement have been dominated by Retinex-based learning framework, leveraging convolutional neural networks (CNNs) and Transformers. However, the vanilla Retinex theory primarily addresses global illumination degradation and neglects local issues such as noise and blur in dark conditions. Moreover, CNNs and Transformers struggle to capture global degradation du…
▽ More
Recent advances in low light image enhancement have been dominated by Retinex-based learning framework, leveraging convolutional neural networks (CNNs) and Transformers. However, the vanilla Retinex theory primarily addresses global illumination degradation and neglects local issues such as noise and blur in dark conditions. Moreover, CNNs and Transformers struggle to capture global degradation due to their limited receptive fields. While state space models (SSMs) have shown promise in the long-sequence modeling, they face challenges in combining local invariants and global context in visual data. In this paper, we introduce MambaLLIE, an implicit Retinex-aware low light enhancer featuring a global-then-local state space design. We first propose a Local-Enhanced State Space Module (LESSM) that incorporates an augmented local bias within a 2D selective scan mechanism, enhancing the original SSMs by preserving local 2D dependency. Additionally, an Implicit Retinex-aware Selective Kernel module (IRSK) dynamically selects features using spatially-varying operations, adapting to varying inputs through an adaptive kernel selection process. Our Global-then-Local State Space Block (GLSSB) integrates LESSM and IRSK with LayerNorm as its core. This design enables MambaLLIE to achieve comprehensive global long-range modeling and flexible local feature aggregation. Extensive experiments demonstrate that MambaLLIE significantly outperforms state-of-the-art CNN and Transformer-based methods. Project Page: https://mamballie.github.io/anon/
△ Less
Submitted 25 May, 2024;
originally announced May 2024.
-
Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies
Authors:
Jianing Qian,
Anastasios Panagopoulos,
Dinesh Jayaraman
Abstract:
Generic re-usable pre-trained image representation encoders have become a standard component of methods for many computer vision tasks. As visual representations for robots however, their utility has been limited, leading to a recent wave of efforts to pre-train robotics-specific image encoders that are better suited to robotic tasks than their generic counterparts. We propose Scene Objects From T…
▽ More
Generic re-usable pre-trained image representation encoders have become a standard component of methods for many computer vision tasks. As visual representations for robots however, their utility has been limited, leading to a recent wave of efforts to pre-train robotics-specific image encoders that are better suited to robotic tasks than their generic counterparts. We propose Scene Objects From Transformers, abbreviated as SOFT, a wrapper around pre-trained vision transformer (PVT) models that bridges this gap without any further training. Rather than construct representations out of only the final layer activations, SOFT individuates and locates object-like entities from PVT attentions, and describes them with PVT activations, producing an object-centric embedding. Across standard choices of generic pre-trained vision transformers PVT, we demonstrate in each case that policies trained on SOFT(PVT) far outstrip standard PVT representations for manipulation tasks in simulated and real settings, approaching the state-of-the-art robotics-aware representations. Code, appendix and videos: https://sites.google.com/view/robot-soft/
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
Large Language Models can Deliver Accurate and Interpretable Time Series Anomaly Detection
Authors:
Jun Liu,
Chaoyun Zhang,
Jiaxu Qian,
Minghua Ma,
Si Qin,
Chetan Bansal,
Qingwei Lin,
Saravan Rajmohan,
Dongmei Zhang
Abstract:
Time series anomaly detection (TSAD) plays a crucial role in various industries by identifying atypical patterns that deviate from standard trends, thereby maintaining system integrity and enabling prompt response measures. Traditional TSAD models, which often rely on deep learning, require extensive training data and operate as black boxes, lacking interpretability for detected anomalies. To addr…
▽ More
Time series anomaly detection (TSAD) plays a crucial role in various industries by identifying atypical patterns that deviate from standard trends, thereby maintaining system integrity and enabling prompt response measures. Traditional TSAD models, which often rely on deep learning, require extensive training data and operate as black boxes, lacking interpretability for detected anomalies. To address these challenges, we propose LLMAD, a novel TSAD method that employs Large Language Models (LLMs) to deliver accurate and interpretable TSAD results. LLMAD innovatively applies LLMs for in-context anomaly detection by retrieving both positive and negative similar time series segments, significantly enhancing LLMs' effectiveness. Furthermore, LLMAD employs the Anomaly Detection Chain-of-Thought (AnoCoT) approach to mimic expert logic for its decision-making process. This method further enhances its performance and enables LLMAD to provide explanations for their detections through versatile perspectives, which are particularly important for user decision-making. Experiments on three datasets indicate that our LLMAD achieves detection performance comparable to state-of-the-art deep learning methods while offering remarkable interpretability for detections. To the best of our knowledge, this is the first work that directly employs LLMs for TSAD.
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
Unveiling and Manipulating Prompt Influence in Large Language Models
Authors:
Zijian Feng,
Hanzhang Zhou,
Zixiao Zhu,
Junlang Qian,
Kezhi Mao
Abstract:
Prompts play a crucial role in guiding the responses of Large Language Models (LLMs). However, the intricate role of individual tokens in prompts, known as input saliency, in sha** the responses remains largely underexplored. Existing saliency methods either misalign with LLM generation objectives or rely heavily on linearity assumptions, leading to potential inaccuracies. To address this, we pr…
▽ More
Prompts play a crucial role in guiding the responses of Large Language Models (LLMs). However, the intricate role of individual tokens in prompts, known as input saliency, in sha** the responses remains largely underexplored. Existing saliency methods either misalign with LLM generation objectives or rely heavily on linearity assumptions, leading to potential inaccuracies. To address this, we propose Token Distribution Dynamics (TDD), a \textcolor{black}{simple yet effective} approach to unveil and manipulate the role of prompts in generating LLM outputs. TDD leverages the robust interpreting capabilities of the language model head (LM head) to assess input saliency. It projects input tokens into the embedding space and then estimates their significance based on distribution dynamics over the vocabulary. We introduce three TDD variants: forward, backward, and bidirectional, each offering unique insights into token relevance. Extensive experiments reveal that the TDD surpasses state-of-the-art baselines with a big margin in elucidating the causal relationships between prompts and LLM outputs. Beyond mere interpretation, we apply TDD to two prompt manipulation tasks for controlled text generation: zero-shot toxic language suppression and sentiment steering. Empirical results underscore TDD's proficiency in identifying both toxic and sentimental cues in prompts, subsequently mitigating toxicity or modulating sentiment in the generated content.
△ Less
Submitted 20 May, 2024;
originally announced May 2024.
-
Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance
Authors:
Junkai Fan,
Jiangwei Weng,
Kun Wang,
Yijun Yang,
Jianjun Qian,
Jun Li,
Jian Yang
Abstract:
Real driving-video dehazing poses a significant challenge due to the inherent difficulty in acquiring precisely aligned hazy/clear video pairs for effective model training, especially in dynamic driving scenarios with unpredictable weather conditions. In this paper, we propose a pioneering approach that addresses this challenge through a nonaligned regularization strategy. Our core concept involve…
▽ More
Real driving-video dehazing poses a significant challenge due to the inherent difficulty in acquiring precisely aligned hazy/clear video pairs for effective model training, especially in dynamic driving scenarios with unpredictable weather conditions. In this paper, we propose a pioneering approach that addresses this challenge through a nonaligned regularization strategy. Our core concept involves identifying clear frames that closely match hazy frames, serving as references to supervise a video dehazing network. Our approach comprises two key components: reference matching and video dehazing. Firstly, we introduce a non-aligned reference frame matching module, leveraging an adaptive sliding window to match high-quality reference frames from clear videos. Video dehazing incorporates flow-guided cosine attention sampler and deformable cosine attention fusion modules to enhance spatial multiframe alignment and fuse their improved information. To validate our approach, we collect a GoProHazy dataset captured effortlessly with GoPro cameras in diverse rural and urban road environments. Extensive experiments demonstrate the superiority of the proposed method over current state-of-the-art methods in the challenging task of real driving-video dehazing. Project page.
△ Less
Submitted 16 May, 2024;
originally announced May 2024.
-
GI-SMN: Gradient Inversion Attack against Federated Learning without Prior Knowledge
Authors:
** Qian,
Kaimin Wei,
Yongdong Wu,
Jilian Zhang,
Jipeng Chen,
Huan Bao
Abstract:
Federated learning (FL) has emerged as a privacy-preserving machine learning approach where multiple parties share gradient information rather than original user data. Recent work has demonstrated that gradient inversion attacks can exploit the gradients of FL to recreate the original user data, posing significant privacy risks. However, these attacks make strong assumptions about the attacker, su…
▽ More
Federated learning (FL) has emerged as a privacy-preserving machine learning approach where multiple parties share gradient information rather than original user data. Recent work has demonstrated that gradient inversion attacks can exploit the gradients of FL to recreate the original user data, posing significant privacy risks. However, these attacks make strong assumptions about the attacker, such as altering the model structure or parameters, gaining batch normalization statistics, or acquiring prior knowledge of the original training set, etc. Consequently, these attacks are not possible in real-world scenarios. To end it, we propose a novel Gradient Inversion attack based on Style Migration Network (GI-SMN), which breaks through the strong assumptions made by previous gradient inversion attacks. The optimization space is reduced by the refinement of the latent code and the use of regular terms to facilitate gradient matching. GI-SMN enables the reconstruction of user data with high similarity in batches. Experimental results have demonstrated that GI-SMN outperforms state-of-the-art gradient inversion attacks in both visual effect and similarity metrics. Additionally, it also can overcome gradient pruning and differential privacy defenses.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
Closing the Perception-Action Loop for Semantically Safe Navigation in Semi-Static Environments
Authors:
**gxing Qian,
Siqi Zhou,
Nicholas Jianrui Ren,
Veronica Chatrath,
Angela P. Schoellig
Abstract:
Autonomous robots navigating in changing environments demand adaptive navigation strategies for safe long-term operation. While many modern control paradigms offer theoretical guarantees, they often assume known extrinsic safety constraints, overlooking challenges when deployed in real-world environments where objects can appear, disappear, and shift over time. In this paper, we present a closed-l…
▽ More
Autonomous robots navigating in changing environments demand adaptive navigation strategies for safe long-term operation. While many modern control paradigms offer theoretical guarantees, they often assume known extrinsic safety constraints, overlooking challenges when deployed in real-world environments where objects can appear, disappear, and shift over time. In this paper, we present a closed-loop perception-action pipeline that bridges this gap. Our system encodes an online-constructed dense map, along with object-level semantic and consistency estimates into a control barrier function (CBF) to regulate safe regions in the scene. A model predictive controller (MPC) leverages the CBF-based safety constraints to adapt its navigation behaviour, which is particularly crucial when potential scene changes occur. We test the system in simulations and real-world experiments to demonstrate the impact of semantic information and scene change handling on robot behavior, validating the practicality of our approach.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
Distributional Black-Box Model Inversion Attack with Multi-Agent Reinforcement Learning
Authors:
Huan Bao,
Kaimin Wei,
Yongdong Wu,
** Qian,
Robert H. Deng
Abstract:
A Model Inversion (MI) attack based on Generative Adversarial Networks (GAN) aims to recover the private training data from complex deep learning models by searching codes in the latent space. However, they merely search a deterministic latent space such that the found latent code is usually suboptimal. In addition, the existing distributional MI schemes assume that an attacker can access the stru…
▽ More
A Model Inversion (MI) attack based on Generative Adversarial Networks (GAN) aims to recover the private training data from complex deep learning models by searching codes in the latent space. However, they merely search a deterministic latent space such that the found latent code is usually suboptimal. In addition, the existing distributional MI schemes assume that an attacker can access the structures and parameters of the target model, which is not always viable in practice. To overcome the above shortcomings, this paper proposes a novel Distributional Black-Box Model Inversion (DBB-MI) attack by constructing the probabilistic latent space for searching the target privacy data. Specifically, DBB-MI does not need the target model parameters or specialized GAN training. Instead, it finds the latent probability distribution by combining the output of the target model with multi-agent reinforcement learning techniques. Then, it randomly chooses latent codes from the latent probability distribution for recovering the private data. As the latent probability distribution closely aligns with the target privacy data in latent space, the recovered data will leak the privacy of training samples of the target model significantly. Abundant experiments conducted on diverse datasets and networks show that the present DBB-MI has better performance than state-of-the-art in attack accuracy, K-nearest neighbor feature distance, and Peak Signal-to-Noise Ratio.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
Composing Pre-Trained Object-Centric Representations for Robotics From "What" and "Where" Foundation Models
Authors:
Junyao Shi,
Jianing Qian,
Yecheng Jason Ma,
Dinesh Jayaraman
Abstract:
There have recently been large advances both in pre-training visual representations for robotic control and segmenting unknown category objects in general images. To leverage these for improved robot learning, we propose $\textbf{POCR}$, a new framework for building pre-trained object-centric representations for robotic control. Building on theories of "what-where" representations in psychology an…
▽ More
There have recently been large advances both in pre-training visual representations for robotic control and segmenting unknown category objects in general images. To leverage these for improved robot learning, we propose $\textbf{POCR}$, a new framework for building pre-trained object-centric representations for robotic control. Building on theories of "what-where" representations in psychology and computer vision, we use segmentations from a pre-trained model to stably locate across timesteps, various entities in the scene, capturing "where" information. To each such segmented entity, we apply other pre-trained models that build vector descriptions suitable for robotic control tasks, thus capturing "what" the entity is. Thus, our pre-trained object-centric representations for control are constructed by appropriately combining the outputs of off-the-shelf pre-trained models, with no new training. On various simulated and real robotic tasks, we show that imitation policies for robotic manipulators trained on POCR achieve better performance and systematic generalization than state of the art pre-trained representations for robotics, as well as prior object-centric representations that are typically trained from scratch.
△ Less
Submitted 20 April, 2024;
originally announced April 2024.
-
Online Estimation via Offline Estimation: An Information-Theoretic Framework
Authors:
Dylan J. Foster,
Yanjun Han,
Jian Qian,
Alexander Rakhlin
Abstract:
$…
▽ More
$ $The classical theory of statistical estimation aims to estimate a parameter of interest under data generated from a fixed design ("offline estimation"), while the contemporary theory of online learning provides algorithms for estimation under adaptively chosen covariates ("online estimation"). Motivated by connections between estimation and interactive decision making, we ask: is it possible to convert offline estimation algorithms into online estimation algorithms in a black-box fashion? We investigate this question from an information-theoretic perspective by introducing a new framework, Oracle-Efficient Online Estimation (OEOE), where the learner can only interact with the data stream indirectly through a sequence of offline estimators produced by a black-box algorithm operating on the stream. Our main results settle the statistical and computational complexity of online estimation in this framework.
$\bullet$ Statistical complexity. We show that information-theoretically, there exist algorithms that achieve near-optimal online estimation error via black-box offline estimation oracles, and give a nearly-tight characterization for minimax rates in the OEOE framework.
$\bullet$ Computational complexity. We show that the guarantees above cannot be achieved in a computationally efficient fashion in general, but give a refined characterization for the special case of conditional density estimation: computationally efficient online estimation via black-box offline estimation is possible whenever it is possible via unrestricted algorithms.
Finally, we apply our results to give offline oracle-efficient algorithms for interactive decision making.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
Sparse Generation: Making Pseudo Labels Sparse for weakly supervision with points
Authors:
Tian Ma,
Chuyang Shang,
Wanzhu Ren,
Yuancheng Li,
Jiiayi Yang,
Jiali Qian
Abstract:
In recent years, research on point weakly supervised object detection (PWSOD) methods in the field of computer vision has attracted people's attention. However, existing pseudo labels generation methods perform poorly in a small amount of supervised annotation data and dense object detection tasks. We consider the generation of weakly supervised pseudo labels as the result of model's sparse output…
▽ More
In recent years, research on point weakly supervised object detection (PWSOD) methods in the field of computer vision has attracted people's attention. However, existing pseudo labels generation methods perform poorly in a small amount of supervised annotation data and dense object detection tasks. We consider the generation of weakly supervised pseudo labels as the result of model's sparse output, and propose a method called Sparse Generation to make pseudo labels sparse. It constructs dense tensors through the relationship between data and detector model, optimizes three of its parameters, and obtains a sparse tensor via coordinated calculation, thereby indirectly obtaining higher quality pseudo labels, and solving the model's density problem in the situation of only a small amount of supervised annotation data can be used. On two broadly used open-source datasets (RSOD, SIMD) and a self-built dataset (Bullet-Hole), the experimental results showed that the proposed method has a significant advantage in terms of overall performance metrics, comparing to that state-of-the-art method.
△ Less
Submitted 28 March, 2024;
originally announced March 2024.
-
Couler: Unified Machine Learning Workflow Optimization in Cloud
Authors:
Xiaoda Wang,
Yuan Tang,
Tengda Guo,
Bo Sang,
**gji Wu,
Jian Sha,
Ke Zhang,
Jiang Qian,
Mingjie Tang
Abstract:
Machine Learning (ML) has become ubiquitous, fueling data-driven applications across various organizations. Contrary to the traditional perception of ML in research, ML workflows can be complex, resource-intensive, and time-consuming. Expanding an ML workflow to encompass a wider range of data infrastructure and data types may lead to larger workloads and increased deployment costs. Currently, num…
▽ More
Machine Learning (ML) has become ubiquitous, fueling data-driven applications across various organizations. Contrary to the traditional perception of ML in research, ML workflows can be complex, resource-intensive, and time-consuming. Expanding an ML workflow to encompass a wider range of data infrastructure and data types may lead to larger workloads and increased deployment costs. Currently, numerous workflow engines are available (with over ten being widely recognized). This variety poses a challenge for end-users in terms of mastering different engine APIs. While efforts have primarily focused on optimizing ML Operations (MLOps) for a specific workflow engine, current methods largely overlook workflow optimization across different engines.
In this work, we design and implement Couler, a system designed for unified ML workflow optimization in the cloud. Our main insight lies in the ability to generate an ML workflow using natural language (NL) descriptions. We integrate Large Language Models (LLMs) into workflow generation, and provide a unified programming interface for various workflow engines. This approach alleviates the need to understand various workflow engines' APIs. Moreover, Couler enhances workflow computation efficiency by introducing automated caching at multiple stages, enabling large workflow auto-parallelization and automatic hyperparameters tuning. These enhancements minimize redundant computational costs and improve fault tolerance during deep learning workflow training. Couler is extensively deployed in real-world production scenarios at Ant Group, handling approximately 22k workflows daily, and has successfully improved the CPU/Memory utilization by more than 15% and the workflow completion rate by around 17%.
△ Less
Submitted 12 March, 2024;
originally announced March 2024.
-
Learn Suspected Anomalies from Event Prompts for Video Anomaly Detection
Authors:
Chenchen Tao,
Chong Wang,
Yuexian Zou,
Xiaohao Peng,
Jiafei Wu,
Jiangbo Qian
Abstract:
Most models for weakly supervised video anomaly detection (WS-VAD) rely on multiple instance learning, aiming to distinguish normal and abnormal snippets without specifying the type of anomaly. The ambiguous nature of anomaly definitions across contexts introduces bias in detecting abnormal and normal snippets within the abnormal bag. Taking the first step to show the model why it is anomalous, a…
▽ More
Most models for weakly supervised video anomaly detection (WS-VAD) rely on multiple instance learning, aiming to distinguish normal and abnormal snippets without specifying the type of anomaly. The ambiguous nature of anomaly definitions across contexts introduces bias in detecting abnormal and normal snippets within the abnormal bag. Taking the first step to show the model why it is anomalous, a novel framework is proposed to guide the learning of suspected anomalies from event prompts. Given a textual prompt dictionary of potential anomaly events and the captions generated from anomaly videos, the semantic anomaly similarity between them could be calculated to identify the suspected anomalous events for each video snippet. It enables a new multi-prompt learning process to constrain the visual-semantic features across all videos, as well as provides a new way to label pseudo anomalies for self-training. To demonstrate effectiveness, comprehensive experiments and detailed ablation studies are conducted on four datasets, namely XD-Violence, UCF-Crime, TAD, and ShanghaiTech. Our proposed model outperforms most state-of-the-art methods in terms of AP or AUC (82.6\%, 87.7\%, 93.1\%, and 97.4\%). Furthermore, it shows promising performance in open-set and cross-dataset cases.
△ Less
Submitted 2 March, 2024;
originally announced March 2024.
-
Structured Deep Neural Networks-Based Backstep** Trajectory Tracking Control for Lagrangian Systems
Authors:
Jiajun Qian,
Liang Xu,
Xiaoqiang Ren,
Xiaofan Wang
Abstract:
Deep neural networks (DNN) are increasingly being used to learn controllers due to their excellent approximation capabilities. However, their black-box nature poses significant challenges to closed-loop stability guarantees and performance analysis. In this paper, we introduce a structured DNN-based controller for the trajectory tracking control of Lagrangian systems using backing techniques. By p…
▽ More
Deep neural networks (DNN) are increasingly being used to learn controllers due to their excellent approximation capabilities. However, their black-box nature poses significant challenges to closed-loop stability guarantees and performance analysis. In this paper, we introduce a structured DNN-based controller for the trajectory tracking control of Lagrangian systems using backing techniques. By properly designing neural network structures, the proposed controller can ensure closed-loop stability for any compatible neural network parameters. In addition, improved control performance can be achieved by further optimizing neural network parameters. Besides, we provide explicit upper bounds on tracking errors in terms of controller parameters, which allows us to achieve the desired tracking performance by properly selecting the controller parameters. Furthermore, when system models are unknown, we propose an improved Lagrangian neural network (LNN) structure to learn the system dynamics and design the controller. We show that in the presence of model approximation errors and external disturbances, the closed-loop stability and tracking control performance can still be guaranteed. The effectiveness of the proposed approach is demonstrated through simulations.
△ Less
Submitted 1 March, 2024;
originally announced March 2024.
-
ARTiST: Automated Text Simplification for Task Guidance in Augmented Reality
Authors:
Guande Wu,
**g Qian,
Sonia Castelo,
Shaoyu Chen,
Joao Rulff,
Claudio Silva
Abstract:
Text presented in augmented reality provides in-situ, real-time information for users. However, this content can be challenging to apprehend quickly when engaging in cognitively demanding AR tasks, especially when it is presented on a head-mounted display. We propose ARTiST, an automatic text simplification system that uses a few-shot prompt and GPT-3 models to specifically optimize the text lengt…
▽ More
Text presented in augmented reality provides in-situ, real-time information for users. However, this content can be challenging to apprehend quickly when engaging in cognitively demanding AR tasks, especially when it is presented on a head-mounted display. We propose ARTiST, an automatic text simplification system that uses a few-shot prompt and GPT-3 models to specifically optimize the text length and semantic content for augmented reality. Developed out of a formative study that included seven users and three experts, our system combines a customized error calibration model with a few-shot prompt to integrate the syntactic, lexical, elaborative, and content simplification techniques, and generate simplified AR text for head-worn displays. Results from a 16-user empirical study showed that ARTiST lightens the cognitive load and improves performance significantly over both unmodified text and text modified via traditional methods. Our work constitutes a step towards automating the optimization of batch text data for readability and performance in augmented reality.
△ Less
Submitted 28 February, 2024;
originally announced February 2024.
-
A Summary of Privacy-Preserving Data Publishing in the Local Setting
Authors:
Wenjun Lin,
Jiahao Qian,
Wenwen Liu,
Lang Wu
Abstract:
The exponential growth of collected, processed, and shared data has given rise to concerns about individuals' privacy. Consequently, various laws and regulations have been established to oversee how organizations handle and safeguard data. One such method is Statistical Disclosure Control, which aims to minimize the risk of exposing confidential information by de-identifying it. This de-identifica…
▽ More
The exponential growth of collected, processed, and shared data has given rise to concerns about individuals' privacy. Consequently, various laws and regulations have been established to oversee how organizations handle and safeguard data. One such method is Statistical Disclosure Control, which aims to minimize the risk of exposing confidential information by de-identifying it. This de-identification is achieved through specific privacy-preserving techniques. However, a trade-off exists: de-identified data can often lead to a loss of information, which might impact the accuracy of data analysis and the predictive capability of models. The overarching goal remains to safeguard individual privacy while preserving the data's interpretability, meaning its overall usefulness. Despite advances in Statistical Disclosure Control, the field continues to evolve, with no definitive solution that strikes an optimal balance between privacy and utility. This survey delves into the intricate processes of de-identification. We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance. Herein, we tackle the primary challenges posed by privacy constraints, overview predominant strategies to mitigate these challenges, categorize privacy-preserving techniques, offer a theoretical assessment of current comparative research, and highlight numerous unresolved issues in the domain.
△ Less
Submitted 18 December, 2023;
originally announced December 2023.
-
Heuristic-Driven Link-of-Analogy Prompting: Enhancing Large Language Models for Document-Level Event Argument Extraction
Authors:
Hanzhang Zhou,
Junlang Qian,
Zijian Feng,
Hui Lu,
Zixiao Zhu,
Kezhi Mao
Abstract:
In this study, we investigate in-context learning (ICL) in document-level event argument extraction (EAE) to alleviate the dependency on large-scale labeled data for this task. We introduce the Heuristic-Driven Link-of-Analogy (HD-LoA) prompting to address the challenge of example selection and to develop a prompting strategy tailored for EAE. Specifically, we hypothesize and validate that LLMs le…
▽ More
In this study, we investigate in-context learning (ICL) in document-level event argument extraction (EAE) to alleviate the dependency on large-scale labeled data for this task. We introduce the Heuristic-Driven Link-of-Analogy (HD-LoA) prompting to address the challenge of example selection and to develop a prompting strategy tailored for EAE. Specifically, we hypothesize and validate that LLMs learn task-specific heuristics from demonstrations via ICL. Building upon this hypothesis, we introduce an explicit heuristic-driven demonstration construction approach, which transforms the haphazard example selection process into a methodical method that emphasizes task heuristics. Additionally, inspired by the analogical reasoning of human, we propose the link-of-analogy prompting, which enables LLMs to process new situations by drawing analogies to known situations, enhancing their performance on unseen classes beyond limited ICL examples. Experiments show that our method outperforms existing prompting methods and few-shot supervised learning methods on document-level EAE datasets. Additionally, the HD-LoA prompting shows effectiveness in diverse tasks like sentiment analysis and natural language inference, demonstrating its broad adaptability.
△ Less
Submitted 19 February, 2024; v1 submitted 11 November, 2023;
originally announced November 2023.
-
ABIGX: A Unified Framework for eXplainable Fault Detection and Classification
Authors:
Yue Zhuo,
**chuan Qian,
Zhihuan Song,
Zhiqiang Ge
Abstract:
For explainable fault detection and classification (FDC), this paper proposes a unified framework, ABIGX (Adversarial fault reconstruction-Based Integrated Gradient eXplanation). ABIGX is derived from the essentials of previous successful fault diagnosis methods, contribution plots (CP) and reconstruction-based contribution (RBC). It is the first explanation framework that provides variable contri…
▽ More
For explainable fault detection and classification (FDC), this paper proposes a unified framework, ABIGX (Adversarial fault reconstruction-Based Integrated Gradient eXplanation). ABIGX is derived from the essentials of previous successful fault diagnosis methods, contribution plots (CP) and reconstruction-based contribution (RBC). It is the first explanation framework that provides variable contributions for the general FDC models. The core part of ABIGX is the adversarial fault reconstruction (AFR) method, which rethinks the FR from the perspective of adversarial attack and generalizes to fault classification models with a new fault index. For fault classification, we put forward a new problem of fault class smearing, which intrinsically hinders the correct explanation. We prove that ABIGX effectively mitigates this problem and outperforms the existing gradient-based explanation methods. For fault detection, we theoretically bridge ABIGX with conventional fault diagnosis methods by proving that CP and RBC are the linear specifications of ABIGX. The experiments evaluate the explanations of FDC by quantitative metrics and intuitive illustrations, the results of which show the general superiority of ABIGX to other advanced explanation methods.
△ Less
Submitted 9 November, 2023;
originally announced November 2023.
-
Bayesian imaging inverse problem with SA-Roundtrip prior via HMC-pCN sampler
Authors:
Jiayu Qian,
Yuanyuan Liu,
**gya Yang,
Qing** Zhou
Abstract:
Bayesian inference with deep generative prior has received considerable interest for solving imaging inverse problems in many scientific and engineering fields. The selection of the prior distribution is learned from, and therefore an important representation learning of, available prior measurements. The SA-Roundtrip, a novel deep generative prior, is introduced to enable controlled sampling gene…
▽ More
Bayesian inference with deep generative prior has received considerable interest for solving imaging inverse problems in many scientific and engineering fields. The selection of the prior distribution is learned from, and therefore an important representation learning of, available prior measurements. The SA-Roundtrip, a novel deep generative prior, is introduced to enable controlled sampling generation and identify the data's intrinsic dimension. This prior incorporates a self-attention structure within a bidirectional generative adversarial network. Subsequently, Bayesian inference is applied to the posterior distribution in the low-dimensional latent space using the Hamiltonian Monte Carlo with preconditioned Crank-Nicolson (HMC-pCN) algorithm, which is proven to be ergodic under specific conditions. Experiments conducted on computed tomography (CT) reconstruction with the MNIST and TomoPhantom datasets reveal that the proposed method outperforms state-of-the-art comparisons, consistently yielding a robust and superior point estimator along with precise uncertainty quantification.
△ Less
Submitted 24 October, 2023;
originally announced October 2023.
-
Uncertainty-aware 3D Object-Level Map** with Deep Shape Priors
Authors:
Ziwei Liao,
Jun Yang,
**gxing Qian,
Angela P. Schoellig,
Steven L. Waslander
Abstract:
3D object-level map** is a fundamental problem in robotics, which is especially challenging when object CAD models are unavailable during inference. In this work, we propose a framework that can reconstruct high-quality object-level maps for unknown objects. Our approach takes multiple RGB-D images as input and outputs dense 3D shapes and 9-DoF poses (including 3 scale parameters) for detected o…
▽ More
3D object-level map** is a fundamental problem in robotics, which is especially challenging when object CAD models are unavailable during inference. In this work, we propose a framework that can reconstruct high-quality object-level maps for unknown objects. Our approach takes multiple RGB-D images as input and outputs dense 3D shapes and 9-DoF poses (including 3 scale parameters) for detected objects. The core idea of our approach is to leverage a learnt generative model for shape categories as a prior and to formulate a probabilistic, uncertainty-aware optimization framework for 3D reconstruction. We derive a probabilistic formulation that propagates shape and pose uncertainty through two novel loss functions. Unlike current state-of-the-art approaches, we explicitly model the uncertainty of the object shapes and poses during our optimization, resulting in a high-quality object-level map** system. Moreover, the resulting shape and pose uncertainties, which we demonstrate can accurately reflect the true errors of our object maps, can also be useful for downstream robotics tasks such as active vision. We perform extensive evaluations on indoor and outdoor real-world datasets, achieving achieves substantial improvements over state-of-the-art methods. Our code will be available at https://github.com/TRAILab/UncertainShapePose.
△ Less
Submitted 16 September, 2023;
originally announced September 2023.
-
Human-in-the-loop online just-in-time software defect prediction
Authors:
Xutong Liu,
Yufei Zhou,
Yutian Tang,
Junyan Qian,
Yuming Zhou
Abstract:
Online Just-In-Time Software Defect Prediction (O-JIT-SDP) uses an online model to predict whether a new software change will introduce a bug or not. However, existing studies neglect the interaction of Software Quality Assurance (SQA) staff with the model, which may miss the opportunity to improve the prediction accuracy through the feedback from SQA staff. To tackle this problem, we propose Huma…
▽ More
Online Just-In-Time Software Defect Prediction (O-JIT-SDP) uses an online model to predict whether a new software change will introduce a bug or not. However, existing studies neglect the interaction of Software Quality Assurance (SQA) staff with the model, which may miss the opportunity to improve the prediction accuracy through the feedback from SQA staff. To tackle this problem, we propose Human-In-The-Loop (HITL) O-JIT-SDP that integrates feedback from SQA staff to enhance the prediction process. Furthermore, we introduce a performance evaluation framework that utilizes a k-fold distributed bootstrap method along with the Wilcoxon signed-rank test. This framework facilitates thorough pairwise comparisons of alternative classification algorithms using a prequential evaluation approach. Our proposal enables continuous statistical testing throughout the prequential process, empowering developers to make real-time decisions based on robust statistical evidence. Through experimentation across 10 GitHub projects, we demonstrate that our evaluation framework enhances the credibility of model evaluation, and the incorporation of HITL feedback elevates the prediction performance of online JIT-SDP models. These advancements hold the potential to significantly enhance the value of O-JIT-SDP for industrial applications.
△ Less
Submitted 25 August, 2023;
originally announced August 2023.
-
Full-dose Whole-body PET Synthesis from Low-dose PET Using High-efficiency Denoising Diffusion Probabilistic Model: PET Consistency Model
Authors:
Shaoyan Pan,
Elham Abouei,
Junbo Peng,
Joshua Qian,
Jacob F Wynne,
Tonghe Wang,
Chih-Wei Chang,
Justin Roper,
Jonathon A Nye,
Hui Mao,
Xiaofeng Yang
Abstract:
Objective: Positron Emission Tomography (PET) has been a commonly used imaging modality in broad clinical applications. One of the most important tradeoffs in PET imaging is between image quality and radiation dose: high image quality comes with high radiation exposure. Improving image quality is desirable for all clinical applications while minimizing radiation exposure is needed to reduce risk t…
▽ More
Objective: Positron Emission Tomography (PET) has been a commonly used imaging modality in broad clinical applications. One of the most important tradeoffs in PET imaging is between image quality and radiation dose: high image quality comes with high radiation exposure. Improving image quality is desirable for all clinical applications while minimizing radiation exposure is needed to reduce risk to patients. Approach: We introduce PET Consistency Model (PET-CM), an efficient diffusion-based method for generating high-quality full-dose PET images from low-dose PET images. It employs a two-step process, adding Gaussian noise to full-dose PET images in the forward diffusion, and then denoising them using a PET Shifted-window Vision Transformer (PET-VIT) network in the reverse diffusion. The PET-VIT network learns a consistency function that enables direct denoising of Gaussian noise into clean full-dose PET images. PET-CM achieves state-of-the-art image quality while requiring significantly less computation time than other methods. Results: In experiments comparing eighth-dose to full-dose images, PET-CM demonstrated impressive performance with NMAE of 1.278+/-0.122%, PSNR of 33.783+/-0.824dB, SSIM of 0.964+/-0.009, NCC of 0.968+/-0.011, HRS of 4.543, and SUV Error of 0.255+/-0.318%, with an average generation time of 62 seconds per patient. This is a significant improvement compared to the state-of-the-art diffusion-based model with PET-CM reaching this result 12x faster. Similarly, in the quarter-dose to full-dose image experiments, PET-CM delivered competitive outcomes, achieving an NMAE of 0.973+/-0.066%, PSNR of 36.172+/-0.801dB, SSIM of 0.984+/-0.004, NCC of 0.990+/-0.005, HRS of 4.428, and SUV Error of 0.151+/-0.192% using the same generation process, which underlining its high quantitative and clinical precision in both denoising scenario.
△ Less
Submitted 16 April, 2024; v1 submitted 24 August, 2023;
originally announced August 2023.
-
Federated Reinforcement Learning for Electric Vehicles Charging Control on Distribution Networks
Authors:
Junkai Qian,
Yuning Jiang,
Xin Liu,
Qing Wang,
Ting Wang,
Yuanming Shi,
Wei Chen
Abstract:
With the growing popularity of electric vehicles (EVs), maintaining power grid stability has become a significant challenge. To address this issue, EV charging control strategies have been developed to manage the switch between vehicle-to-grid (V2G) and grid-to-vehicle (G2V) modes for EVs. In this context, multi-agent deep reinforcement learning (MADRL) has proven its effectiveness in EV charging…
▽ More
With the growing popularity of electric vehicles (EVs), maintaining power grid stability has become a significant challenge. To address this issue, EV charging control strategies have been developed to manage the switch between vehicle-to-grid (V2G) and grid-to-vehicle (G2V) modes for EVs. In this context, multi-agent deep reinforcement learning (MADRL) has proven its effectiveness in EV charging control. However, existing MADRL-based approaches fail to consider the natural power flow of EV charging/discharging in the distribution network and ignore driver privacy. To deal with these problems, this paper proposes a novel approach that combines multi-EV charging/discharging with a radial distribution network (RDN) operating under optimal power flow (OPF) to distribute power flow in real time. A mathematical model is developed to describe the RDN load. The EV charging control problem is formulated as a Markov Decision Process (MDP) to find an optimal charging control strategy that balances V2G profits, RDN load, and driver anxiety. To effectively learn the optimal EV charging control strategy, a federated deep reinforcement learning algorithm named FedSAC is further proposed. Comprehensive simulation results demonstrate the effectiveness and superiority of our proposed algorithm in terms of the diversity of the charging control strategy, the power fluctuations on RDN, the convergence efficiency, and the generalization ability.
△ Less
Submitted 17 August, 2023;
originally announced August 2023.
-
Decentralized Graph Neural Network for Privacy-Preserving Recommendation
Authors:
Xiaolin Zheng,
Zhongyu Wang,
Chaochao Chen,
Jiashu Qian,
Yao Yang
Abstract:
Building a graph neural network (GNN)-based recommender system without violating user privacy proves challenging. Existing methods can be divided into federated GNNs and decentralized GNNs. But both methods have undesirable effects, i.e., low communication efficiency and privacy leakage. This paper proposes DGREC, a novel decentralized GNN for privacy-preserving recommendations, where users can ch…
▽ More
Building a graph neural network (GNN)-based recommender system without violating user privacy proves challenging. Existing methods can be divided into federated GNNs and decentralized GNNs. But both methods have undesirable effects, i.e., low communication efficiency and privacy leakage. This paper proposes DGREC, a novel decentralized GNN for privacy-preserving recommendations, where users can choose to publicize their interactions. It includes three stages, i.e., graph construction, local gradient calculation, and global gradient passing. The first stage builds a local inner-item hypergraph for each user and a global inter-user graph. The second stage models user preference and calculates gradients on each local device. The third stage designs a local differential privacy mechanism named secure gradient-sharing, which proves strong privacy-preserving of users' private data. We conduct extensive experiments on three public datasets to validate the consistent superiority of our framework.
△ Less
Submitted 15 August, 2023;
originally announced August 2023.
-
Freshness or Accuracy, Why Not Both? Addressing Delayed Feedback via Dynamic Graph Neural Networks
Authors:
Xiaolin Zheng,
Zhongyu Wang,
Chaochao Chen,
Feng Zhu,
Jiashu Qian
Abstract:
The delayed feedback problem is one of the most pressing challenges in predicting the conversion rate since users' conversions are always delayed in online commercial systems. Although new data are beneficial for continuous training, without complete feedback information, i.e., conversion labels, training algorithms may suffer from overwhelming fake negatives. Existing methods tend to use multitas…
▽ More
The delayed feedback problem is one of the most pressing challenges in predicting the conversion rate since users' conversions are always delayed in online commercial systems. Although new data are beneficial for continuous training, without complete feedback information, i.e., conversion labels, training algorithms may suffer from overwhelming fake negatives. Existing methods tend to use multitask learning or design data pipelines to solve the delayed feedback problem. However, these methods have a trade-off between data freshness and label accuracy. In this paper, we propose Delayed Feedback Modeling by Dynamic Graph Neural Network (DGDFEM). It includes three stages, i.e., preparing a data pipeline, building a dynamic graph, and training a CVR prediction model. In the model training, we propose a novel graph convolutional method named HLGCN, which leverages both high-pass and low-pass filters to deal with conversion and non-conversion relationships. The proposed method achieves both data freshness and label accuracy. We conduct extensive experiments on three industry datasets, which validate the consistent superiority of our method.
△ Less
Submitted 15 August, 2023;
originally announced August 2023.
-
Large Language Models to Identify Social Determinants of Health in Electronic Health Records
Authors:
Marco Guevara,
Shan Chen,
Spencer Thomas,
Tafadzwa L. Chaunzwa,
Idalid Franco,
Benjamin Kann,
Shalini Moningi,
Jack Qian,
Madeleine Goldstein,
Susan Harper,
Hugo JWL Aerts,
Guergana K. Savova,
Raymond H. Mak,
Danielle S. Bitterman
Abstract:
Social determinants of health (SDoH) have an important impact on patient outcomes but are incompletely collected from the electronic health records (EHR). This study researched the ability of large language models to extract SDoH from free text in EHRs, where they are most commonly documented, and explored the role of synthetic clinical text for improving the extraction of these scarcely documente…
▽ More
Social determinants of health (SDoH) have an important impact on patient outcomes but are incompletely collected from the electronic health records (EHR). This study researched the ability of large language models to extract SDoH from free text in EHRs, where they are most commonly documented, and explored the role of synthetic clinical text for improving the extraction of these scarcely documented, yet extremely valuable, clinical data. 800 patient notes were annotated for SDoH categories, and several transformer-based models were evaluated. The study also experimented with synthetic data generation and assessed for algorithmic bias. Our best-performing models were fine-tuned Flan-T5 XL (macro-F1 0.71) for any SDoH, and Flan-T5 XXL (macro-F1 0.70). The benefit of augmenting fine-tuning with synthetic data varied across model architecture and size, with smaller Flan-T5 models (base and large) showing the greatest improvements in performance (delta F1 +0.12 to +0.23). Model performance was similar on the in-hospital system dataset but worse on the MIMIC-III dataset. Our best-performing fine-tuned models outperformed zero- and few-shot performance of ChatGPT-family models for both tasks. These fine-tuned models were less likely than ChatGPT to change their prediction when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p<0.05). At the patient-level, our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. Our method can effectively extracted SDoH information from clinic notes, performing better compare to GPT zero- and few-shot settings. These models could enhance real-world evidence on SDoH and aid in identifying patients needing social support.
△ Less
Submitted 5 March, 2024; v1 submitted 11 August, 2023;
originally announced August 2023.
-
ARGUS: Visualization of AI-Assisted Task Guidance in AR
Authors:
Sonia Castelo,
Joao Rulff,
Erin McGowan,
Bea Steers,
Guande Wu,
Shaoyu Chen,
Iran Roman,
Roque Lopez,
Ethan Brewer,
Chen Zhao,
**g Qian,
Kyunghyun Cho,
He He,
Qi Sun,
Huy Vo,
Juan Bello,
Michael Krone,
Claudio Silva
Abstract:
The concept of augmented reality (AR) assistants has captured the human imagination for decades, becoming a staple of modern science fiction. To pursue this goal, it is necessary to develop artificial intelligence (AI)-based methods that simultaneously perceive the 3D environment, reason about physical tasks, and model the performer, all in real-time. Within this framework, a wide variety of senso…
▽ More
The concept of augmented reality (AR) assistants has captured the human imagination for decades, becoming a staple of modern science fiction. To pursue this goal, it is necessary to develop artificial intelligence (AI)-based methods that simultaneously perceive the 3D environment, reason about physical tasks, and model the performer, all in real-time. Within this framework, a wide variety of sensors are needed to generate data across different modalities, such as audio, video, depth, speech, and time-of-flight. The required sensors are typically part of the AR headset, providing performer sensing and interaction through visual, audio, and haptic feedback. AI assistants not only record the performer as they perform activities, but also require machine learning (ML) models to understand and assist the performer as they interact with the physical world. Therefore, develo** such assistants is a challenging task. We propose ARGUS, a visual analytics system to support the development of intelligent AR assistants. Our system was designed as part of a multi year-long collaboration between visualization researchers and ML and AR experts. This co-design process has led to advances in the visualization of ML in AR. Our system allows for online visualization of object, action, and step detection as well as offline analysis of previously recorded AR sessions. It visualizes not only the multimodal sensor data streams but also the output of the ML models. This allows developers to gain insights into the performer activities as well as the ML models, hel** them troubleshoot, improve, and fine tune the components of the AR assistant.
△ Less
Submitted 11 August, 2023;
originally announced August 2023.
-
Deep Unrolling Networks with Recurrent Momentum Acceleration for Nonlinear Inverse Problems
Authors:
Qing** Zhou,
Jiayu Qian,
Junqi Tang,
**glai Li
Abstract:
Combining the strengths of model-based iterative algorithms and data-driven deep learning solutions, deep unrolling networks (DuNets) have become a popular tool to solve inverse imaging problems. While DuNets have been successfully applied to many linear inverse problems, nonlinear problems tend to impair the performance of the method. Inspired by momentum acceleration techniques that are often us…
▽ More
Combining the strengths of model-based iterative algorithms and data-driven deep learning solutions, deep unrolling networks (DuNets) have become a popular tool to solve inverse imaging problems. While DuNets have been successfully applied to many linear inverse problems, nonlinear problems tend to impair the performance of the method. Inspired by momentum acceleration techniques that are often used in optimization algorithms, we propose a recurrent momentum acceleration (RMA) framework that uses a long short-term memory recurrent neural network (LSTM-RNN) to simulate the momentum acceleration process. The RMA module leverages the ability of the LSTM-RNN to learn and retain knowledge from the previous gradients. We apply RMA to two popular DuNets -- the learned proximal gradient descent (LPGD) and the learned primal-dual (LPD) methods, resulting in LPGD-RMA and LPD-RMA respectively. We provide experimental results on two nonlinear inverse problems: a nonlinear deconvolution problem, and an electrical impedance tomography problem with limited boundary measurements. In the first experiment we have observed that the improvement due to RMA largely increases with respect to the nonlinearity of the problem. The results of the second example further demonstrate that the RMA schemes can significantly improve the performance of DuNets in strongly ill-posed problems.
△ Less
Submitted 31 March, 2024; v1 submitted 29 July, 2023;
originally announced July 2023.
-
POV-SLAM: Probabilistic Object-Aware Variational SLAM in Semi-Static Environments
Authors:
**gxing Qian,
Veronica Chatrath,
James Servos,
Aaron Mavrinac,
Wolfram Burgard,
Steven L. Waslander,
Angela P. Schoellig
Abstract:
Simultaneous localization and map** (SLAM) in slowly varying scenes is important for long-term robot task completion. Failing to detect scene changes may lead to inaccurate maps and, ultimately, lost robots. Classical SLAM algorithms assume static scenes, and recent works take dynamics into account, but require scene changes to be observed in consecutive frames. Semi-static scenes, wherein objec…
▽ More
Simultaneous localization and map** (SLAM) in slowly varying scenes is important for long-term robot task completion. Failing to detect scene changes may lead to inaccurate maps and, ultimately, lost robots. Classical SLAM algorithms assume static scenes, and recent works take dynamics into account, but require scene changes to be observed in consecutive frames. Semi-static scenes, wherein objects appear, disappear, or move slowly over time, are often overlooked, yet are critical for long-term operation. We propose an object-aware, factor-graph SLAM framework that tracks and reconstructs semi-static object-level changes. Our novel variational expectation-maximization strategy is used to optimize factor graphs involving a Gaussian-Uniform bimodal measurement likelihood for potentially-changing objects. We evaluate our approach alongside the state-of-the-art SLAM solutions in simulation and on our novel real-world SLAM dataset captured in a warehouse over four months. Our method improves the robustness of localization in the presence of semi-static changes, providing object-level reasoning about the scene.
△ Less
Submitted 2 July, 2023;
originally announced July 2023.
-
Multi-IMU with Online Self-Consistency for Freehand 3D Ultrasound Reconstruction
Authors:
Mingyuan Luo,
Xin Yang,
Zhongnuo Yan,
Junyu Li,
Yuanji Zhang,
Jiongquan Chen,
Xindi Hu,
Jikuan Qian,
Jun Cheng,
Dong Ni
Abstract:
Ultrasound (US) imaging is a popular tool in clinical diagnosis, offering safety, repeatability, and real-time capabilities. Freehand 3D US is a technique that provides a deeper understanding of scanned regions without increasing complexity. However, estimating elevation displacement and accumulation error remains challenging, making it difficult to infer the relative position using images alone.…
▽ More
Ultrasound (US) imaging is a popular tool in clinical diagnosis, offering safety, repeatability, and real-time capabilities. Freehand 3D US is a technique that provides a deeper understanding of scanned regions without increasing complexity. However, estimating elevation displacement and accumulation error remains challenging, making it difficult to infer the relative position using images alone. The addition of external lightweight sensors has been proposed to enhance reconstruction performance without adding complexity, which has been shown to be beneficial. We propose a novel online self-consistency network (OSCNet) using multiple inertial measurement units (IMUs) to improve reconstruction performance. OSCNet utilizes a modal-level self-supervised strategy to fuse multiple IMU information and reduce differences between reconstruction results obtained from each IMU data. Additionally, a sequence-level self-consistency strategy is proposed to improve the hierarchical consistency of prediction results among the scanning sequence and its sub-sequences. Experiments on large-scale arm and carotid datasets with multiple scanning tactics demonstrate that our OSCNet outperforms previous methods, achieving state-of-the-art reconstruction performance.
△ Less
Submitted 18 July, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Imitation with Spatial-Temporal Heatmap: 2nd Place Solution for NuPlan Challenge
Authors:
Yihan Hu,
Kun Li,
**yuan Liang,
**gyu Qian,
Zhening Yang,
Haichao Zhang,
Wenxin Shao,
Zhuangzhuang Ding,
Wei Xu,
Qiang Liu
Abstract:
This paper presents our 2nd place solution for the NuPlan Challenge 2023. Autonomous driving in real-world scenarios is highly complex and uncertain. Achieving safe planning in the complex multimodal scenarios is a highly challenging task. Our approach, Imitation with Spatial-Temporal Heatmap, adopts the learning form of behavior cloning, innovatively predicts the future multimodal states with a h…
▽ More
This paper presents our 2nd place solution for the NuPlan Challenge 2023. Autonomous driving in real-world scenarios is highly complex and uncertain. Achieving safe planning in the complex multimodal scenarios is a highly challenging task. Our approach, Imitation with Spatial-Temporal Heatmap, adopts the learning form of behavior cloning, innovatively predicts the future multimodal states with a heatmap representation, and uses trajectory refinement techniques to ensure final safety. The experiment shows that our method effectively balances the vehicle's progress and safety, generating safe and comfortable trajectories. In the NuPlan competition, we achieved the second highest overall score, while obtained the best scores in the ego progress and comfort metrics.
△ Less
Submitted 26 June, 2023;
originally announced June 2023.
-
PathMLP: Smooth Path Towards High-order Homophily
Authors:
Chenxuan Xie,
Jiajun Zhou,
Shengbo Gong,
Jiacheng Wan,
Jiaxu Qian,
Shanqing Yu,
Qi Xuan,
Xiaoniu Yang
Abstract:
Real-world graphs exhibit increasing heterophily, where nodes no longer tend to be connected to nodes with the same label, challenging the homophily assumption of classical graph neural networks (GNNs) and impeding their performance. Intriguingly, we observe that certain high-order information on heterophilous data exhibits high homophily, which motivates us to involve high-order information in no…
▽ More
Real-world graphs exhibit increasing heterophily, where nodes no longer tend to be connected to nodes with the same label, challenging the homophily assumption of classical graph neural networks (GNNs) and impeding their performance. Intriguingly, we observe that certain high-order information on heterophilous data exhibits high homophily, which motivates us to involve high-order information in node representation learning. However, common practices in GNNs to acquire high-order information mainly through increasing model depth and altering message-passing mechanisms, which, albeit effective to a certain extent, suffer from three shortcomings: 1) over-smoothing due to excessive model depth and propagation times; 2) high-order information is not fully utilized; 3) low computational efficiency. In this regard, we design a similarity-based path sampling strategy to capture smooth paths containing high-order homophily. Then we propose a lightweight model based on multi-layer perceptrons (MLP), named PathMLP, which can encode messages carried by paths via simple transformation and concatenation operations, and effectively learn node representations in heterophilous graphs through adaptive path aggregation. Extensive experiments demonstrate that our method outperforms baselines on 16 out of 20 datasets, underlining its effectiveness and superiority in alleviating the heterophily problem. In addition, our method is immune to over-smoothing and has high computational efficiency.
△ Less
Submitted 23 June, 2023;
originally announced June 2023.
-
Minimum Eigenvalue Based Covariance Matrix Estimation with Limited Samples
Authors:
**g Qian,
Juening **,
Hao Wang
Abstract:
In this paper, we consider the interference rejection combining (IRC) receiver, which improves the cell-edge user throughput via suppressing inter-cell interference and requires estimating the covariance matrix including the inter-cell interference with high accuracy. In order to solve the problem of sample covariance matrix estimation with limited samples, a regularization parameter optimization…
▽ More
In this paper, we consider the interference rejection combining (IRC) receiver, which improves the cell-edge user throughput via suppressing inter-cell interference and requires estimating the covariance matrix including the inter-cell interference with high accuracy. In order to solve the problem of sample covariance matrix estimation with limited samples, a regularization parameter optimization based on the minimum eigenvalue criterion is developed. It is different from traditional methods that aim at minimizing the mean squared error, but goes straight at the objective of optimizing the final performance of the IRC receiver. A lower bound of the minimum eigenvalue that is easier to calculate is also derived. Simulation results demonstrate that the proposed approach is effective and can approach the performance of the oracle estimator in terms of the mutual information metric.
△ Less
Submitted 20 June, 2023;
originally announced June 2023.
-
Convex and Non-convex Optimization Under Generalized Smoothness
Authors:
Haochuan Li,
Jian Qian,
Yi Tian,
Alexander Rakhlin,
Ali Jadbabaie
Abstract:
Classical analysis of convex and non-convex optimization methods often requires the Lipshitzness of the gradient, which limits the analysis to functions bounded by quadratics. Recent work relaxed this requirement to a non-uniform smoothness condition with the Hessian norm bounded by an affine function of the gradient norm, and proved convergence in the non-convex setting via gradient clip**, ass…
▽ More
Classical analysis of convex and non-convex optimization methods often requires the Lipshitzness of the gradient, which limits the analysis to functions bounded by quadratics. Recent work relaxed this requirement to a non-uniform smoothness condition with the Hessian norm bounded by an affine function of the gradient norm, and proved convergence in the non-convex setting via gradient clip**, assuming bounded noise. In this paper, we further generalize this non-uniform smoothness condition and develop a simple, yet powerful analysis technique that bounds the gradients along the trajectory, thereby leading to stronger results for both convex and non-convex optimization problems. In particular, we obtain the classical convergence rates for (stochastic) gradient descent and Nesterov's accelerated gradient method in the convex and/or non-convex setting under this general smoothness condition. The new analysis approach does not require gradient clip** and allows heavy-tailed noise with bounded variance in the stochastic setting.
△ Less
Submitted 3 November, 2023; v1 submitted 2 June, 2023;
originally announced June 2023.
-
HiQ -- A Declarative, Non-intrusive, Dynamic and Transparent Observability and Optimization System
Authors:
Fuheng Wu,
Ivan Davchev,
Jun Qian
Abstract:
This paper proposes a non-intrusive, declarative, dynamic and transparent system called `HiQ` to track Python program runtime information without compromising on the run-time system performance and losing insight. HiQ can be used for monolithic and distributed systems, offline and online applications. HiQ is developed when we optimize our large deep neural network (DNN) models which are written in…
▽ More
This paper proposes a non-intrusive, declarative, dynamic and transparent system called `HiQ` to track Python program runtime information without compromising on the run-time system performance and losing insight. HiQ can be used for monolithic and distributed systems, offline and online applications. HiQ is developed when we optimize our large deep neural network (DNN) models which are written in Python, but it can be generalized to any Python program or distributed system, or even other languages like Java. We have implemented the system and adopted it in our deep learning model life cycle management system to catch the bottleneck while kee** our production code clean and highly performant. The implementation is open-sourced at: [https://github.com/oracle/hiq](https://github.com/oracle/hiq).
△ Less
Submitted 26 April, 2023;
originally announced April 2023.
-
Learning Rewards to Optimize Global Performance Metrics in Deep Reinforcement Learning
Authors:
Junqi Qian,
Paul Weng,
Chenmien Tan
Abstract:
When applying reinforcement learning (RL) to a new problem, reward engineering is a necessary, but often difficult and error-prone task a system designer has to face. To avoid this step, we propose LR4GPM, a novel (deep) RL method that can optimize a global performance metric, which is supposed to be available as part of the problem description. LR4GPM alternates between two phases: (1) learning a…
▽ More
When applying reinforcement learning (RL) to a new problem, reward engineering is a necessary, but often difficult and error-prone task a system designer has to face. To avoid this step, we propose LR4GPM, a novel (deep) RL method that can optimize a global performance metric, which is supposed to be available as part of the problem description. LR4GPM alternates between two phases: (1) learning a (possibly vector) reward function used to fit the performance metric, and (2) training a policy to optimize an approximation of this performance metric based on the learned rewards. Such RL training is not straightforward since both the reward function and the policy are trained using non-stationary data. To overcome this issue, we propose several training tricks. We demonstrate the efficiency of LR4GPM on several domains. Notably, LR4GPM outperforms the winner of a recent autonomous driving competition organized at DAI'2020.
△ Less
Submitted 15 March, 2023;
originally announced March 2023.
-
Non-aligned supervision for Real Image Dehazing
Authors:
Junkai Fan,
Fei Guo,
Jianjun Qian,
Xiang Li,
Jun Li,
Jian Yang
Abstract:
Removing haze from real-world images is challenging due to unpredictable weather conditions, resulting in the misalignment of hazy and clear image pairs. In this paper, we propose an innovative dehazing framework that operates under non-aligned supervision. This framework is grounded in the atmospheric scattering model, and consists of three interconnected networks: dehazing, airlight, and transmi…
▽ More
Removing haze from real-world images is challenging due to unpredictable weather conditions, resulting in the misalignment of hazy and clear image pairs. In this paper, we propose an innovative dehazing framework that operates under non-aligned supervision. This framework is grounded in the atmospheric scattering model, and consists of three interconnected networks: dehazing, airlight, and transmission networks. In particular, we explore a non-alignment scenario that a clear reference image, unaligned with the input hazy image, is utilized to supervise the dehazing network. To implement this, we present a multi-scale reference loss that compares the feature representations between the referred image and the dehazed output. Our scenario makes it easier to collect hazy/clear image pairs in real-world environments, even under conditions of misalignment and shift views. To showcase the effectiveness of our scenario, we have collected a new hazy dataset including 415 image pairs captured by mobile Phone in both rural and urban areas, called "Phone-Hazy". Furthermore, we introduce a self-attention network based on mean and variance for modeling real infinite airlight, using the dark channel prior as positional guidance. Additionally, a channel attention network is employed to estimate the three-channel transmission. Experimental results demonstrate the superior performance of our framework over existing state-of-the-art techniques in the real-world image dehazing task. Phone-Hazy and code will be available at https://fanjunkai1.github.io/projectpage/NSDNet/index.html.
△ Less
Submitted 5 January, 2024; v1 submitted 8 March, 2023;
originally announced March 2023.
-
Recurrent Structure Attention Guidance for Depth Super-Resolution
Authors:
Jiayi Yuan,
Haobo Jiang,
Xiang Li,
Jianjun Qian,
Jun Li,
Jian Yang
Abstract:
Image guidance is an effective strategy for depth super-resolution. Generally, most existing methods employ hand-crafted operators to decompose the high-frequency (HF) and low-frequency (LF) ingredients from low-resolution depth maps and guide the HF ingredients by directly concatenating them with image features. However, the hand-designed operators usually cause inferior HF maps (e.g., distorted…
▽ More
Image guidance is an effective strategy for depth super-resolution. Generally, most existing methods employ hand-crafted operators to decompose the high-frequency (HF) and low-frequency (LF) ingredients from low-resolution depth maps and guide the HF ingredients by directly concatenating them with image features. However, the hand-designed operators usually cause inferior HF maps (e.g., distorted or structurally missing) due to the diverse appearance of complex depth maps. Moreover, the direct concatenation often results in weak guidance because not all image features have a positive effect on the HF maps. In this paper, we develop a recurrent structure attention guided (RSAG) framework, consisting of two important parts. First, we introduce a deep contrastive network with multi-scale filters for adaptive frequency-domain separation, which adopts contrastive networks from large filters to small ones to calculate the pixel contrasts for adaptive high-quality HF predictions. Second, instead of the coarse concatenation guidance, we propose a recurrent structure attention block, which iteratively utilizes the latest depth estimation and the image features to jointly select clear patterns and boundaries, aiming at providing refined guidance for accurate depth recovery. In addition, we fuse the features of HF maps to enhance the edge structures in the decomposed LF maps. Extensive experiments show that our approach obtains superior performance compared with state-of-the-art depth super-resolution methods.
△ Less
Submitted 31 January, 2023;
originally announced January 2023.
-
Structure Flow-Guided Network for Real Depth Super-Resolution
Authors:
Jiayi Yuan,
Haobo Jiang,
Xiang Li,
Jianjun Qian,
Jun Li,
Jian Yang
Abstract:
Real depth super-resolution (DSR), unlike synthetic settings, is a challenging task due to the structural distortion and the edge noise caused by the natural degradation in real-world low-resolution (LR) depth maps. These defeats result in significant structure inconsistency between the depth map and the RGB guidance, which potentially confuses the RGB-structure guidance and thereby degrades the D…
▽ More
Real depth super-resolution (DSR), unlike synthetic settings, is a challenging task due to the structural distortion and the edge noise caused by the natural degradation in real-world low-resolution (LR) depth maps. These defeats result in significant structure inconsistency between the depth map and the RGB guidance, which potentially confuses the RGB-structure guidance and thereby degrades the DSR quality. In this paper, we propose a novel structure flow-guided DSR framework, where a cross-modality flow map is learned to guide the RGB-structure information transferring for precise depth upsampling. Specifically, our framework consists of a cross-modality flow-guided upsampling network (CFUNet) and a flow-enhanced pyramid edge attention network (PEANet). CFUNet contains a trilateral self-attention module combining both the geometric and semantic correlations for reliable cross-modality flow learning. Then, the learned flow maps are combined with the grid-sampling mechanism for coarse high-resolution (HR) depth prediction. PEANet targets at integrating the learned flow map as the edge attention into a pyramid network to hierarchically learn the edge-focused guidance feature for depth edge refinement. Extensive experiments on real and synthetic DSR datasets verify that our approach achieves excellent performance compared to state-of-the-art methods.
△ Less
Submitted 31 January, 2023;
originally announced January 2023.
-
Language Model Detoxification in Dialogue with Contextualized Stance Control
Authors:
**g Qian,
Xifeng Yan
Abstract:
To reduce the toxic degeneration in a pretrained Language Model (LM), previous work on Language Model detoxification has focused on reducing the toxicity of the generation itself (self-toxicity) without consideration of the context. As a result, a type of implicit offensive language where the generations support the offensive language in the context is ignored. Different from the LM controlling ta…
▽ More
To reduce the toxic degeneration in a pretrained Language Model (LM), previous work on Language Model detoxification has focused on reducing the toxicity of the generation itself (self-toxicity) without consideration of the context. As a result, a type of implicit offensive language where the generations support the offensive language in the context is ignored. Different from the LM controlling tasks in previous work, where the desired attributes are fixed for generation, the desired stance of the generation depends on the offensiveness of the context. Therefore, we propose a novel control method to do context-dependent detoxification with the stance taken into consideration. We introduce meta prefixes to learn the contextualized stance control strategy and to generate the stance control prefix according to the input context. The generated stance prefix is then combined with the toxicity control prefix to guide the response generation. Experimental results show that our proposed method can effectively learn the context-dependent stance control strategies while kee** a low self-toxicity of the underlying LM.
△ Less
Submitted 24 January, 2023;
originally announced January 2023.
-
Model-Free Reinforcement Learning with the Decision-Estimation Coefficient
Authors:
Dylan J. Foster,
Noah Golowich,
Jian Qian,
Alexander Rakhlin,
Ayush Sekhari
Abstract:
We consider the problem of interactive decision making, encompassing structured bandits and reinforcement learning with general function approximation. Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient, a measure of statistical complexity that lower bounds the optimal regret for interactive decision making, as well as a meta-algorithm, Estimation-to-Decisions, which ach…
▽ More
We consider the problem of interactive decision making, encompassing structured bandits and reinforcement learning with general function approximation. Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient, a measure of statistical complexity that lower bounds the optimal regret for interactive decision making, as well as a meta-algorithm, Estimation-to-Decisions, which achieves upper bounds in terms of the same quantity. Estimation-to-Decisions is a reduction, which lifts algorithms for (supervised) online estimation into algorithms for decision making. In this paper, we show that by combining Estimation-to-Decisions with a specialized form of optimistic estimation introduced by Zhang (2022), it is possible to obtain guarantees that improve upon those of Foster et al. (2021) by accommodating more lenient notions of estimation error. We use this approach to derive regret bounds for model-free reinforcement learning with value function approximation, and give structural results showing when it can and cannot help more generally.
△ Less
Submitted 12 August, 2023; v1 submitted 25 November, 2022;
originally announced November 2022.
-
Completing point cloud from few points by Wasserstein GAN and Transformers
Authors:
Xianfeng Wu,
**hui Qian,
Qing Wei,
Xianzu Wu,
Xinyi Liu,
Luxin Hu,
Yanli Gong,
Zhongyuan Lai,
Libing Wu
Abstract:
In many vision and robotics applications, it is common that the captured objects are represented by very few points. Most of the existing completion methods are designed for partial point clouds with many points, and they perform poorly or even fail completely in the case of few points. However, due to the lack of detail information, completing objects from few points faces a huge challenge. Inspi…
▽ More
In many vision and robotics applications, it is common that the captured objects are represented by very few points. Most of the existing completion methods are designed for partial point clouds with many points, and they perform poorly or even fail completely in the case of few points. However, due to the lack of detail information, completing objects from few points faces a huge challenge. Inspired by the successful applications of GAN and Transformers in the image-based vision task, we introduce GAN and Transformer techniques to address the above problem. Firstly, the end-to-end encoder-decoder network with Transformers and the Wasserstein GAN with Transformer are pre-trained, and then the overall network is fine-tuned. Experimental results on the ShapeNet dataset show that our method can not only improve the completion performance for many input points, but also keep stable for few input points. Our source code is available at https://github.com/WxfQjh/Stability-point-recovery.git.
△ Less
Submitted 23 November, 2022;
originally announced November 2022.
-
Visual Commonsense-aware Representation Network for Video Captioning
Authors:
Pengpeng Zeng,
Haonan Zhang,
Lianli Gao,
Xiangpeng Li,
** Qian,
Heng Tao Shen
Abstract:
Generating consecutive descriptions for videos, i.e., Video Captioning, requires taking full advantage of visual representation along with the generation process. Existing video captioning methods focus on making an exploration of spatial-temporal representations and their relationships to produce inferences. However, such methods only exploit the superficial association contained in the video its…
▽ More
Generating consecutive descriptions for videos, i.e., Video Captioning, requires taking full advantage of visual representation along with the generation process. Existing video captioning methods focus on making an exploration of spatial-temporal representations and their relationships to produce inferences. However, such methods only exploit the superficial association contained in the video itself without considering the intrinsic visual commonsense knowledge that existed in a video dataset, which may hinder their capabilities of knowledge cognitive to reason accurate descriptions. To address this problem, we propose a simple yet effective method, called Visual Commonsense-aware Representation Network (VCRN), for video captioning. Specifically, we construct a Video Dictionary, a plug-and-play component, obtained by clustering all video features from the total dataset into multiple clustered centers without additional annotation. Each center implicitly represents a visual commonsense concept in the video domain, which is utilized in our proposed Visual Concept Selection (VCS) to obtain a video-related concept feature. Next, a Conceptual Integration Generation (CIG) is proposed to enhance the caption generation. Extensive experiments on three publicly video captioning benchmarks: MSVD, MSR-VTT, and VATEX, demonstrate that our method reaches state-of-the-art performance, indicating the effectiveness of our method. In addition, our approach is integrated into the existing method of video question answering and improves this performance, further showing the generalization of our method. Source code has been released at https://github.com/zchoi/VCRN.
△ Less
Submitted 17 November, 2022;
originally announced November 2022.
-
Explanations from Large Language Models Make Small Reasoners Better
Authors:
Shiyang Li,
Jianshu Chen,
Yelong Shen,
Zhiyu Chen,
Xinlu Zhang,
Zekun Li,
Hong Wang,
**g Qian,
Baolin Peng,
Yi Mao,
Wenhu Chen,
Xifeng Yan
Abstract:
Integrating free-text explanations to in-context learning of large language models (LLM) is shown to elicit strong reasoning capabilities along with reasonable explanations. In this paper, we consider the problem of leveraging the explanations generated by LLM to improve the training of small reasoners, which are more favorable in real-production deployment due to their low cost. We systematically…
▽ More
Integrating free-text explanations to in-context learning of large language models (LLM) is shown to elicit strong reasoning capabilities along with reasonable explanations. In this paper, we consider the problem of leveraging the explanations generated by LLM to improve the training of small reasoners, which are more favorable in real-production deployment due to their low cost. We systematically explore three explanation generation approaches from LLM and utilize a multi-task learning framework to facilitate small models to acquire strong reasoning power together with explanation generation capabilities. Experiments on multiple reasoning tasks show that our method can consistently and significantly outperform finetuning baselines across different settings, and even perform better than finetuning/prompting a 60x larger GPT-3 (175B) model by up to 9.5% in accuracy. As a side benefit, human evaluation further shows that our method can generate high-quality explanations to justify its predictions, moving towards the goal of explainable AI.
△ Less
Submitted 13 October, 2022;
originally announced October 2022.
-
A Tagging Solution to Discover IoT Devices in Apartments
Authors:
Berkay Kaplan,
**gyu Qian,
Israel J Lopez-Toledo,
Carl A. Gunter
Abstract:
The number of IoT devices in smart homes is increasing. This broad adoption facilitates users' lives, but it also brings problems. One such issue is that some IoT devices may invade users' privacy. Some reasons for this invasion can stem from obscure data collection practices or hidden devices. Specific IoT devices can exist out of sight and still collect user data to send to third parties via the…
▽ More
The number of IoT devices in smart homes is increasing. This broad adoption facilitates users' lives, but it also brings problems. One such issue is that some IoT devices may invade users' privacy. Some reasons for this invasion can stem from obscure data collection practices or hidden devices. Specific IoT devices can exist out of sight and still collect user data to send to third parties via the Internet. Owners can easily forget the location or even the existence of these devices, especially if the owner is a landlord who manages several properties. The landlord-owner scenario creates multi-user problems as designers build machines for single users. We developed tags that use wireless protocols, buzzers, and LED lighting to lead users to solve the issue of device discovery in shared spaces and accommodate multi-user scenarios. They are attached to IoT devices inside a unit during their installation to be later discovered by a tenant. These tags have similar functionalities as the popular Tile models or Airtag, but our tags have different features based on our privacy use case. Our tags do not require pairing; multiple users can interact with them through our Android application. Although researchers developed several other tools, such as thermal cameras or virtual reality (VR), for discovering devices in environments, they have not used wireless protocols as a solution. We measured specific performance metrics of our tags to analyze their feasibility for this problem. We also conducted a user study to measure the participants' comfort levels while finding objects with our tags attached. Our results indicate that wireless tags can be viable for device tracking in residential properties.
△ Less
Submitted 20 September, 2023; v1 submitted 12 October, 2022;
originally announced October 2022.