-
General Binding Affinity Guidance for Diffusion Models in Structure-Based Drug Design
Authors:
Yue Jian,
Curtis Wu,
Danny Reidenbach,
Aditi S. Krishnapriyan
Abstract:
Structure-Based Drug Design (SBDD) focuses on generating valid ligands that strongly and specifically bind to a designated protein pocket. Several methods use machine learning for SBDD to generate these ligands in 3D space, conditioned on the structure of a desired protein pocket. Recently, diffusion models have shown success here by modeling the underlying distributions of atomic positions and types. While these methods are effective in considering the structural details of the protein pocket, they often fail to explicitly consider the binding affinity. Binding affinity characterizes how tightly the ligand binds to the protein pocket, and is measured by the change in free energy associated with the binding process. It is one of the most crucial metrics for benchmarking the effectiveness of the interaction between a ligand and protein pocket. To address this, we propose BADGER: Binding Affinity Diffusion Guidance with Enhanced Refinement. BADGER is a general guidance method to steer the diffusion sampling process towards improved protein-ligand binding, allowing us to adjust the distribution of the binding affinity between ligands and proteins. Our method is enabled by using a neural network (NN) to model the energy function, which is commonly approximated by AutoDock Vina (ADV). ADV's energy function is non-differentiable, and estimates the affinity based on the interactions between a ligand and target protein receptor. By using a NN as a differentiable energy function proxy, we utilize the gradient of our learned energy function as a guidance method on top of any trained diffusion model. We show that our method improves the binding affinity of generated ligands to their protein receptors by up to 60%, significantly surpassing previous machine learning methods. We also show that our guidance method is flexible and can be easily applied to other diffusion-based SBDD frameworks.
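The idea of gradient guidance from a differentiable energy proxy can be illustrated with a toy sampler. At each reverse step, the sample is nudged against the gradient of the energy, scaled by a guidance strength. This is a minimal sketch under stated assumptions, not the BADGER implementation: `toy_denoise_step` is a hypothetical stand-in for a trained diffusion model, `energy`/`energy_grad` are a hypothetical quadratic stand-in for the learned NN proxy of the Vina score (lower energy meaning stronger predicted binding), and real guidance would also scale the gradient by the noise schedule.

```python
import random

# Hypothetical energy proxy: a quadratic bowl whose minimum (the toy
# "best-binding" configuration) sits at TARGET. A real proxy would be a
# neural network trained to approximate AutoDock Vina scores.
TARGET = [1.0, -2.0, 0.5]

def energy(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))

def energy_grad(x):
    # Analytic gradient of the quadratic energy above.
    return [2.0 * (xi - ti) for xi, ti in zip(x, TARGET)]

def toy_denoise_step(x, t, rng):
    # Hypothetical stand-in for one reverse-diffusion update: shrink toward
    # the origin and add noise whose scale decays with t.
    return [0.9 * xi + 0.01 * t * rng.gauss(0.0, 1.0) for xi in x]

def guided_sample(steps=50, scale=0.1, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in TARGET]
    for t in range(steps, 0, -1):
        x = toy_denoise_step(x, t / steps, rng)
        # Guidance: step against the energy gradient, i.e. toward lower
        # predicted binding energy.
        g = energy_grad(x)
        x = [xi - scale * gi for xi, gi in zip(x, g)]
    return x
```

Comparing `energy(guided_sample(scale=0.1))` with `energy(guided_sample(scale=0.0))` shows the guided samples ending at a lower energy than the unguided ones; the base sampler itself is unchanged, which mirrors the claim that guidance can sit on top of any trained diffusion model.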
Submitted 24 June, 2024;
originally announced June 2024.
-
LLM-dCache: Improving Tool-Augmented LLMs with GPT-Driven Localized Data Caching
Authors:
Simranjit Singh,
Michael Fore,
Andreas Karatzas,
Chaehong Lee,
Yanan Jian,
Longfei Shangguan,
Fuxun Yu,
Iraklis Anagnostopoulos,
Dimitrios Stamoulis
Abstract:
As Large Language Models (LLMs) broaden their capabilities to manage thousands of API calls, they are confronted with complex data operations across vast datasets with significant overhead to the underlying system. In this work, we introduce LLM-dCache to optimize data accesses by treating cache operations as callable API functions exposed to the tool-augmented agent. We grant LLMs the autonomy to manage cache decisions via prompting, seamlessly integrating with existing function-calling mechanisms. Tested on an industry-scale massively parallel platform that spans hundreds of GPT endpoints and terabytes of imagery, our method improves Copilot times by an average of 1.24x across various LLMs and prompting techniques.
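Treating cache operations as callable API functions can be pictured as a tool registry in which cache lookups and insertions sit alongside ordinary data-loading tools, so a function-calling agent can choose among them. The sketch below is hypothetical: the tool names (`load_data`, `cache_get`, `cache_put`), the LRU eviction policy, and the capacity are illustrative assumptions, and the LLM's prompted cache decisions are replaced by a hard-coded policy in `fetch`.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry

cache = LRUCache(capacity=2)
slow_loads = []  # records every call that reached the simulated slow backend

def load_data(name):
    # Hypothetical expensive data access, e.g. reading imagery from storage.
    slow_loads.append(name)
    return f"<data:{name}>"

# Cache operations exposed as tools next to the ordinary data tool.
TOOLS = {
    "load_data": load_data,
    "cache_get": cache.get,
    "cache_put": cache.put,
}

def fetch(name):
    # Fixed policy standing in for the LLM's decision: try the cache first,
    # fall back to a slow load, then store the result in the cache.
    hit = TOOLS["cache_get"](name)
    if hit is not None:
        return hit
    value = TOOLS["load_data"](name)
    TOOLS["cache_put"](name, value)
    return value

for name in ["tile_a", "tile_b", "tile_a", "tile_c", "tile_b"]:
    fetch(name)
```

In this trace, the repeated request for `tile_a` is served from the cache, while `tile_b` is reloaded because it was evicted when `tile_c` arrived; in the setting described above, the LLM rather than a fixed rule would decide when to call each cache tool.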
Submitted 10 June, 2024;
originally announced June 2024.
-
Stable Diffusion For Aerial Object Detection
Authors:
Yanan Jian,
Fuxun Yu,
Simranjit Singh,
Dimitrios Stamoulis
Abstract:
Aerial object detection is a challenging task, in which one major obstacle lies in the limitations of large-scale data collection and the long-tail distribution of certain classes. Synthetic data offers a promising solution, especially with recent advances in diffusion-based methods like stable diffusion (SD). However, the direct application of diffusion methods to aerial domains poses unique challenges: stable diffusion's optimization for rich ground-level semantics doesn't align with the sparse nature of aerial objects, and the extraction of post-synthesis object coordinates remains problematic. To address these challenges, we introduce a synthetic data augmentation framework tailored for aerial images. It encompasses sparse-to-dense region of interest (ROI) extraction to bridge the semantic gap, fine-tuning the diffusion model with low-rank adaptation (LORA) to circumvent exhaustive retraining, and finally, a Copy-Paste method to compose synthesized objects with backgrounds, providing a nuanced approach to aerial object detection through synthetic data.
Submitted 20 November, 2023;
originally announced November 2023.
-
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Authors:
Yiren Jian,
Tingkai Liu,
Yunzhe Tao,
Chunhui Zhang,
Soroush Vosoughi,
Hongxia Yang
Abstract:
In this paper, we introduce $\text{EVL}_{\text{Gen}}$, a streamlined framework designed for the pre-training of visually conditioned language generation models with high computational demands, utilizing frozen pre-trained large language models (LLMs). The conventional approach in vision-language pre-training (VLP) typically involves a two-stage optimization process: an initial resource-intensive phase dedicated to general-purpose vision-language representation learning, focused on extracting and consolidating relevant visual features. This is followed by a subsequent phase that emphasizes end-to-end alignment between visual and linguistic modalities. Our novel one-stage, single-loss framework bypasses the computationally demanding first training stage by gradually merging similar visual tokens during training, while avoiding model collapse caused by single-stage training of BLIP-2 type models. The gradual merging process effectively condenses visual information while preserving semantic richness, resulting in rapid convergence without compromising performance. Our experimental findings demonstrate that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance. Furthermore, we illustrate that our models significantly narrow the performance gap to current vision-language models using only 1/10 of the data. Finally, we showcase how our image-text models can seamlessly adapt to video-conditioned language generation tasks through novel soft attentive temporal token contextualizing modules. Code is available at https://github.com/yiren-jian/EVLGen.
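"Gradually merging similar visual tokens" can be made concrete with a toy merging step: find the most similar pair of tokens by cosine similarity, replace the pair with its average, and repeat, so the sequence shrinks by one token per merge. This is a generic greedy sketch, not the EVLGen merging schedule; the averaging rule, the all-pairs search, and the helper names `merge_once`/`merge_tokens` are assumptions for illustration.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def merge_once(tokens):
    # Find the most similar pair (first one wins on ties) and replace it
    # with the element-wise mean of the two tokens.
    best, pair = -2.0, (0, 1)
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            s = cosine(tokens[i], tokens[j])
            if s > best:
                best, pair = s, (i, j)
    i, j = pair
    merged = [(a + b) / 2 for a, b in zip(tokens[i], tokens[j])]
    rest = [t for k, t in enumerate(tokens) if k not in (i, j)]
    return rest + [merged]

def merge_tokens(tokens, target_len):
    # Gradual reduction: merge one pair at a time until target_len remain.
    tokens = [list(t) for t in tokens]
    while len(tokens) > target_len:
        tokens = merge_once(tokens)
    return tokens

# Two near-duplicate pairs of 2-D "visual tokens".
visual_tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
condensed = merge_tokens(visual_tokens, 2)
```

Each near-duplicate pair collapses into one averaged token, which is the sense in which merging condenses redundant visual information while keeping distinct content separate; a training-time schedule would lower `target_len` progressively rather than all at once.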
Submitted 21 February, 2024; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Authors:
Yiren Jian,
Chongyang Gao,
Soroush Vosoughi
Abstract:
We present a novel methodology aimed at optimizing the application of frozen large language models (LLMs) for resource-intensive vision-language (VL) pre-training. The current paradigm uses visual features as prompts to guide language models, with a focus on determining the most relevant visual features for corresponding text. Our approach diverges by concentrating on the language component, specifically identifying the optimal prompts to align with visual features. We introduce the Prompt-Transformer (P-Former), a model that predicts these ideal prompts, which is trained exclusively on linguistic data, bypassing the need for image-text pairings. This strategy subtly bifurcates the end-to-end VL training process into an additional, separate stage. Our experiments reveal that our framework significantly enhances the performance of a robust image-to-text baseline (BLIP-2), and effectively narrows the performance gap between models trained with either 4M or 129M image-text pairs. Importantly, our framework is modality-agnostic and flexible in terms of architectural design, as validated by its successful application in a video learning task using varied base modules. The code will be made available at https://github.com/yiren-jian/BLIText.
Submitted 19 December, 2023; v1 submitted 13 July, 2023;
originally announced July 2023.
-
Real-Time Network-Level Traffic Signal Control: An Explicit Multiagent Coordination Method
Authors:
Wanyuan Wang,
Tianchi Qiao,
**ming Ma,
Jiahui **,
Zhibin Li,
Weiwei Wu,
Yichuan Jian
Abstract:
Efficient traffic signal control (TSC) has been one of the most useful ways of reducing urban road congestion. The key challenges of TSC include 1) the need for real-time signal decisions, 2) the complexity of traffic dynamics, and 3) network-level coordination. Recent efforts that apply reinforcement learning (RL) methods can query policies in real time by mapping the traffic state to a signal decision; however, they are inadequate for unexpected traffic flows. By observing real traffic information, online planning methods can compute signal decisions in a responsive manner. We propose an explicit multiagent coordination (EMC)-based online planning method that can satisfy adaptive, real-time and network-level TSC. For the multiagent aspect, we model each intersection as an autonomous agent, and the coordination efficiency is modeled by a cost (i.e., congestion index) function between neighboring intersections. For network-level coordination, each agent exchanges messages with respect to the cost function with its neighbors in a fully decentralized manner. For real-time operation, the message-passing procedure can be interrupted at any time when the real-time limit is reached, and agents select the optimal signal decisions according to the current messages. Moreover, we prove that our EMC method can guarantee network stability by borrowing ideas from the transportation domain. Finally, we test our EMC method on both synthetic and real road network datasets. Experimental results are encouraging: compared to RL and conventional transportation baselines, our EMC method performs reasonably well in terms of adapting to real-time traffic dynamics, minimizing vehicle travel time, and scaling to city-scale road networks.
Submitted 15 June, 2023;
originally announced June 2023.
-
Knowledge from Large-Scale Protein Contact Prediction Models Can Be Transferred to the Data-Scarce RNA Contact Prediction Task
Authors:
Yiren Jian,
Chongyang Gao,
Chen Zeng,
Yunjie Zhao,
Soroush Vosoughi
Abstract:
RNA, whose functionality is largely determined by its structure, plays an important role in many biological activities. The prediction of pairwise structural proximity between each nucleotide of an RNA sequence can characterize the structural information of the RNA. Historically, this problem has been tackled by machine learning models using expert-engineered features and trained on scarce labeled datasets. Here, we find that the knowledge learned by a protein-coevolution Transformer-based deep neural network can be transferred to the RNA contact prediction task. As protein datasets are orders of magnitude larger than those for RNA contact prediction, our findings and the subsequent framework greatly reduce the data scarcity bottleneck. Experiments confirm that RNA contact prediction through transfer learning using a publicly available protein model is greatly improved. Our findings indicate that the learned structural patterns of proteins can be transferred to RNAs, opening up potential new avenues for research.
Submitted 18 January, 2024; v1 submitted 13 February, 2023;
originally announced February 2023.
-
Predicting CO$_2$ Absorption in Ionic Liquids with Molecular Descriptors and Explainable Graph Neural Networks
Authors:
Yue Jian,
Yuyang Wang,
Amir Barati Farimani
Abstract:
Ionic Liquids (ILs) provide a promising solution for CO$_2$ capture and storage to mitigate global warming. However, identifying and designing high-capacity ILs from the giant chemical space requires expensive and exhaustive simulations and experiments. Machine learning (ML) can accelerate the search for desirable ionic molecules through accurate and efficient property predictions in a data-driven manner. However, existing descriptors and ML models for ionic molecules adapt poorly to molecular graph structure. Besides, few works have investigated the explainability of ML models to help understand the learned features that can guide the design of efficient ionic molecules. In this work, we develop both fingerprint-based ML models and Graph Neural Networks (GNNs) to predict CO$_2$ absorption in ILs. Fingerprints work on graph structure at the feature extraction stage, while GNNs directly handle molecular structure in both the feature extraction and model prediction stages. We show that our method outperforms previous ML models by reaching high accuracy (MAE of 0.0137, $R^2$ of 0.9884). Furthermore, we take advantage of the GNNs' feature representation and develop a substructure-based explanation method that provides insight into how each chemical fragment within IL molecules contributes to the CO$_2$ absorption predictions of ML models. We also show that our explanation results agree with some ground truth from the theoretical reaction mechanism of CO$_2$ absorption in ILs, which can advise on the design of novel and efficient functional ILs in the future.
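The two accuracy figures quoted above are standard regression metrics: MAE is the mean absolute difference between measured and predicted values, and $R^2 = 1 - SS_{res}/SS_{tot}$ is the fraction of variance in the measurements explained by the predictions. The short sketch below computes both; the measured and predicted values are made up for illustration and are not data from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    # MAE: average magnitude of the prediction errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot: 1.0 is a perfect fit, 0.0 matches
    # always predicting the mean of y_true.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up absorption values, for illustration only.
measured = [0.10, 0.20, 0.30, 0.40]
predicted = [0.11, 0.19, 0.31, 0.40]
mae_value = mean_absolute_error(measured, predicted)
r2_value = r_squared(measured, predicted)
```

Because MAE is expressed in the units of the predicted quantity while $R^2$ is unitless, reporting both, as the abstract does, conveys absolute error and goodness of fit separately.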
Submitted 9 November, 2022; v1 submitted 29 September, 2022;
originally announced October 2022.
-
Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings
Authors:
Yiren Jian,
Chongyang Gao,
Soroush Vosoughi
Abstract:
Semantic representation learning for sentences is an important and well-studied problem in NLP. The current trend for this task involves training a Transformer-based sentence encoder through a contrastive objective with text, i.e., clustering sentences with semantically similar meanings and scattering others. In this work, we find the performance of Transformer models as sentence encoders can be improved by training with multi-modal multi-task losses, using unpaired examples from another modality (e.g., sentences and unrelated image/audio data). In particular, besides learning by the contrastive loss on text, our model clusters examples from a non-linguistic domain (e.g., visual/audio) with a similar contrastive loss at the same time. The reliance of our framework on unpaired non-linguistic data makes it language-agnostic, enabling it to be widely applicable beyond English NLP. Experiments on 7 semantic textual similarity benchmarks reveal that models trained with the additional non-linguistic (images/audio) contrastive objective lead to higher quality sentence embeddings. This indicates that Transformer models are able to generalize better by doing a similar task (i.e., clustering) with unpaired examples from different modalities in a multi-task fashion.
Submitted 19 September, 2022;
originally announced September 2022.
-
Contrastive Learning for Prompt-Based Few-Shot Language Learners
Authors:
Yiren Jian,
Chongyang Gao,
Soroush Vosoughi
Abstract:
The impressive performance of GPT-3 using natural language prompts and in-context learning has inspired work on better fine-tuning of moderately-sized models under this paradigm. Following this line of work, we present a contrastive learning framework that clusters inputs from the same class for better generality of models trained with only limited examples. Specifically, we propose a supervised contrastive framework that clusters inputs from the same class under different augmented "views" and repels the ones from different classes. We create different "views" of an example by appending it with different language prompts and contextual demonstrations. By combining a contrastive loss with the standard masked language modeling (MLM) loss in prompt-based few-shot learners, our method improves over the state-of-the-art methods on a diverse set of 15 language tasks in our experiments. Our framework makes minimal assumptions on the task or the base model, and can be applied to many recent methods with little modification. The code will be made available at: https://github.com/yiren-jian/LM-SupCon.
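A supervised contrastive objective of the kind described above is commonly written, for each anchor, as the negative mean log-probability of its same-class positives under a softmax over all other examples, with similarities divided by a temperature. The pure-Python sketch below assumes that widely used SupCon form, averages it over anchors, and operates on already-computed, L2-normalized embeddings; the temperature value and the toy 2-D embeddings are assumptions, and in the setting above the "views" would come from different prompts and demonstrations rather than hand-written vectors.

```python
import math

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over L2-normalized embeddings.

    For each anchor i, positives are the other examples with the same label;
    the softmax denominator runs over every example except the anchor.
    """
    n = len(embeddings)
    sims = [[sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / temperature
             for j in range(n)] for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        log_denom = math.log(sum(math.exp(sims[i][k]) for k in range(n) if k != i))
        total += -sum(sims[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Two "views" per class, e.g. the same input under two different prompts.
labels = [0, 0, 1, 1]
clustered = [normalize(v) for v in ([1, 0.1], [1, -0.1], [-1, 0.1], [-1, -0.1])]
mixed = [normalize(v) for v in ([1, 0.1], [-1, 0.1], [1, -0.1], [-1, -0.1])]
```

The loss is near zero when same-class views point the same way (`clustered`) and large when they point in opposite directions (`mixed`), which is the pull-together, push-apart behavior described in the abstract; in training it would be added to the MLM loss.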
Submitted 3 May, 2022;
originally announced May 2022.
-
Embedding Hallucination for Few-Shot Language Fine-tuning
Authors:
Yiren Jian,
Chongyang Gao,
Soroush Vosoughi
Abstract:
Few-shot language learners adapt knowledge from a pre-trained model to recognize novel classes from a few labeled sentences. In such settings, fine-tuning a pre-trained language model can cause severe over-fitting. In this paper, we propose an Embedding Hallucination (EmbedHalluc) method, which generates auxiliary embedding-label pairs to expand the fine-tuning dataset. The hallucinator is trained by playing an adversarial game with the discriminator, such that the hallucinated embeddings are indistinguishable from the real ones in the fine-tuning dataset. By training with the extended dataset, the language learner effectively learns from the diverse hallucinated embeddings to overcome the over-fitting issue. Experiments demonstrate that our proposed method is effective in a wide range of language tasks, outperforming current fine-tuning methods. Further, we show that EmbedHalluc outperforms other methods that address this over-fitting problem, such as common data augmentation, semi-supervised pseudo-labeling, and regularization. The code will be made available at: https://github.com/yiren-jian/EmbedHalluc.
Submitted 3 May, 2022;
originally announced May 2022.
-
Label Hallucination for Few-Shot Classification
Authors:
Yiren Jian,
Lorenzo Torresani
Abstract:
Few-shot classification requires adapting knowledge learned from a large annotated base dataset to recognize novel unseen classes, each represented by few labeled examples. In such a scenario, pretraining a network with high capacity on the large dataset and then finetuning it on the few examples causes severe overfitting. At the same time, training a simple linear classifier on top of "frozen" features learned from the large labeled dataset fails to adapt the model to the properties of the novel classes, effectively inducing underfitting. In this paper we propose an alternative to both of these popular strategies. First, our method pseudo-labels the entire large dataset using the linear classifier trained on the novel classes. This effectively "hallucinates" the novel classes in the large dataset, despite the novel categories not being present in the base database (novel and base classes are disjoint). Then, it finetunes the entire model with a distillation loss on the pseudo-labeled base examples, in addition to the standard cross-entropy loss on the novel dataset. This step effectively trains the network to recognize contextual and appearance cues that are useful for novel-category recognition while using the entire large-scale base dataset, thus overcoming the inherent data-scarcity problem of few-shot learning. Despite the simplicity of the approach, we show that our method outperforms the state-of-the-art on four well-established few-shot classification benchmarks.
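The combined finetuning objective described above, a distillation loss on pseudo-labeled base examples plus cross-entropy on the novel examples, can be sketched for a single example of each kind. The sketch assumes the pseudo-labels are soft probability vectors produced by the novel-class linear classifier, uses a KL-divergence distillation term, and combines the two terms with a hypothetical weight `alpha`; the paper's exact loss form and weighting may differ.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, target):
    # Standard cross-entropy on a labeled novel-class example.
    return -math.log(softmax(logits)[target])

def distillation(student_logits, teacher_probs):
    # KL(teacher || student): pulls the network's prediction on a base
    # image toward the pseudo-label distribution over novel classes.
    student = softmax(student_logits)
    return sum(t * math.log(t / s) for t, s in zip(teacher_probs, student) if t > 0)

def total_loss(novel_logits, novel_label, base_logits, base_pseudo_probs, alpha=1.0):
    # alpha is a hypothetical weight between the two terms.
    return cross_entropy(novel_logits, novel_label) + alpha * distillation(
        base_logits, base_pseudo_probs)

# Three novel classes; a base image "hallucinated" as mostly novel class 2.
pseudo = softmax([0.0, 1.0, 3.0])
loss = total_loss([2.0, 0.0, 0.0], 0, [0.0, 1.0, 3.0], pseudo)
```

When the network already reproduces the pseudo-label distribution on a base image, the distillation term vanishes and only the novel-class cross-entropy remains; any disagreement adds a positive penalty, which is how the large base dataset keeps contributing training signal for the novel classes.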
Submitted 6 December, 2021;
originally announced December 2021.
-
MetaPix: Domain Transfer for Semantic Segmentation by Meta Pixel Weighting
Authors:
Yiren Jian,
Chongyang Gao
Abstract:
Training a deep neural model for semantic segmentation requires collecting a large amount of pixel-level labeled data. To alleviate the data scarcity problem presented in the real world, one could utilize synthetic data whose labels are easy to obtain. Previous work has shown that the performance of a semantic segmentation model can be improved by training jointly with real and synthetic examples with a proper weighting on the synthetic data. Such weighting was learned by a heuristic to maximize the similarity between synthetic and real examples. In our work, we instead learn a pixel-level weighting of the synthetic data by meta-learning, i.e., the weighting is learned solely to minimize the loss on the target task. We achieve this with a gradient-on-gradient technique that propagates the target loss back into the parameters of the weighting model. The experiments show that our method, with only a single meta module, can outperform a complicated combination of an adversarial feature alignment, a reconstruction loss, and a hierarchical heuristic weighting at the pixel, region and image levels.
Submitted 4 October, 2021;
originally announced October 2021.
-
SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environments
Authors:
Jiafei Duan,
Samson Yu Bai Jian,
Cheston Tan
Abstract:
Recent advancements in deep learning, computer vision, and embodied AI have given rise to synthetic causal reasoning video datasets. These datasets facilitate the development of AI algorithms that can reason about physical interactions between objects. However, datasets thus far have primarily focused on elementary physical events such as rolling or falling. There is currently a scarcity of datasets that focus on the physical interactions that humans perform daily with objects in the real world. To address this scarcity, we introduce SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environments. The SPACE simulator allows us to generate the SPACE dataset, a synthetic video dataset in a 3D environment, to systematically evaluate physics-based models on a range of physical causal reasoning tasks. Inspired by daily object interactions, the SPACE dataset comprises videos depicting three types of physical events: containment, stability and contact. These events make up the vast majority of the basic physical interactions between objects. We then further evaluate it with a state-of-the-art physics-based deep model and show that the SPACE dataset improves the learning of intuitive physics with an approach inspired by curriculum learning. Repository: https://github.com/jiafei1224/SPACE
Submitted 13 August, 2021;
originally announced August 2021.
-
Aspect Sentiment Triplet Extraction Using Reinforcement Learning
Authors:
Samson Yu Bai Jian,
Tapas Nayak,
Navonil Majumder,
Soujanya Poria
Abstract:
Aspect Sentiment Triplet Extraction (ASTE) is the task of extracting triplets of aspect terms, their associated sentiments, and the opinion terms that provide evidence for the expressed sentiments. Previous approaches to ASTE usually simultaneously extract all three components or first identify the aspect and opinion terms, then pair them up to predict their sentiment polarities. In this work, we present a novel paradigm, ASTE-RL, by regarding the aspect and opinion terms as arguments of the expressed sentiment in a hierarchical reinforcement learning (RL) framework. We first focus on sentiments expressed in a sentence, then identify the target aspect and opinion terms for that sentiment. This takes into account the mutual interactions among the triplet's components while improving exploration and sample efficiency. Furthermore, this hierarchical RL setup enables us to deal with multiple and overlapping triplets. In our experiments, we evaluate our model on existing datasets from laptop and restaurant domains and show that it achieves state-of-the-art performance. The implementation of this work is publicly available at https://github.com/declare-lab/ASTE-RL.
Submitted 13 August, 2021;
originally announced August 2021.
-
Recognizing Emotion Cause in Conversations
Authors:
Soujanya Poria,
Navonil Majumder,
Devamanyu Hazarika,
Deepanway Ghosal,
Rishabh Bhardwaj,
Samson Yu Bai Jian,
Pengfei Hong,
Romila Ghosh,
Abhinaba Roy,
Niyati Chhaya,
Alexander Gelbukh,
Rada Mihalcea
Abstract:
We address the problem of recognizing emotion cause in conversations, define two novel sub-tasks of this problem, and provide a corresponding dialogue-level dataset, along with strong Transformer-based baselines. The dataset is available at https://github.com/declare-lab/RECCON.
Introduction: Recognizing the cause behind emotions in text is a fundamental yet under-explored area of research in NLP. Advances in this area hold the potential to improve interpretability and performance in affect-based models. Identifying emotion causes at the utterance level in conversations is particularly challenging due to the intermingling dynamics among the interlocutors.
Method: We introduce the task of Recognizing Emotion Cause in CONversations with an accompanying dataset named RECCON, containing over 1,000 dialogues and 10,000 utterance cause-effect pairs. Furthermore, we define different cause types based on the source of the causes, and establish strong Transformer-based baselines to address two different sub-tasks on this dataset: causal span extraction and causal emotion entailment.
Result: Our Transformer-based baselines, which leverage contextual pre-trained embeddings such as RoBERTa, outperform the state-of-the-art emotion cause extraction approaches.
Conclusion: We introduce a new task highly relevant for (explainable) emotion-aware artificial intelligence: recognizing emotion cause in conversations, provide a new highly challenging publicly available dialogue-level dataset for this task, and give strong baseline results on this dataset.
Submitted 28 July, 2021; v1 submitted 21 December, 2020;
originally announced December 2020.
-
Unsupervised Feature Selection via Multi-step Markov Transition Probability
Authors:
Yan Min,
Mao Ye,
Liang Tian,
Yulin Jian,
Ce Zhu,
Shangming Yang
Abstract:
Feature selection is a widely used dimension reduction technique for selecting feature subsets because of its interpretability. Many methods have been proposed and have achieved good results, most of which focus mainly on the relationships between adjacent data points. However, the possible associations between data pairs that may not be adjacent are always neglected. Different from previous methods, we propose a novel and very simple approach for unsupervised feature selection, named MMFS (Multi-step Markov transition probability for Feature Selection). The idea is to use multi-step Markov transition probability to describe the relation between any data pair. Two ways, from the positive and negative viewpoints respectively, are employed to preserve the data structure after feature selection. From the positive viewpoint, the maximum transition probability that can be reached in a certain number of steps is used to describe the relation between two points; the features that can keep the compact data structure are then selected. From the negative viewpoint, the minimum transition probability that can be reached in a certain number of steps is used to describe the relation between two points; conversely, the features that least maintain the loose data structure are selected. The two ways can also be combined, so three algorithms are proposed. Our main contributions are a novel feature selection approach that uses multi-step transition probability to characterize the data structure, and three algorithms, derived from the positive and negative aspects, for preserving the data structure. The performance of our approach is compared with state-of-the-art methods on eight real-world data sets, and the experimental results show that the proposed MMFS is effective for unsupervised feature selection.
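A minimal sketch of the two relation matrices described above, assuming a Gaussian-kernel affinity graph (bandwidth `sigma`), row normalization into a one-step transition matrix $P$, and an illustrative step budget `steps`; the abstract does not say how the graph is built, and the feature-scoring objectives of the three MMFS algorithms are not reproduced here. The positive viewpoint takes the entrywise maximum of $P, P^2, \dots, P^{T}$ and the negative viewpoint the entrywise minimum; all function names are hypothetical.

```python
import numpy as np

def transition_matrix(X, sigma=1.0):
    """Row-stochastic one-step transition matrix over the samples (rows of X).

    Assumption: Gaussian-kernel affinities, no self-loops.
    """
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(axis=1, keepdims=True)

def _multistep(P, steps, combine):
    """Combine P, P^2, ..., P^steps entrywise with `combine` (np.maximum or np.minimum)."""
    Pk = P.copy()
    out = P.copy()
    for _ in range(steps - 1):
        Pk = Pk @ P              # next power of the transition matrix
        out = combine(out, Pk)
    return out

def max_multistep(P, steps):
    # Positive viewpoint: largest k-step transition probability i -> j, k = 1..steps.
    return _multistep(P, steps, np.maximum)

def min_multistep(P, steps):
    # Negative viewpoint: smallest k-step transition probability over the same k.
    return _multistep(P, steps, np.minimum)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))      # 6 toy samples with 3 features
P = transition_matrix(X)
Pmax = max_multistep(P, steps=3)
Pmin = min_multistep(P, steps=3)
```

Using the maximum over several step counts lets two points that are not direct neighbours still receive a large relation value if a short multi-step path connects them, which is the kind of non-adjacent association the abstract says earlier methods ignore.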
Submitted 28 May, 2020;
originally announced May 2020.
-
Understanding the Disharmony between Weight Normalization Family and Weight Decay: $\epsilon$-shifted $L_2$ Regularizer
Authors:
Li Xiang,
Chen Shuo,
Xia Yan,
Yang Jian
Abstract:
The merits of fast convergence and potentially better performance of the weight normalization family have drawn increasing attention in recent years. These methods use standardization or normalization that changes the weight $\boldsymbol{W}$ to $\boldsymbol{W}'$, which makes $\boldsymbol{W}'$ independent of the magnitude of $\boldsymbol{W}$. Surprisingly, $\boldsymbol{W}$ must be decayed during gradient descent; otherwise, a severe under-fitting problem is observed, which is very counter-intuitive since weight decay is widely known to prevent deep networks from over-fitting. In this paper, we \emph{theoretically} prove that the weight decay term $\frac{1}{2}\lambda\|\boldsymbol{W}\|^2$ merely modulates the effective learning rate for improving objective optimization, and has no influence on generalization when the weight normalization family is compositely employed. Furthermore, we expose several critical problems that arise when a weight decay term is introduced to the weight normalization family, including the absence of a global minimum and training instability. To address these problems, we propose an $\epsilon$-shifted $L_2$ regularizer, which shifts the $L_2$ objective by a positive constant $\epsilon$. Such a simple operation can theoretically guarantee the existence of a global minimum, while preventing the network weights from becoming too small and thus avoiding gradient float overflow. It significantly improves training stability and can achieve slightly better performance in our practice. The effectiveness of the $\epsilon$-shifted $L_2$ regularizer is comprehensively validated on the ImageNet, CIFAR-100, and COCO datasets. Our code and pretrained models will be released at https://github.com/implus/PytorchInsight.
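The claim that weight decay only modulates the effective learning rate rests on scale invariance: if the loss depends on $\boldsymbol{W}$ only through $\boldsymbol{W}' = \boldsymbol{W}/\|\boldsymbol{W}\|$, then rescaling $\boldsymbol{W}$ by $c>0$ leaves the loss unchanged, while the gradient scales by $1/c$ and is orthogonal to $\boldsymbol{W}$. A step of size $\eta$ taken at $c\boldsymbol{W}$ therefore moves the direction $\boldsymbol{W}'$ exactly as a step of size $\eta/c^2$ taken at $\boldsymbol{W}$, so shrinking $\|\boldsymbol{W}\|$ raises the effective learning rate. The numerical check below is a toy sketch with a hypothetical quadratic loss on the normalized weight; it is not the paper's proof and does not implement the $\epsilon$-shifted regularizer.

```python
import numpy as np

def loss(W, target):
    # Toy loss that sees W only through its normalized version W' = W / ||W||.
    Wn = W / np.linalg.norm(W)
    return 0.5 * np.sum((Wn - target) ** 2)

def num_grad(f, W, h=1e-6):
    # Central finite differences, one coordinate at a time.
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        E = np.zeros_like(W)
        E[idx] = h
        grad[idx] = (f(W + E) - f(W - E)) / (2.0 * h)
    return grad

def normalize(V):
    return V / np.linalg.norm(V)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
target = rng.normal(size=(4, 3))
f = lambda V: loss(V, target)
c = 3.0                          # rescaling factor for the raw weight

g1 = num_grad(f, W)              # gradient at W
gc = num_grad(f, c * W)          # gradient at c * W (should equal g1 / c)

eta = 0.1
dir_big = normalize(c * W - eta * gc)            # step of size eta at c * W
dir_small = normalize(W - (eta / c ** 2) * g1)   # step of size eta / c^2 at W
```

Both normalized directions agree, which is why the magnitude of $\boldsymbol{W}$, and hence weight decay, acts on optimization only through the step size.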
Submitted 13 November, 2019;
originally announced November 2019.
-
A Simple Proof of Maxwell Saturation for Coupled Scalar Recursions
Authors:
Arvind Yedla,
Yung-Yih Jian,
Phong S. Nguyen,
Henry D. Pfister
Abstract:
Low-density parity-check (LDPC) convolutional codes (or spatially-coupled codes) were recently shown to approach capacity on the binary erasure channel (BEC) and binary-input memoryless symmetric channels. The mechanism behind this spectacular performance is now called threshold saturation via spatial coupling. This new phenomenon is characterized by the belief-propagation threshold of the spatially-coupled ensemble increasing to an intrinsic noise threshold defined by the uncoupled system. In this paper, we present a simple proof of threshold saturation that applies to a wide class of coupled scalar recursions. Our approach is based on constructing potential functions for both the coupled and uncoupled recursions. Our results show that the fixed point of the coupled recursion is essentially determined by the minimum of the uncoupled potential function; we refer to this phenomenon as Maxwell saturation. A variety of examples are considered, including the density-evolution equations for: irregular LDPC codes on the BEC, irregular low-density generator matrix codes on the BEC, a class of generalized LDPC codes with BCH component codes, the joint iterative decoding of LDPC codes on intersymbol-interference channels with erasure noise, and the compressed sensing of random vectors with i.i.d. components.
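To make the potential-function idea concrete, here is a sketch for the simplest example in the list, a $(d_v,d_c)$-regular LDPC ensemble on the BEC. It assumes the standard density-evolution form $x \mapsto f(g(x);\epsilon)$ with $f(x;\epsilon)=\epsilon x^{d_v-1}$ and $g(x)=1-(1-x)^{d_c-1}$, and the scalar potential $U(x;\epsilon)=x\,g(x)-G(x)-F(g(x);\epsilon)$, where $F$ and $G$ integrate $f$ and $g$ from 0. Because $U$ is affine in $\epsilon$, the largest $\epsilon$ with $U(x;\epsilon)\ge 0$ over the grid is the minimum of a ratio. The grid is an arbitrary choice; for the $(3,6)$ ensemble the BP and potential thresholds should come out near 0.429 and 0.488.

```python
import numpy as np

def potential_parts(x, dv, dc):
    """Split U(x; eps) = A(x) - eps * B(x) for a (dv, dc)-regular LDPC ensemble on the BEC."""
    g = 1.0 - (1.0 - x) ** (dc - 1)           # g(x) = 1 - rho(1 - x)
    G = x - (1.0 - (1.0 - x) ** dc) / dc      # G(x) = integral of g over [0, x]
    A = x * g - G                             # x g(x) - G(x)
    B = g ** dv / dv                          # F(g(x); eps) = eps * g(x)^dv / dv
    return A, B

def thresholds(dv, dc, num=200001):
    x = np.linspace(1e-4, 1.0, num)
    A, B = potential_parts(x, dv, dc)
    # U >= 0 on the grid  <=>  eps <= A / B everywhere (B > 0 for x > 0).
    eps_pot = float(np.min(A / B))
    # BP threshold: largest eps with no fixed point of x = eps * g(x)^(dv-1) in (0, 1].
    g = 1.0 - (1.0 - x) ** (dc - 1)
    eps_bp = float(np.min(x / g ** (dv - 1)))
    return eps_bp, eps_pot

eps_bp, eps_pot = thresholds(3, 6)
print(f"BP threshold ~ {eps_bp:.4f}, potential threshold ~ {eps_pot:.4f}")
```

Since $U'(x;\epsilon)=g'(x)\,(x-f(g(x);\epsilon))$, the stationary points of $U$ are exactly the fixed points of the recursion, which is what lets the minimum of the uncoupled potential govern the coupled fixed point.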
Submitted 11 September, 2014; v1 submitted 30 September, 2013;
originally announced September 2013.
-
A Simple Proof of Threshold Saturation for Coupled Vector Recursions
Authors:
Arvind Yedla,
Yung-Yih Jian,
Phong S. Nguyen,
Henry D. Pfister
Abstract:
Convolutional low-density parity-check (LDPC) codes (or spatially-coupled codes) have now been shown to achieve capacity on binary-input memoryless symmetric channels. The principle behind this surprising result is the threshold-saturation phenomenon, which is defined by the belief-propagation threshold of the spatially-coupled ensemble saturating to a fundamental threshold defined by the uncoupled system.
Previously, the authors demonstrated that potential functions can be used to provide a simple proof of threshold saturation for coupled scalar recursions. In this paper, we present a simple proof of threshold saturation that applies to a wide class of coupled vector recursions. The conditions of the theorem are verified for the density-evolution equations of: (i) joint decoding of irregular LDPC codes for a Slepian-Wolf problem with erasures, (ii) joint decoding of irregular LDPC codes on an erasure multiple-access channel, and (iii) general protograph codes on the BEC. This proves threshold saturation for these systems.
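For item (iii), protograph density evolution on the BEC is itself a vector recursion, with one erasure probability per edge type of the base matrix. The sketch below is a standard textbook form of that recursion (not code from the paper) together with a bisection for the BP threshold; the iteration cap, tolerance, and bisection depth are arbitrary choices, and the function names are hypothetical. As a sanity check, the base matrix `[[3, 3]]` describes the $(3,6)$-regular ensemble, so its threshold should land near 0.429.

```python
import numpy as np

def protograph_de_bec(B, eps, iters=2000, tol=1e-10):
    """Edge-type density evolution on the BEC; True if every erasure prob goes to ~0."""
    B = np.asarray(B, dtype=int)
    m, n = B.shape
    edges = [(i, j) for i in range(m) for j in range(n) if B[i, j] > 0]
    x = np.where(B > 0, eps, 0.0)     # variable-to-check erasure prob per edge type
    for _ in range(iters):
        y = np.zeros((m, n))          # check-to-variable erasure prob per edge type
        for i, j in edges:
            expo = B[i].copy()
            expo[j] -= 1              # exclude the edge being updated
            y[i, j] = 1.0 - np.prod((1.0 - x[i]) ** expo)
        x_new = np.zeros((m, n))
        for i, j in edges:
            expo = B[:, j].copy()
            expo[i] -= 1
            x_new[i, j] = eps * np.prod(y[:, j] ** expo)
        x = x_new
        if x.max() < tol:
            return True
    return False

def bp_threshold(B, lo=0.0, hi=1.0, steps=16):
    """Bisection on the channel erasure rate."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if protograph_de_bec(B, mid):
            lo = mid
        else:
            hi = mid
    return lo

print(bp_threshold([[3, 3]]))
```

Entries of `B` larger than one (parallel edges) are handled by the exponents, which is what makes this a genuinely vector recursion once the base matrix has several distinct edge types.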
Submitted 24 January, 2013; v1 submitted 20 August, 2012;
originally announced August 2012.
-
A Simple Proof of Threshold Saturation for Coupled Scalar Recursions
Authors:
Arvind Yedla,
Yung-Yih Jian,
Phong S. Nguyen,
Henry D. Pfister
Abstract:
Low-density parity-check (LDPC) convolutional codes (or spatially-coupled codes) have been shown to approach capacity on the binary erasure channel (BEC) and binary-input memoryless symmetric channels. The mechanism behind this spectacular performance is the threshold saturation phenomenon, which is characterized by the belief-propagation threshold of the spatially-coupled ensemble increasing to an intrinsic noise threshold defined by the uncoupled system.
In this paper, we present a simple proof of threshold saturation that applies to a broad class of coupled scalar recursions. The conditions of the theorem are verified for the density-evolution (DE) equations of irregular LDPC codes on the BEC, a class of generalized LDPC codes, and the joint iterative decoding of LDPC codes on intersymbol-interference channels with erasure noise. Our approach is based on potential functions and was motivated mainly by the ideas of Takeuchi et al. The resulting proof is surprisingly simple when compared to previous methods.
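Threshold saturation can also be observed numerically. The sketch below iterates the uncoupled $(3,6)$ BEC recursion and a spatially coupled chain built from it in one common form: each position averages the check-side map $g$ over a window of width $w$, then averages $f$ over the same window, with $x_i = 0$ held fixed outside the chain so that the boundary seeds decoding. The chain length `N = 40`, window `w = 3`, and erasure rate 0.45 are illustrative assumptions. Since 0.45 lies between the uncoupled BP threshold (about 0.429) and the potential threshold (about 0.488), the uncoupled recursion should stall at a nonzero fixed point while the coupled chain is driven to zero.

```python
import numpy as np

DV, DC = 3, 6   # (3,6)-regular LDPC ensemble on the BEC

def f(y, eps):
    return eps * y ** (DV - 1)             # variable-node side, eps * lambda(y)

def g(x):
    return 1.0 - (1.0 - x) ** (DC - 1)     # check-node side, 1 - rho(1 - x)

def run_uncoupled(eps, iters=5000):
    x = 1.0
    for _ in range(iters):
        x = f(g(x), eps)
    return x

def run_coupled(eps, N=40, w=3, iters=20000, tol=1e-10):
    window = np.ones(w) / w                # uniform coupling window
    pad = np.zeros(w - 1)                  # positions outside the chain fixed to 0
    x = np.ones(N)
    for _ in range(iters):
        gx = g(np.concatenate([pad, x, pad]))
        y = np.convolve(gx, window, mode="valid")          # N + w - 1 averaged inputs
        x = np.convolve(f(y, eps), window, mode="valid")   # N updated positions
        if x.max() < tol:
            break
    return x

eps = 0.45
print("uncoupled fixed point:", run_uncoupled(eps))
print("coupled max erasure prob:", run_coupled(eps).max())
```

The zeros outside the chain slightly lower the erasure probability near the ends, and that advantage propagates inward as a decoding wave; the interior alone behaves like the uncoupled system.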
Submitted 19 October, 2013; v1 submitted 25 April, 2012;
originally announced April 2012.
-
Approaching Capacity at High-Rates with Iterative Hard-Decision Decoding
Authors:
Yung-Yih Jian,
Henry D. Pfister,
Krishna R. Narayanan
Abstract:
A variety of low-density parity-check (LDPC) ensembles have now been observed to approach capacity with message-passing decoding. However, all of them use soft (i.e., non-binary) messages and a posteriori probability (APP) decoding of their component codes. In this paper, we show that one can approach capacity at high rates using iterative hard-decision decoding (HDD) of generalized product codes. Specifically, a class of spatially-coupled GLDPC codes with BCH component codes is considered, and it is observed that, in the high-rate regime, they can approach capacity under the proposed iterative HDD. These codes can be seen as generalized product codes and are closely related to braided block codes. An iterative HDD algorithm is proposed that enables one to analyze the performance of these codes via density evolution (DE).
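The iterative HDD idea can be illustrated on the smallest case: a product code whose rows and columns are Hamming$(7,4)$ codewords (the binary BCH code with $t=1$), decoded by alternating bounded-distance decoding of every row and every column. This is plain iterative bounded-distance decoding for intuition, with hypothetical function names; it is not the specific HDD algorithm proposed in the paper, and it uses a product code rather than a spatially-coupled GLDPC code. The example error pattern shows a row miscorrection being cleaned up by the column decoders.

```python
import numpy as np

def hamming_syndrome(bits):
    # Hamming(7,4) with parity-check column j equal to the binary expansion of j + 1,
    # so the syndrome is the XOR of (j + 1) over every position holding a 1.
    s = 0
    for j, b in enumerate(bits):
        if b:
            s ^= j + 1
    return s

def bdd_hamming(bits):
    # Bounded-distance decoding with t = 1: flip the bit the syndrome points to.
    # With two or more errors this can miscorrect (flip a third, correct bit).
    out = bits.copy()
    s = hamming_syndrome(out)
    if s:
        out[s - 1] ^= 1
    return out

def iterative_hdd(R, iters=4):
    # Alternate row decoding and column decoding of a 7 x 7 received array.
    R = R.copy()
    for _ in range(iters):
        for i in range(R.shape[0]):
            R[i, :] = bdd_hamming(R[i, :])
        for j in range(R.shape[1]):
            R[:, j] = bdd_hamming(R[:, j])
    return R

# All-zero codeword with two errors in the same row.
R = np.zeros((7, 7), dtype=int)
R[0, 0] = R[0, 1] = 1
decoded = iterative_hdd(R)
```

Row 0 has syndrome $1 \oplus 2 = 3$, so its decoder wrongly flips position 2; each of columns 0, 1, and 2 then holds a single error, which the column decoders correct.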
Submitted 17 May, 2017; v1 submitted 27 February, 2012;
originally announced February 2012.
-
Convergence of Weighted Min-Sum Decoding Via Dynamic Programming on Trees
Authors:
Yung-Yih Jian,
Henry D. Pfister
Abstract:
Applying the max-product (and belief-propagation) algorithms to loopy graphs is now quite popular for best assignment problems. This is largely due to their low computational complexity and impressive performance in practice. Still, there is no general understanding of the conditions required for convergence and/or the optimality of converged solutions. This paper presents an analysis of both attenuated max-product (AMP) decoding and weighted min-sum (WMS) decoding for LDPC codes which guarantees convergence to a fixed point when a weight parameter, β, is sufficiently small. It also shows that, if the fixed point satisfies some consistency conditions, then it must be both the linear-programming (LP) and maximum-likelihood (ML) solution.
For $(d_v,d_c)$-regular LDPC codes, the weight must satisfy $\beta(d_v-1) \leq 1$, whereas the results proposed by Frey and Koetter instead require $\beta(d_v-1)(d_c-1) < 1$. A counterexample showing that a fixed point might not be the ML solution if $\beta(d_v-1) > 1$ is also given. Finally, connections are explored with recent work by Arora et al. on the threshold of LP decoding.
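The gap between the two sufficient conditions is easy to quantify. The helper below (hypothetical names, plain arithmetic on the two inequalities quoted above) returns the limiting weight under each condition; for $(3,6)$-regular codes the new condition allows $\beta$ up to $1/2$, while the Frey-Koetter condition requires $\beta < 1/10$.

```python
def beta_limits(dv, dc):
    # This paper: beta * (dv - 1) <= 1  (non-strict, limit attained)
    new_bound = 1.0 / (dv - 1)
    # Frey-Koetter: beta * (dv - 1) * (dc - 1) < 1  (strict, limit excluded)
    fk_bound = 1.0 / ((dv - 1) * (dc - 1))
    return new_bound, fk_bound

def satisfies(beta, dv, dc):
    """Return (meets this paper's condition, meets the Frey-Koetter condition)."""
    return beta * (dv - 1) <= 1, beta * (dv - 1) * (dc - 1) < 1

print(beta_limits(3, 6))
print(satisfies(0.3, 3, 6))
```

The ratio between the two limits is exactly $d_c - 1$, so the improvement grows with the check-node degree.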
Submitted 15 July, 2011;
originally announced July 2011.
-
Multimedia Description Framework (MDF) for Content Description of Audio/Video Documents
Authors:
Michael J. Hu,
Ye Jian
Abstract:
MPEG is undertaking a new initiative to standardize content description of audio and video data/documents. When it is finalized in 2001, MPEG-7 is expected to provide standardized description schemes for concise and unambiguous content description of data/documents of complex media types. Meanwhile, other meta-data or description schemes, such as Dublin Core, XML, etc., are becoming popular in different application domains. In this paper, we propose the Multimedia Description Framework (MDF), which is designed to accommodate multiple description (meta-data) schemes, both MPEG-7 and non-MPEG-7, in an integrated architecture. We use examples to show how an MDF description makes use of the combined strengths of different description schemes to enhance its expressive power and flexibility. We conclude the paper with a discussion of using the MDF description of a movie video to search for and retrieve required scene clips from the movie, on the MDF prototype system we have implemented.
Submitted 8 February, 1999;
originally announced February 1999.