Search | arXiv e-print repository

Autonomous Ground Navigation in Highly Constrained Spaces: Lessons learned from The 3rd BARN Challenge at ICRA 2024

Authors: Xuesu Xiao, Zifan Xu, Aniket Datar, Garrett Warnell, Peter Stone, Joshua Julian Damanik, Jaewon Jung, Chala Adane Deresa, Than Duc Huy, Chen **yu, Chen Yichen, Joshua Adrian Cahyono, **gda Wu, Longfei Mo, Mingyang Lv, Bowen Lan, Qingyang Meng, Weizhi Tao, Li Cheng

Abstract: The 3rd BARN (Benchmark Autonomous Robot Navigation) Challenge took place at the 2024 IEEE International Conference on Robotics and Automation (ICRA 2024) in Yokohama, Japan and continued to evaluate the performance of state-of-the-art autonomous ground navigation systems in highly constrained environments. Similar to the trend in The 1st and 2nd BARN Challenge at ICRA 2022 and 2023 in Philadelphi… ▽ More The 3rd BARN (Benchmark Autonomous Robot Navigation) Challenge took place at the 2024 IEEE International Conference on Robotics and Automation (ICRA 2024) in Yokohama, Japan and continued to evaluate the performance of state-of-the-art autonomous ground navigation systems in highly constrained environments. Similar to the trend in The 1st and 2nd BARN Challenge at ICRA 2022 and 2023 in Philadelphia (North America) and London (Europe), The 3rd BARN Challenge in Yokohama (Asia) became more regional, i.e., mostly Asian teams participated. The size of the competition has slightly shrunk (six simulation teams, four of which were invited to the physical competition). The competition results, compared to last two years, suggest that the field has adopted new machine learning approaches while at the same time slightly converged to a few common practices. However, the regional nature of the physical participants suggests a challenge to promote wider participation all over the world and provide more resources to travel to the venue. In this article, we discuss the challenge, the approaches used by the three winning teams, and lessons learned to direct future research and competitions. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2308.03205

arXiv:2405.05616 [pdf, other]

G-SAP: Graph-based Structure-Aware Prompt Learning over Heterogeneous Knowledge for Commonsense Reasoning

Authors: Ruiting Dai, Yuqiao Tan, Lisi Mo, Shuang Liang, Guohao Huo, Jiayi Luo, Yao Cheng

Abstract: Commonsense question answering has demonstrated considerable potential across various applications like assistants and social robots. Although fully fine-tuned pre-trained Language Models(LM) have achieved remarkable performance in commonsense reasoning, their tendency to excessively prioritize textual information hampers the precise transfer of structural knowledge and undermines interpretability… ▽ More Commonsense question answering has demonstrated considerable potential across various applications like assistants and social robots. Although fully fine-tuned pre-trained Language Models(LM) have achieved remarkable performance in commonsense reasoning, their tendency to excessively prioritize textual information hampers the precise transfer of structural knowledge and undermines interpretability. Some studies have explored combining LMs with Knowledge Graphs(KGs) by coarsely fusing the two modalities to perform Graph Neural Network(GNN)-based reasoning that lacks a profound interaction between heterogeneous modalities. In this paper, we propose a novel Graph-based Structure-Aware Prompt Learning Model for commonsense reasoning, named G-SAP, aiming to maintain a balance between heterogeneous knowledge and enhance the cross-modal interaction within the LM+GNNs model. In particular, an evidence graph is constructed by integrating multiple knowledge sources, i.e. ConceptNet, Wikipedia, and Cambridge Dictionary to boost the performance. Afterward, a structure-aware frozen PLM is employed to fully incorporate the structured and textual information from the evidence graph, where the generation of prompts is driven by graph entities and relations. Finally, a heterogeneous message-passing reasoning module is used to facilitate deep interaction of knowledge between the LM and graph-based networks. Empirical validation, conducted through extensive experiments on three benchmark datasets, demonstrates the notable performance of the proposed model. The results reveal a significant advancement over the existing models, especially, with 6.12% improvement over the SoTA LM+GNNs model on the OpenbookQA dataset. △ Less

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2403.19347 [pdf, other]

Breaking the Length Barrier: LLM-Enhanced CTR Prediction in Long Textual User Behaviors

Authors: Binzong Geng, Zhaoxin Huan, Xiaolu Zhang, Yong He, Liang Zhang, Fajie Yuan, Jun Zhou, Linjian Mo

Abstract: With the rise of large language models (LLMs), recent works have leveraged LLMs to improve the performance of click-through rate (CTR) prediction. However, we argue that a critical obstacle remains in deploying LLMs for practical use: the efficiency of LLMs when processing long textual user behaviors. As user sequences grow longer, the current efficiency of LLMs is inadequate for training on billi… ▽ More With the rise of large language models (LLMs), recent works have leveraged LLMs to improve the performance of click-through rate (CTR) prediction. However, we argue that a critical obstacle remains in deploying LLMs for practical use: the efficiency of LLMs when processing long textual user behaviors. As user sequences grow longer, the current efficiency of LLMs is inadequate for training on billions of users and items. To break through the efficiency barrier of LLMs, we propose Behavior Aggregated Hierarchical Encoding (BAHE) to enhance the efficiency of LLM-based CTR modeling. Specifically, BAHE proposes a novel hierarchical architecture that decouples the encoding of user behaviors from inter-behavior interactions. Firstly, to prevent computational redundancy from repeated encoding of identical user behaviors, BAHE employs the LLM's pre-trained shallow layers to extract embeddings of the most granular, atomic user behaviors from extensive user sequences and stores them in the offline database. Subsequently, the deeper, trainable layers of the LLM facilitate intricate inter-behavior interactions, thereby generating comprehensive user embeddings. This separation allows the learning of high-level user representations to be independent of low-level behavior encoding, significantly reducing computational complexity. Finally, these refined user embeddings, in conjunction with correspondingly processed item embeddings, are incorporated into the CTR model to compute the CTR scores. Extensive experimental results show that BAHE reduces training time and memory by five times for CTR models using LLMs, especially with longer user sequences. BAHE has been deployed in a real-world system, allowing for daily updates of 50 million CTR data on 8 A100 GPUs, making LLMs practical for industrial CTR prediction. △ Less

Submitted 28 March, 2024; originally announced March 2024.

Comments: Accepted by the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024

arXiv:2402.11676 [pdf, other]

A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language Models

Authors: Jaylen Jones, Lingbo Mo, Eric Fosler-Lussier, Huan Sun

Abstract: Counter narratives - informed responses to hate speech contexts designed to refute hateful claims and de-escalate encounters - have emerged as an effective hate speech intervention strategy. While previous work has proposed automatic counter narrative generation methods to aid manual interventions, the evaluation of these approaches remains underdeveloped. Previous automatic metrics for counter na… ▽ More Counter narratives - informed responses to hate speech contexts designed to refute hateful claims and de-escalate encounters - have emerged as an effective hate speech intervention strategy. While previous work has proposed automatic counter narrative generation methods to aid manual interventions, the evaluation of these approaches remains underdeveloped. Previous automatic metrics for counter narrative evaluation lack alignment with human judgment as they rely on superficial reference comparisons instead of incorporating key aspects of counter narrative quality as evaluation criteria. To address prior evaluation limitations, we propose a novel evaluation framework prompting LLMs to provide scores and feedback for generated counter narrative candidates using 5 defined aspects derived from guidelines from counter narrative specialized NGOs. We found that LLM evaluators achieve strong alignment to human-annotated scores and feedback and outperform alternative metrics, indicating their potential as multi-aspect, reference-free and interpretable evaluators for counter narrative evaluation. △ Less

Submitted 29 March, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

Comments: 22 pages, camera-ready version; references added, typos corrected, methodology section expanded, additional table

arXiv:2402.10196 [pdf, other]

A Trembling House of Cards? Map** Adversarial Attacks against Language Agents

Authors: Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu Su, Chaowei Xiao, Huan Sun

Abstract: Language agents powered by large language models (LLMs) have seen exploding development. Their capability of using language as a vehicle for thought and communication lends an incredible level of flexibility and versatility. People have quickly capitalized on this capability to connect LLMs to a wide range of external components and environments: databases, tools, the Internet, robotic embodiment,… ▽ More Language agents powered by large language models (LLMs) have seen exploding development. Their capability of using language as a vehicle for thought and communication lends an incredible level of flexibility and versatility. People have quickly capitalized on this capability to connect LLMs to a wide range of external components and environments: databases, tools, the Internet, robotic embodiment, etc. Many believe an unprecedentedly powerful automation technology is emerging. However, new automation technologies come with new safety risks, especially for intricate systems like language agents. There is a surprisingly large gap between the speed and scale of their development and deployment and our understanding of their safety risks. Are we building a house of cards? In this position paper, we present the first systematic effort in map** adversarial attacks against language agents. We first present a unified conceptual framework for agents with three major components: Perception, Brain, and Action. Under this framework, we present a comprehensive discussion and propose 12 potential attack scenarios against different components of an agent, covering different attack strategies (e.g., input manipulation, adversarial demonstrations, jailbreaking, backdoors). We also draw connections to successful attack strategies previously applied to LLMs. We emphasize the urgency to gain a thorough understanding of language agent risks before their widespread deployment. △ Less

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2401.09775 [pdf, other]

Controllable Decontextualization of Yes/No Question and Answers into Factual Statements

Authors: Lingbo Mo, Besnik Fetahu, Oleg Rokhlenko, Shervin Malmasi

Abstract: Yes/No or polar questions represent one of the main linguistic question categories. They consist of a main interrogative clause, for which the answer is binary (assertion or negation). Polar questions and answers (PQA) represent a valuable knowledge resource present in many community and other curated QA sources, such as forums or e-commerce applications. Using answers to polar questions alone in… ▽ More Yes/No or polar questions represent one of the main linguistic question categories. They consist of a main interrogative clause, for which the answer is binary (assertion or negation). Polar questions and answers (PQA) represent a valuable knowledge resource present in many community and other curated QA sources, such as forums or e-commerce applications. Using answers to polar questions alone in other contexts is not trivial. Answers are contextualized, and presume that the interrogative question clause and any shared knowledge between the asker and answerer are provided. We address the problem of controllable rewriting of answers to polar questions into decontextualized and succinct factual statements. We propose a Transformer sequence to sequence model that utilizes soft-constraints to ensure controllable rewriting, such that the output statement is semantically equivalent to its PQA input. Evaluation on three separate PQA datasets as measured through automated and human evaluation metrics show that our proposed approach achieves the best performance when compared to existing baselines. △ Less

Submitted 18 January, 2024; originally announced January 2024.

Comments: Accepted at ECIR 2024

arXiv:2311.09447 [pdf, other]

How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities

Authors: Lingbo Mo, Boshi Wang, Muhao Chen, Huan Sun

Abstract: The rapid progress in open-source Large Language Models (LLMs) is significantly driving AI development forward. However, there is still a limited understanding of their trustworthiness. Deploying these models at scale without sufficient trustworthiness can pose significant risks, highlighting the need to uncover these issues promptly. In this work, we conduct an adversarial assessment of open-sour… ▽ More The rapid progress in open-source Large Language Models (LLMs) is significantly driving AI development forward. However, there is still a limited understanding of their trustworthiness. Deploying these models at scale without sufficient trustworthiness can pose significant risks, highlighting the need to uncover these issues promptly. In this work, we conduct an adversarial assessment of open-source LLMs on trustworthiness, scrutinizing them across eight different aspects including toxicity, stereotypes, ethics, hallucination, fairness, sycophancy, privacy, and robustness against adversarial demonstrations. We propose advCoU, an extended Chain of Utterances-based (CoU) prompting strategy by incorporating carefully crafted malicious demonstrations for trustworthiness attack. Our extensive experiments encompass recent and representative series of open-source LLMs, including Vicuna, MPT, Falcon, Mistral, and Llama 2. The empirical outcomes underscore the efficacy of our attack strategy across diverse aspects. More interestingly, our result analysis reveals that models with superior performance in general NLP tasks do not always have greater trustworthiness; in fact, larger models can be more vulnerable to attacks. Additionally, models that have undergone instruction tuning, focusing on instruction following, tend to be more susceptible, although fine-tuning LLMs for safety alignment proves effective in mitigating adversarial trustworthiness attacks. △ Less

Submitted 2 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: NAACL 2024

arXiv:2308.16437 [pdf, other]

AntM$^{2}$C: A Large Scale Dataset For Multi-Scenario Multi-Modal CTR Prediction

Authors: Zhaoxin Huan, Ke Ding, Ang Li, Xiaolu Zhang, Xu Min, Yong He, Liang Zhang, Jun Zhou, Linjian Mo, **jie Gu, Zhongyi Liu, Wenliang Zhong, Guannan Zhang

Abstract: Click-through rate (CTR) prediction is a crucial issue in recommendation systems. There has been an emergence of various public CTR datasets. However, existing datasets primarily suffer from the following limitations. Firstly, users generally click different types of items from multiple scenarios, and modeling from multiple scenarios can provide a more comprehensive understanding of users. Existin… ▽ More Click-through rate (CTR) prediction is a crucial issue in recommendation systems. There has been an emergence of various public CTR datasets. However, existing datasets primarily suffer from the following limitations. Firstly, users generally click different types of items from multiple scenarios, and modeling from multiple scenarios can provide a more comprehensive understanding of users. Existing datasets only include data for the same type of items from a single scenario. Secondly, multi-modal features are essential in multi-scenario prediction as they address the issue of inconsistent ID encoding between different scenarios. The existing datasets are based on ID features and lack multi-modal features. Third, a large-scale dataset can provide a more reliable evaluation of models, fully reflecting the performance differences between models. The scale of existing datasets is around 100 million, which is relatively small compared to the real-world CTR prediction. To address these limitations, we propose AntM$^{2}$C, a Multi-Scenario Multi-Modal CTR dataset based on industrial data from Alipay. Specifically, AntM$^{2}$C provides the following advantages: 1) It covers CTR data of 5 different types of items, providing insights into the preferences of users for different items, including advertisements, vouchers, mini-programs, contents, and videos. 2) Apart from ID-based features, AntM$^{2}$C also provides 2 multi-modal features, raw text and image features, which can effectively establish connections between items with different IDs. 3) AntM$^{2}$C provides 1 billion CTR data with 200 features, including 200 million users and 6 million items. It is currently the largest-scale CTR dataset available. Based on AntM$^{2}$C, we construct several typical CTR tasks and provide comparisons with baseline methods. The dataset homepage is available at https://www.atecup.cn/home. △ Less

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2307.16081 [pdf, other]

Roll Up Your Sleeves: Working with a Collaborative and Engaging Task-Oriented Dialogue System

Authors: Lingbo Mo, Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Sunit Singh, Samuel Stevens, Chang-You Tai, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun

Abstract: We introduce TacoBot, a user-centered task-oriented digital assistant designed to guide users through complex real-world tasks with multiple steps. Covering a wide range of cooking and how-to tasks, we aim to deliver a collaborative and engaging dialogue experience. Equipped with language understanding, dialogue management, and response generation components supported by a robust search engine, Ta… ▽ More We introduce TacoBot, a user-centered task-oriented digital assistant designed to guide users through complex real-world tasks with multiple steps. Covering a wide range of cooking and how-to tasks, we aim to deliver a collaborative and engaging dialogue experience. Equipped with language understanding, dialogue management, and response generation components supported by a robust search engine, TacoBot ensures efficient task assistance. To enhance the dialogue experience, we explore a series of data augmentation strategies using LLMs to train advanced neural models continuously. TacoBot builds upon our successful participation in the inaugural Alexa Prize TaskBot Challenge, where our team secured third place among ten competing teams. We offer TacoBot as an open-source framework that serves as a practical example for deploying task-oriented dialogue systems. △ Less

Submitted 29 July, 2023; originally announced July 2023.

arXiv:2307.10229 [pdf, other]

Study on the Impacts of Hazardous Behaviors on Autonomous Vehicle Collision Rates Based on Humanoid Scenario Generation in CARLA

Authors: Longfei Mo, Min Hua, Hongyu Sun, Hongming Xu, Bin Shuai, Quan Zhou

Abstract: Testing of function safety and Safety Of The Intended Functionality (SOTIF) is important for autonomous vehicles (AVs). It is hard to test the AV's hazard response in the real world because it would involve hazards to passengers and other road users. This paper studied on virtual testing of AV on the CARLA platform and proposed a Humanoid Scenario Generation (HSG) scheme to investigate the impacts… ▽ More Testing of function safety and Safety Of The Intended Functionality (SOTIF) is important for autonomous vehicles (AVs). It is hard to test the AV's hazard response in the real world because it would involve hazards to passengers and other road users. This paper studied on virtual testing of AV on the CARLA platform and proposed a Humanoid Scenario Generation (HSG) scheme to investigate the impacts of hazardous behaviors on AV collision rates. The HSG scheme breakthrough the current limitation on the rarity and reproducibility of real scenes. By accurately capturing five prominent human driver behaviors that directly contribute to vehicle collisions in the real world, the methodology significantly enhances the realism and diversity of the simulation, as evidenced by collision rate statistics across various traffic scenarios. Thus, the modular framework allows for customization, and its seamless integration within the CARLA platform ensures compatibility with existing tools. Ultimately, the comparison results demonstrate that all vehicles that exhibited hazardous behaviors followed the predefined random speed distribution and the effectiveness of the HSG was validated by the distinct characteristics displayed by these behaviors. △ Less

Submitted 15 July, 2023; originally announced July 2023.

arXiv:2306.10012 [pdf, other]

MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing

Authors: Kai Zhang, Lingbo Mo, Wenhu Chen, Huan Sun, Yu Su

Abstract: Text-guided image editing is widely needed in daily life, ranging from personal use to professional applications such as Photoshop. However, existing methods are either zero-shot or trained on an automatically synthesized dataset, which contains a high volume of noise. Thus, they still require lots of manual tuning to produce desirable outcomes in practice. To address this issue, we introduce Magi… ▽ More Text-guided image editing is widely needed in daily life, ranging from personal use to professional applications such as Photoshop. However, existing methods are either zero-shot or trained on an automatically synthesized dataset, which contains a high volume of noise. Thus, they still require lots of manual tuning to produce desirable outcomes in practice. To address this issue, we introduce MagicBrush (https://osu-nlp-group.github.io/MagicBrush/), the first large-scale, manually annotated dataset for instruction-guided real image editing that covers diverse scenarios: single-turn, multi-turn, mask-provided, and mask-free editing. MagicBrush comprises over 10K manually annotated triplets (source image, instruction, target image), which supports trainining large-scale text-guided image editing models. We fine-tune InstructPix2Pix on MagicBrush and show that the new model can produce much better images according to human evaluation. We further conduct extensive experiments to evaluate current image editing baselines from multiple dimensions including quantitative, qualitative, and human evaluations. The results reveal the challenging nature of our dataset and the gap between current baselines and real-world editing needs. △ Less

Submitted 15 May, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023; Website: https://osu-nlp-group.github.io/MagicBrush/

arXiv:2306.07935 [pdf, other]

Multi-modal Representation Learning for Social Post Location Inference

Authors: Ruiting Dai, Jiayi Luo, Xucheng Luo, Lisi Mo, Wanlun Ma, Fan Zhou

Abstract: Inferring geographic locations via social posts is essential for many practical location-based applications such as product marketing, point-of-interest recommendation, and infector tracking for COVID-19. Unlike image-based location retrieval or social-post text embedding-based location inference, the combined effect of multi-modal information (i.e., post images, text, and hashtags) for social pos… ▽ More Inferring geographic locations via social posts is essential for many practical location-based applications such as product marketing, point-of-interest recommendation, and infector tracking for COVID-19. Unlike image-based location retrieval or social-post text embedding-based location inference, the combined effect of multi-modal information (i.e., post images, text, and hashtags) for social post positioning receives less attention. In this work, we collect real datasets of social posts with images, texts, and hashtags from Instagram and propose a novel Multi-modal Representation Learning Framework (MRLF) capable of fusing different modalities of social posts for location inference. MRLF integrates a multi-head attention mechanism to enhance location-salient information extraction while significantly improving location inference compared with single domain-based methods. To overcome the noisy user-generated textual content, we introduce a novel attention-based character-aware module that considers the relative dependencies between characters of social post texts and hashtags for flexible multi-model information fusion. The experimental results show that MRLF can make accurate location predictions and open a new door to understanding the multi-modal data of social posts for online inference tasks. △ Less

Submitted 10 June, 2023; originally announced June 2023.

Comments: 6 pages, 2023 International Conference on Communications

arXiv:2303.04574 [pdf, other]

Distributed and Deep Vertical Federated Learning with Big Data

Authors: Ji Liu, Xuehai Zhou, Lei Mo, Shilei Ji, Yuan Liao, Zheng Li, Qin Gu, De**g Dou

Abstract: In recent years, data are typically distributed in multiple organizations while the data security is becoming increasingly important. Federated Learning (FL), which enables multiple parties to collaboratively train a model without exchanging the raw data, has attracted more and more attention. Based on the distribution of data, FL can be realized in three scenarios, i.e., horizontal, vertical, and… ▽ More In recent years, data are typically distributed in multiple organizations while the data security is becoming increasingly important. Federated Learning (FL), which enables multiple parties to collaboratively train a model without exchanging the raw data, has attracted more and more attention. Based on the distribution of data, FL can be realized in three scenarios, i.e., horizontal, vertical, and hybrid. In this paper, we propose to combine distributed machine learning techniques with Vertical FL and propose a Distributed Vertical Federated Learning (DVFL) approach. The DVFL approach exploits a fully distributed architecture within each party in order to accelerate the training process. In addition, we exploit Homomorphic Encryption (HE) to protect the data against honest-but-curious participants. We conduct extensive experimentation in a large-scale cluster environment and a cloud environment in order to show the efficiency and scalability of our proposed approach. The experiments demonstrate the good scalability of our approach and the significant efficiency advantage (up to 6.8 times with a single server and 15.1 times with multiple servers in terms of the training time) compared with baseline frameworks. △ Less

Submitted 10 March, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: To appear in CCPE (Concurrency and Computation: Practice and Experience)

arXiv:2301.13465 [pdf, other]

doi 10.1145/3511808.3557333

GDOD: Effective Gradient Descent using Orthogonal Decomposition for Multi-Task Learning

Authors: Xin Dong, Ruize Wu, Chao Xiong, Hai Li, Lei Cheng, Yong He, Shiyou Qian, Jian Cao, Linjian Mo

Abstract: Multi-task learning (MTL) aims at solving multiple related tasks simultaneously and has experienced rapid growth in recent years. However, MTL models often suffer from performance degeneration with negative transfer due to learning several tasks simultaneously. Some related work attributed the source of the problem is the conflicting gradients. In this case, it is needed to select useful gradient… ▽ More Multi-task learning (MTL) aims at solving multiple related tasks simultaneously and has experienced rapid growth in recent years. However, MTL models often suffer from performance degeneration with negative transfer due to learning several tasks simultaneously. Some related work attributed the source of the problem is the conflicting gradients. In this case, it is needed to select useful gradient updates for all tasks carefully. To this end, we propose a novel optimization approach for MTL, named GDOD, which manipulates gradients of each task using an orthogonal basis decomposed from the span of all task gradients. GDOD decomposes gradients into task-shared and task-conflict components explicitly and adopts a general update rule for avoiding interference across all task gradients. This allows guiding the update directions depending on the task-shared components. Moreover, we prove the convergence of GDOD theoretically under both convex and non-convex assumptions. Experiment results on several multi-task datasets not only demonstrate the significant improvement of GDOD performed to existing MTL models but also prove that our algorithm outperforms state-of-the-art optimization methods in terms of AUC and Logloss metrics. △ Less

Submitted 31 January, 2023; originally announced January 2023.

Journal ref: Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 2022: 386-395

arXiv:2207.05223 [pdf, other]

Bootstrap** a User-Centered Task-Oriented Dialogue System

Authors: Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Lingbo Mo, Samuel Stevens, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun

Abstract: We present TacoBot, a task-oriented dialogue system built for the inaugural Alexa Prize TaskBot Challenge, which assists users in completing multi-step cooking and home improvement tasks. TacoBot is designed with a user-centered principle and aspires to deliver a collaborative and accessible dialogue experience. Towards that end, it is equipped with accurate language understanding, flexible dialog… ▽ More We present TacoBot, a task-oriented dialogue system built for the inaugural Alexa Prize TaskBot Challenge, which assists users in completing multi-step cooking and home improvement tasks. TacoBot is designed with a user-centered principle and aspires to deliver a collaborative and accessible dialogue experience. Towards that end, it is equipped with accurate language understanding, flexible dialogue management, and engaging response generation. Furthermore, TacoBot is backed by a strong search engine and an automated end-to-end test suite. In bootstrap** the development of TacoBot, we explore a series of data augmentation strategies to train advanced neural language processing models and continuously improve the dialogue experience with collected real conversations. At the end of the semifinals, TacoBot achieved an average rating of 3.55/5.0. △ Less

Submitted 21 July, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

Comments: Published in 1st Proceedings of Alexa Prize TaskBot (Alexa Prize 2021). TacoBot won 3rd place in the challenge. See project website https://sunlab-osu.github.io/tacobot/ for details

arXiv:2112.07980 [pdf, other]

Data Placement for Multi-Tenant Data Federation on the Cloud

Authors: Ji Liu, Lei Mo, Sijia Yang, **gbo Zhou, Shilei Ji, Haoyi Xiong, De**g Dou

Abstract: Due to privacy concerns of users and law enforcement in data security and privacy, it becomes more and more difficult to share data among organizations. Data federation brings new opportunities to the data-related cooperation among organizations by providing abstract data interfaces. With the development of cloud computing, organizations store data on the cloud to achieve elasticity and scalabilit… ▽ More Due to privacy concerns of users and law enforcement in data security and privacy, it becomes more and more difficult to share data among organizations. Data federation brings new opportunities to the data-related cooperation among organizations by providing abstract data interfaces. With the development of cloud computing, organizations store data on the cloud to achieve elasticity and scalability for data processing. The existing data placement approaches generally only consider one aspect, which is either execution time or monetary cost, and do not consider data partitioning for hard constraints. In this paper, we propose an approach to enable data processing on the cloud with the data from different organizations. The approach consists of a data federation platform named FedCube and a Lyapunov-based data placement algorithm. FedCube enables data processing on the cloud. We use the data placement algorithm to create a plan in order to partition and store data on the cloud so as to achieve multiple objectives while satisfying the constraints based on a multi-objective cost model. The cost model is composed of two objectives, i.e., reducing monetary cost and execution time. We present an experimental evaluation to show our proposed algorithm significantly reduces the total cost (up to 69.8\%) compared with existing approaches. △ Less

Submitted 15 December, 2021; originally announced December 2021.

Comments: 15 pages, 8 figures, 4 tables

arXiv:2110.08345 [pdf, other]

Towards Transparent Interactive Semantic Parsing via Step-by-Step Correction

Authors: Lingbo Mo, Ashley Lewis, Huan Sun, Michael White

Abstract: Existing studies on semantic parsing focus primarily on map** a natural-language utterance to a corresponding logical form in one turn. However, because natural language can contain a great deal of ambiguity and variability, this is a difficult challenge. In this work, we investigate an interactive semantic parsing framework that explains the predicted logical form step by step in natural langua… ▽ More Existing studies on semantic parsing focus primarily on map** a natural-language utterance to a corresponding logical form in one turn. However, because natural language can contain a great deal of ambiguity and variability, this is a difficult challenge. In this work, we investigate an interactive semantic parsing framework that explains the predicted logical form step by step in natural language and enables the user to make corrections through natural-language feedback for individual steps. We focus on question answering over knowledge bases (KBQA) as an instantiation of our framework, aiming to increase the transparency of the parsing process and help the user appropriately trust the final answer. To do so, we construct INSPIRED, a crowdsourced dialogue dataset derived from the ComplexWebQuestions dataset. Our experiments show that the interactive framework with human feedback has the potential to greatly improve overall parse accuracy. Furthermore, we develop a pipeline for dialogue simulation to evaluate our framework w.r.t. a variety of state-of-the-art KBQA models without involving further crowdsourcing effort. The results demonstrate that our interactive semantic parsing framework promises to be effective across such models. △ Less

Submitted 27 March, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: Accepted by Findings of ACL 2022

arXiv:2104.06312 [pdf, other]

A Non-sequential Approach to Deep User Interest Model for CTR Prediction

Authors: Keke Zhao, Xing Zhao, Qi Cao, Linjian Mo

Abstract: Click-Through Rate (CTR) prediction plays an important role in many industrial applications, and recently a lot of attention is paid to the deep interest models which use attention mechanism to capture user interests from historical behaviors. However, most current models are based on sequential models which truncate the behavior sequences by a fixed length, thus have difficulties in handling very… ▽ More Click-Through Rate (CTR) prediction plays an important role in many industrial applications, and recently a lot of attention is paid to the deep interest models which use attention mechanism to capture user interests from historical behaviors. However, most current models are based on sequential models which truncate the behavior sequences by a fixed length, thus have difficulties in handling very long behavior sequences. Another big problem is that sequences with the same length can be quite different in terms of time, carrying completely different meanings. In this paper, we propose a non-sequential approach to tackle the above problems. Specifically, we first represent the behavior data in a sparse key-vector format, where the vector contains rich behavior info such as time, count and category. Next, we enhance the Deep Interest Network to take such rich information into account by a novel attention network. The sparse representation makes it practical to handle large scale long behavior sequences. Finally, we introduce a multidimensional partition framework to mine behavior interactions. The framework can partition data into custom designed time buckets to capture the interactions among information aggregated in different time buckets. Similarly, it can also partition the data into different categories and capture the interactions among them. Experiments are conducted on two public datasets: one is an advertising dataset and the other is a production recommender dataset. Our models outperform other state-of-the-art models on both datasets. △ Less

Submitted 21 May, 2021; v1 submitted 5 April, 2021; originally announced April 2021.

arXiv:2003.00389 [pdf, other]

Joint Wasserstein Distribution Matching

Authors: JieZhang Cao, Langyuan Mo, Qing Du, Yong Guo, Peilin Zhao, Junzhou Huang, Mingkui Tan

Abstract: Joint distribution matching (JDM) problem, which aims to learn bidirectional map**s to match joint distributions of two domains, occurs in many machine learning and computer vision applications. This problem, however, is very difficult due to two critical challenges: (i) it is often difficult to exploit sufficient information from the joint distribution to conduct the matching; (ii) this problem… ▽ More Joint distribution matching (JDM) problem, which aims to learn bidirectional map**s to match joint distributions of two domains, occurs in many machine learning and computer vision applications. This problem, however, is very difficult due to two critical challenges: (i) it is often difficult to exploit sufficient information from the joint distribution to conduct the matching; (ii) this problem is hard to formulate and optimize. In this paper, relying on optimal transport theory, we propose to address JDM problem by minimizing the Wasserstein distance of the joint distributions in two domains. However, the resultant optimization problem is still intractable. We then propose an important theorem to reduce the intractable problem into a simple optimization problem, and develop a novel method (called Joint Wasserstein Distribution Matching (JWDM)) to solve it. In the experiments, we apply our method to unsupervised image translation and cross-domain video synthesis. Both qualitative and quantitative comparisons demonstrate the superior performance of our method over several state-of-the-arts. △ Less

Submitted 29 February, 2020; originally announced March 2020.

Comments: This paper is accepted by Chinese Journal of Computers in 2020

arXiv:1911.00888 [pdf, other]

Multi-marginal Wasserstein GAN

Authors: Jiezhang Cao, Langyuan Mo, Yifan Zhang, Kui Jia, Chunhua Shen, Mingkui Tan

Abstract: Multiple marginal matching problem aims at learning map**s to match a source domain to multiple target domains and it has attracted great attention in many applications, such as multi-domain image translation. However, addressing this problem has two critical challenges: (i) Measuring the multi-marginal distance among different domains is very intractable; (ii) It is very difficult to exploit cr… ▽ More Multiple marginal matching problem aims at learning map**s to match a source domain to multiple target domains and it has attracted great attention in many applications, such as multi-domain image translation. However, addressing this problem has two critical challenges: (i) Measuring the multi-marginal distance among different domains is very intractable; (ii) It is very difficult to exploit cross-domain correlations to match the target domain distributions. In this paper, we propose a novel Multi-marginal Wasserstein GAN (MWGAN) to minimize Wasserstein distance among domains. Specifically, with the help of multi-marginal optimal transport theory, we develop a new adversarial objective function with inner- and inter-domain constraints to exploit cross-domain correlations. Moreover, we theoretically analyze the generalization performance of MWGAN, and empirically evaluate it on the balanced and imbalanced translation tasks. Extensive experiments on toy and real-world datasets demonstrate the effectiveness of MWGAN. △ Less

Submitted 3 November, 2019; originally announced November 2019.

Comments: This paper is accepted by NeurIPS 2019

arXiv:1905.01046 [pdf]

Inter-Cell Antenna Calibration for Coherent Joint Transmission in TDD System

Authors: Hanqing Xu, Yajun Zhao, Linmei Mo, Chen Huang, Baoyu Sun

Abstract: In this work the modeling and calibration method of reciprocity error in a coherent TDD coordinated multi-point (CoMP) joint transmission (JT) system are addressed. The modeling includes parameters such as amplitude gains and phase differences of RF chains between the eNBs. The calibration method used for inter-cell antenna calibration is based on precoding matrix indicator (PMI) feedback by UE. F… ▽ More In this work the modeling and calibration method of reciprocity error in a coherent TDD coordinated multi-point (CoMP) joint transmission (JT) system are addressed. The modeling includes parameters such as amplitude gains and phase differences of RF chains between the eNBs. The calibration method used for inter-cell antenna calibration is based on precoding matrix indicator (PMI) feedback by UE. Furthermore, we provide some simulation results for evaluating the performance of the calibration method in different cases such as varying estimation-period, cell-specific reference signals (CRS) ports configuration, signal to noise ratio (SNR), phase difference, etc. The main conclusion is that the proposed method for intercell antenna calibration has good performance for estimating the residual phase difference. Keywords-LTE-Advanced; TDD; CoMP; JT; reciprocity error; phase difference; inter-cell antenna calibration △ Less

Submitted 3 May, 2019; originally announced May 2019.

Comments: 5 pages, 8 figures, GLOBECOM

arXiv:1807.06454 [pdf, other]

Global sensitivity analysis of frequency band gaps in one-dimensional phononic crystals

Authors: W. Witarto, K. B. Nakshatrala, Y. L. Mo

Abstract: Phononic crystals have been widely employed in many engineering fields, which is due to their unique feature of frequency band gaps. For example, their capability to filter out the incoming elastic waves, which include seismic waves, will have a significant impact on the seismic safety of nuclear infrastructure. In order to accurately design the desired frequency band gaps, one must pay attention… ▽ More Phononic crystals have been widely employed in many engineering fields, which is due to their unique feature of frequency band gaps. For example, their capability to filter out the incoming elastic waves, which include seismic waves, will have a significant impact on the seismic safety of nuclear infrastructure. In order to accurately design the desired frequency band gaps, one must pay attention on how the input parameters and the interaction of the parameters can affect the frequency band gaps. Global sensitivity analysis can decompose the dispersion relationship of the phononic crystals and screen the variance attributed to each of the parameters and the interaction between them. Prior to the application in one-dimensional (1D) phononic crystals, this paper will first review the theory of global sensitivity analysis using variance decomposition (Sobol sensitivity analysis). Afterwards, the sensitivity analysis is applied to study a simple mathematical model with three input variables for better understanding of the concept. Then, the sensitivity analysis is utilized to study the characteristic of the first frequency band gap in 1D phononic crystals with respect to the input parameters. This study reveals the quantified influence of the parameters and their correlation in determining the first frequency band gap. In addition, simple straight-forward design equations based on reduced Sobol functions are proposed to easily estimate the first frequency band gap. Finally, the error associated with the proposed design equations is also addressed. △ Less

Submitted 29 December, 2018; v1 submitted 14 July, 2018; originally announced July 2018.

Comments: K. C. Chang, Y. Tang and R. Kassawara have contributed to the experimental component of the research project, which complements the research present in this paper. This paper deals only with the modeling component

arXiv:1506.01188 [pdf, other]

doi 10.1145/2783258.2783271

Online Influence Maximization (Extended Version)

Authors: Siyu Lei, Silviu Maniu, Luyi Mo, Reynold Cheng, Pierre Senellart

Abstract: Social networks are commonly used for marketing purposes. For example, free samples of a product can be given to a few influential social network users (or "seed nodes"), with the hope that they will convince their friends to buy it. One way to formalize marketers' objective is through influence maximization (or IM), whose goal is to find the best seed nodes to activate under a fixed budget, so th… ▽ More Social networks are commonly used for marketing purposes. For example, free samples of a product can be given to a few influential social network users (or "seed nodes"), with the hope that they will convince their friends to buy it. One way to formalize marketers' objective is through influence maximization (or IM), whose goal is to find the best seed nodes to activate under a fixed budget, so that the number of people who get influenced in the end is maximized. Recent solutions to IM rely on the influence probability that a user influences another one. However, this probability information may be unavailable or incomplete. In this paper, we study IM in the absence of complete information on influence probability. We call this problem Online Influence Maximization (OIM) since we learn influence probabilities at the same time we run influence campaigns. To solve OIM, we propose a multiple-trial approach, where (1) some seed nodes are selected based on existing influence information; (2) an influence campaign is started with these seed nodes; and (3) users' feedback is used to update influence information. We adopt the Explore-Exploit strategy, which can select seed nodes using either the current influence probability estimation (exploit), or the confidence bound on the estimation (explore). Any existing IM algorithm can be used in this framework. We also develop an incremental algorithm that can significantly reduce the overhead of handling users' feedback information. Our experiments show that our solution is more effective than traditional IM methods on the partial information. △ Less

Submitted 3 June, 2015; originally announced June 2015.

Comments: 13 pages. To appear in KDD 2015. Extended version

Showing 1–23 of 23 results for author: Mo, L