Search | arXiv e-print repository

W2E (Workout to Earn): A Low Cost DApp based on ERC-20 and ERC-721 standards

Authors: Do Hai Son, Nguyen Danh Hao, Tran Thi Thuy Quynh, Le Quang Minh

Abstract: Decentralized applications (DApps) have gained prominence with the advent of blockchain technology, particularly Ethereum, providing trust, transparency, and traceability. However, challenges such as rising transaction costs and block confirmation delays hinder their widespread adoption. In this paper, we present our DApp named W2E - Workout to Earn, a mobile DApp incentivizing exercise through tokens and NFT awards. This application leverages the well-known ERC-20 and ERC-721 token standards of Ethereum. Additionally, we deploy W2E into various Ethereum-based networks, including Ethereum testnets, Layer 2 networks, and private networks, to survey gas efficiency and execution time. Our findings highlight the importance of network selection for DApp deployment, offering insights for developers and businesses seeking efficient blockchain solutions. This is because our experimental results are not only specific for W2E but also for other ERC-20 and ERC-721-based DApps.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.12182 [pdf, other]

Aqulia-Med LLM: Pioneering Full-Process Open-Source Medical Language Models

Authors: Lulu Zhao, Weihao Zeng, Xiaofeng Shi, Hua Zhou, Donglin Hao, Yonghua Lin

Abstract: Recently, both closed-source LLMs and open-source communities have made significant strides, outperforming humans in various general domains. However, their performance in specific professional fields such as medicine, especially within the open-source community, remains suboptimal due to the complexity of medical knowledge. We propose Aquila-Med, a bilingual medical LLM based on Aquila, addressing these challenges through continued pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF). We construct a large-scale Chinese and English medical dataset for continued pre-training and a high-quality SFT dataset, covering extensive medical specialties. Additionally, we develop a high-quality Direct Preference Optimization (DPO) dataset for further alignment. Aquila-Med achieves notable results across single-turn, multi-turn dialogues, and medical multiple-choice questions, demonstrating the effectiveness of our approach. We open-source the datasets and the entire training process, contributing valuable resources to the research community. Our models and datasets will be released at https://huggingface.co/BAAI/AquilaMed-RL.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2405.20935 [pdf, other]

Effective Interplay between Sparsity and Quantization: From Theory to Practice

Authors: Simla Burcu Harma, Ayan Chakraborty, Elizaveta Kostenok, Danila Mishin, Dongho Ha, Babak Falsafi, Martin Jaggi, Ming Liu, Yunho Oh, Suvinay Subramanian, Amir Yazdanbakhsh

Abstract: The increasing size of deep neural networks necessitates effective model compression to improve computational efficiency and reduce their memory footprint. Sparsity and quantization are two prominent compression methods that have individually demonstrated significant reduction in computational and memory footprints while preserving model accuracy. While effective, the interplay between these two methods remains an open question. In this paper, we investigate the interaction between these two methods and assess whether their combination impacts final model accuracy. We mathematically prove that applying sparsity before quantization is the optimal sequence for these operations, minimizing error in computation. Our empirical studies across a wide range of models, including OPT and Llama model families (125M-8B) and ViT corroborate these theoretical findings. In addition, through rigorous analysis, we demonstrate that sparsity and quantization are not orthogonal; their interaction can significantly harm model accuracy, with quantization error playing a dominant role in this degradation. Our findings extend to the efficient deployment of large models in resource-limited compute platforms and reduce serving cost, offering insights into best practices for applying these compression methods to maximize efficacy without compromising accuracy.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2404.17839 [pdf, other]

Improving Smart Contract Security with Contrastive Learning-based Vulnerability Detection

Authors: Yizhou Chen, Zeyu Sun, Zhihao Gong, Dan Hao

Abstract: Currently, smart contract vulnerabilities (SCVs) have emerged as a major factor threatening the transaction security of blockchain. Existing state-of-the-art methods rely on deep learning to mitigate this threat. They treat each input contract as an independent entity and feed it into a deep learning model to learn vulnerability patterns by fitting vulnerability labels. It is a pity that they disregard the correlation between contracts, failing to consider the commonalities between contracts of the same type and the differences among contracts of different types. As a result, the performance of these methods falls short of the desired level. To tackle this problem, we propose a novel Contrastive Learning Enhanced Automated Recognition Approach for Smart Contract Vulnerabilities, named Clear. In particular, Clear employs a contrastive learning (CL) model to capture the fine-grained correlation information among contracts and generates correlation labels based on the relationships between contracts to guide the training process of the CL model. Finally, it combines the correlation and the semantic information of the contract to detect SCVs. Through an empirical evaluation on a large-scale real-world dataset of over 40K smart contracts and a comparison with 13 state-of-the-art baseline methods, we show that Clear achieves (1) optimal performance over all baseline methods; (2) 9.73%-39.99% higher F1-score than existing deep learning methods.

Submitted 27 April, 2024; originally announced April 2024.

Journal ref: 2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE '24)

arXiv:2404.13947 [pdf, other]

Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering

Authors: Dongze Hao, Qunbo Wang, Longteng Guo, Jie Jiang, Jing Liu

Abstract: While large pre-trained visual-language models have shown promising results on traditional visual question answering benchmarks, it is still challenging for them to answer complex VQA problems which require diverse world knowledge. Motivated by the research of retrieval-augmented generation in the field of natural language processing, we use Dense Passage Retrieval (DPR) to retrieve related knowledge to help the model answer questions. However, DPR conducts retrieval in natural language space, which may not ensure comprehensive acquisition of image information. Thus, the retrieved knowledge is not truly conducive to helping answer the question, affecting the performance of the overall system. To address this issue, we propose a novel framework that leverages the visual-language model to select the key knowledge retrieved by DPR and answer questions. The framework consists of two modules: Selector and Answerer, where both are initialized by the MLLM and parameter-efficiently finetuned by self-bootstrapping: find key knowledge in the retrieved knowledge documents using the Selector, and then use them to finetune the Answerer to predict answers; obtain the pseudo-labels of key knowledge documents based on the predictions of the Answerer and weak supervision labels, and then finetune the Selector to select key knowledge; repeat. Our framework significantly enhances the performance of the baseline on the challenging open-domain Knowledge-based VQA benchmark, OK-VQA, achieving a state-of-the-art accuracy of 62.83%.

Submitted 16 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.11816 [pdf, other]

Tailoring Generative Adversarial Networks for Smooth Airfoil Design

Authors: Joyjit Chattoraj, Jian Cheng Wong, Zhang Zexuan, Manna Dai, Xia Yingzhi, Li Jichao, Xu Xinxing, Ooi Chin Chun, Yang Feng, Dao My Ha, Liu Yong

Abstract: In the realm of aerospace design, achieving smooth curves is paramount, particularly when crafting objects such as airfoils. Generative Adversarial Network (GAN), a widely employed generative AI technique, has proven instrumental in synthesizing airfoil designs. However, a common limitation of GAN is the inherent lack of smoothness in the generated airfoil surfaces. To address this issue, we present a GAN model featuring a customized loss function built to produce seamlessly contoured airfoil designs. Additionally, our model demonstrates a substantial increase in design diversity compared to a conventional GAN augmented with a post-processing smoothing filter.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2403.17601 [pdf, other]

LASIL: Learner-Aware Supervised Imitation Learning For Long-term Microscopic Traffic Simulation

Authors: Ke Guo, Zhenwei Miao, Wei Jing, Weiwei Liu, Weizi Li, Dayang Hao, Jia Pan

Abstract: Microscopic traffic simulation plays a crucial role in transportation engineering by providing insights into individual vehicle behavior and overall traffic flow. However, creating a realistic simulator that accurately replicates human driving behaviors in various traffic conditions presents significant challenges. Traditional simulators relying on heuristic models often fail to deliver accurate simulations due to the complexity of real-world traffic environments. Due to the covariate shift issue, existing imitation learning-based simulators often fail to generate stable long-term simulations. In this paper, we propose a novel approach called learner-aware supervised imitation learning to address the covariate shift problem in multi-agent imitation learning. By leveraging a variational autoencoder simultaneously modeling the expert and learner state distribution, our approach augments expert states such that the augmented state is aware of learner state distribution. Our method, applied to urban traffic simulation, demonstrates significant improvements over existing state-of-the-art baselines in both short-term microscopic and long-term macroscopic realism when evaluated on the real-world dataset pNEUMA.

Submitted 23 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024. arXiv admin note: text overlap with arXiv:2306.06401

arXiv:2403.13187 [pdf, other]

Evolutionary Optimization of Model Merging Recipes

Authors: Takuya Akiba, Makoto Shing, Yujin Tang, Qi Sun, David Ha

Abstract: We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models like a Japanese LLM with Math reasoning capabilities. Surprisingly, our Japanese Math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for such tasks. Furthermore, a culturally-aware Japanese VLM generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese VLMs. This work not only contributes new state-of-the-art models back to the open-source community, but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.10037 [pdf, other]

Knowledge Condensation and Reasoning for Knowledge-based VQA

Authors: Dongze Hao, Jian Jia, Longteng Guo, Qunbo Wang, Te Yang, Yan Li, Yanhua Cheng, Bo Wang, Quan Chen, Han Li, Jing Liu

Abstract: Knowledge-based visual question answering (KB-VQA) is a challenging task, which requires the model to leverage external knowledge for comprehending and answering questions grounded in visual content. Recent studies retrieve the knowledge passages from external knowledge bases and then use them to answer questions. However, these retrieved knowledge passages often contain irrelevant or noisy information, which limits the performance of the model. To address the challenge, we propose two synergistic models: Knowledge Condensation model and Knowledge Reasoning model. We condense the retrieved knowledge passages from two perspectives. First, we leverage the multimodal perception and reasoning ability of the visual-language models to distill concise knowledge concepts from retrieved lengthy passages, ensuring relevance to both the visual content and the question. Second, we leverage the text comprehension ability of the large language models to summarize and condense the passages into the knowledge essence which helps answer the question. These two types of condensed knowledge are then seamlessly integrated into our Knowledge Reasoning model, which judiciously navigates through the amalgamated information to arrive at the conclusive answer. Extensive experiments validate the superiority of the proposed method. Compared to previous methods, our method achieves state-of-the-art performance on knowledge-based VQA datasets (65.1% on OK-VQA and 60.1% on A-OKVQA) without resorting to the knowledge produced by GPT-3 (175B).

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.04981 [pdf, other]

Paving the Way for Pass Disturb Free Vertical NAND Storage via A Dedicated and String-Compatible Pass Gate

Authors: Zijian Zhao, Sola Woo, Khandker Akif Aabrar, Sharadindu Gopal Kirtania, Zhouhang Jiang, Shan Deng, Yi Xiao, Halid Mulaosmanovic, Stefan Duenkel, Dominik Kleimaier, Steven Soss, Sven Beyer, Rajiv Joshi, Scott Meninger, Mohamed Mohamed, Kijoon Kim, Jongho Woo, Suhwan Lim, Kwangsoo Kim, Wanki Kim, Daewon Ha, Vijaykrishnan Narayanan, Suman Datta, Shimeng Yu, Kai Ni

Abstract: In this work, we propose a dual-port cell design to address the pass disturb in vertical NAND storage, which can pass signals through a dedicated and string-compatible pass gate. We demonstrate that: i) the pass disturb-free feature originates from weakening of the depolarization field by the pass bias at the high-${V}_{TH}$ (HVT) state and the screening of the applied field by channel at the low-${V}_{TH}$ (LVT) state; ii) combined simulations and experimental demonstrations of dual-port design verify the disturb-free operation in a NAND string, overcoming a key challenge in single-port designs; iii) the proposed design can be incorporated in a highly scaled vertical NAND FeFET string and the pass gate can be incorporated into the existing 3D NAND with the negligible overhead of the pass gate interconnection through a global bottom pass gate contact in the substrate.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 29 pages, 7 figures

arXiv:2402.12175 [pdf, other]

Learning Discretized Bayesian Networks with GOMEA

Authors: Damy M. F. Ha, Tanja Alderliesten, Peter A. N. Bosman

Abstract: Bayesian networks model relationships between random variables under uncertainty and can be used to predict the likelihood of events and outcomes while incorporating observed evidence. From an eXplainable AI (XAI) perspective, such models are interesting as they tend to be compact. Moreover, captured relations can be directly inspected by domain experts. In practice, data is often real-valued. Unless assumptions of normality can be made, discretization is often required. The optimal discretization, however, depends on the relations modelled between the variables. This complicates learning Bayesian networks from data. For this reason, most literature focuses on learning conditional dependencies between sets of variables, called structure learning. In this work, we extend an existing state-of-the-art structure learning approach based on the Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) to jointly learn variable discretizations. The proposed Discretized Bayesian Network GOMEA (DBN-GOMEA) obtains similar or better results than the current state-of-the-art when tasked to retrieve randomly generated ground-truth networks. Moreover, leveraging a key strength of evolutionary algorithms, we can straightforwardly perform DBN learning multi-objectively. We show how this enables incorporating expert knowledge in a uniquely insightful fashion, finding multiple DBNs that trade-off complexity, accuracy, and the difference with a pre-determined expert network.

Submitted 19 February, 2024; originally announced February 2024.

Comments: The code is available at: https://github.com/damyha/dbn_gomea

arXiv:2402.10280 [pdf, other]

SusFL: Energy-Aware Federated Learning-based Monitoring for Sustainable Smart Farms

Authors: Dian Chen, Paul Yang, Ing-Ray Chen, Dong Sam Ha, Jin-Hee Cho

Abstract: We propose a novel energy-aware federated learning (FL)-based system, namely SusFL, for sustainable smart farming to address the challenge of inconsistent health monitoring due to fluctuating energy levels of solar sensors. This system equips animals, such as cattle, with solar sensors with computational capabilities, including Raspberry Pis, to train a local deep-learning model on health data. These sensors periodically update Long Range (LoRa) gateways, forming a wireless sensor network (WSN) to detect diseases like mastitis. Our proposed SusFL system incorporates mechanism design, a game theory concept, for intelligent client selection to optimize monitoring quality while minimizing energy use. This strategy ensures the system's sustainability and resilience against adversarial attacks, including data poisoning and privacy threats, that could disrupt FL operations. Through extensive comparative analysis using real-time datasets, we demonstrate that our FL-based monitoring system significantly outperforms existing methods in prediction accuracy, operational efficiency, system reliability (i.e., mean time between failures or MTBF), and social welfare maximization by the mechanism designer. Our findings validate the superiority of our system for effective and sustainable animal health monitoring in smart farms.
The experimental results show that SusFL significantly improves system performance, including a $10\%$ reduction in energy consumption, a $15\%$ increase in social welfare, and a $34\%$ rise in Mean Time Between Failures (MTBF), alongside a marginal increase in the global model's prediction accuracy.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.08768 [pdf, other]

Adversarially Robust Feature Learning for Breast Cancer Diagnosis

Authors: Degan Hao, Dooman Arefan, Margarita Zuley, Wendie Berg, Shandong Wu

Abstract: Adversarial data can lead to malfunction of deep learning applications. It is essential to develop deep learning models that are robust to adversarial data while accurate on standard, clean data. In this study, we proposed a novel adversarially robust feature learning (ARFL) method for a real-world application of breast cancer diagnosis. ARFL facilitates adversarial training using both standard data and adversarial data, where a feature correlation measure is incorporated as an objective function to encourage learning of robust features and restrain spurious features. To show the effects of ARFL in breast cancer diagnosis, we built and evaluated diagnosis models using two independent clinically collected breast imaging datasets, comprising a total of 9,548 mammogram images. We performed extensive experiments showing that our method outperformed several state-of-the-art methods and that our method can make breast cancer diagnosis safer against adversarial attacks in clinical settings.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.01287 [pdf, other]

Spiking CenterNet: A Distillation-boosted Spiking Neural Network for Object Detection

Authors: Lennard Bodden, Franziska Schwaiger, Duc Bach Ha, Lars Kreuzberg, Sven Behnke

Abstract: In the era of AI at the edge, self-driving cars, and climate change, the need for energy-efficient, small, embedded AI is growing. Spiking Neural Networks (SNNs) are a promising approach to address this challenge, with their event-driven information flow and sparse activations. We propose Spiking CenterNet for object detection on event data. It combines an SNN CenterNet adaptation with an efficient M2U-Net-based decoder. Our model significantly outperforms comparable previous work on Prophesee's challenging GEN1 Automotive Detection Dataset while using less than half the energy. Distilling the knowledge of a non-spiking teacher into our SNN further increases performance. To the best of our knowledge, our work is the first approach that takes advantage of knowledge distillation in the field of spiking object detection.

Submitted 6 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 8 pages, 5 figures. Accepted at IJCNN 2024

arXiv:2401.03673 [pdf, other]

Comparing discriminating abilities of evaluation metrics in link prediction

Authors: Xinshan Jiao, Shuyan Wan, Qian Liu, Yilin Bi, Yan-Li Lee, En Xu, Dong Hao, Tao Zhou

Abstract: Link prediction aims to predict the potential existence of links between two unconnected nodes within a network based on the known topological characteristics. Evaluation metrics are used to assess the effectiveness of algorithms in link prediction. The discriminating ability of these evaluation metrics is vitally important for accurately evaluating link prediction algorithms. In this study, we propose an artificial network model, based on which one can adjust a single parameter to monotonically and continuously tune the prediction accuracy of the specifically designed link prediction algorithm. Building upon this foundation, we show a framework to depict the effectiveness of evaluation metrics by focusing on their discriminating ability. Specifically, a quantitative comparison of the abilities of correctly discerning varying prediction accuracies was conducted, encompassing nine evaluation metrics: Precision, Recall, F1-Measure, Matthews Correlation Coefficient (MCC), Balanced Precision (BP), the Area Under the receiver operating characteristic Curve (AUC), the Area Under the Precision-Recall curve (AUPR), Normalized Discounted Cumulative Gain (NDCG), and the Area Under the magnified ROC (AUC-mROC). The results indicate that the discriminating abilities of the three metrics, AUC, AUPR, and NDCG, are significantly higher than those of other metrics.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.09000 [pdf, ps, other]

ComOM at VLSP 2023: A Dual-Stage Framework with BERTology and Unified Multi-Task Instruction Tuning Model for Vietnamese Comparative Opinion Mining

Authors: Dang Van Thin, Duong Ngoc Hao, Ngan Luu-Thuy Nguyen

Abstract: The ComOM shared task aims to extract comparative opinions from product reviews in the Vietnamese language. There are two sub-tasks, including (1) Comparative Sentence Identification (CSI) and (2) Comparative Element Extraction (CEE). The first task is to identify whether the input is a comparative review, and the purpose of the second task is to extract the quintuplets mentioned in the comparative review. To address this task, our team proposes a two-stage system based on fine-tuning a BERTology model for the CSI task and unified multi-task instruction tuning for the CEE task. Besides, we apply a simple data augmentation technique to increase the size of the dataset for training our model in the second stage. Experimental results show that our approach outperforms the other competitors and has achieved the top score on the official private test.

Submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted manuscript at VLSP 2023

arXiv:2311.13413 [pdf, other]

Revisiting Machine Learning based Test Case Prioritization for Continuous Integration

Authors: Yifan Zhao, Dan Hao, Lu Zhang

Abstract: To alleviate the cost of regression testing in continuous integration (CI), a large number of machine learning-based (ML-based) test case prioritization techniques have been proposed. However, it is yet unknown how they perform under the same experimental setup, because they are evaluated on different datasets with different metrics. To bridge this gap, we conduct the first comprehensive study on these ML-based techniques in this paper. We investigate the performance of 11 representative ML-based prioritization techniques for CI on 11 open-source subjects and obtain a series of findings. For example, the performance of the techniques changes across CI cycles, mainly resulting from the changing amount of training data, instead of code evolution and test removal/addition. Based on the findings, we give some actionable suggestions on enhancing the effectiveness of ML-based techniques, e.g., pretraining a prioritization technique with cross-subject data to get it thoroughly trained and then finetuning it with within-subject data dramatically improves its performance. In particular, the pretrained MART achieves state-of-the-art performance, producing the optimal sequence on 80% of subjects, while the existing best technique, the original MART, only produces the optimal sequence on 50% of subjects.

Submitted 22 November, 2023; originally announced November 2023.

Comments: This paper has been accepted by ICSME 2023

arXiv:2310.11654 [pdf, other]

Subject-specific Deep Neural Networks for Count Data with High-cardinality Categorical Features

Authors: Hangbin Lee, Il Do Ha, Changha Hwang, Youngjo Lee

Abstract: There is a growing interest in subject-specific predictions using deep neural networks (DNNs) because real-world data often exhibit correlations, which have typically been overlooked in traditional DNN frameworks. In this paper, we propose a novel hierarchical likelihood learning framework for introducing gamma random effects into the Poisson DNN, so as to improve the prediction performance by capturing both nonlinear effects of input variables and subject-specific cluster effects. The proposed method simultaneously yields maximum likelihood estimators for fixed parameters and best unbiased predictors for random effects by optimizing a single objective function. This approach enables a fast end-to-end algorithm for handling clustered count data, which often involve high-cardinality categorical features. Furthermore, state-of-the-art network architectures can be easily implemented into the proposed h-likelihood framework. As an example, we introduce a multi-head attention layer and a sparsemax function, which allow feature selection in high-dimensional settings. To enhance practical performance and learning efficiency, we present an adjustment procedure for prediction of random parameters and a method-of-moments estimator for pretraining of the variance component. Various experimental studies and real data analyses confirm the advantages of our proposed methods.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.09705 [pdf, other]

SGA: A Graph Augmentation Method for Signed Graph Neural Networks

Authors: Zeyu Zhang, Shuyan Wan, Sijie Wang, Xianda Zheng, Xinrui Zhang, Kaiqi Zhao, Jiamou Liu, Dong Hao

Abstract: Signed Graph Neural Networks (SGNNs) are vital for analyzing complex patterns in real-world signed graphs containing positive and negative links. However, three key challenges hinder current SGNN-based signed graph representation learning: sparsity in signed graphs leaves latent structures undiscovered, unbalanced triangles pose representation difficulties for SGNN models, and real-world signed graph datasets often lack supplementary information like node labels and features. These constraints limit the potential of SGNN-based representation learning. We address these issues with data augmentation techniques. Despite many graph data augmentation methods existing for unsigned graphs, none are tailored for signed graphs. Our paper introduces the novel Signed Graph Augmentation framework (SGA), comprising three main components. First, we employ the SGNN model to encode the signed graph, extracting latent structural information for candidate augmentation structures. Second, we evaluate these candidate samples (edges) and select the most beneficial ones for modifying the original training set. Third, we propose a novel augmentation perspective that assigns varying training difficulty to training samples, enabling the design of a new training strategy. Extensive experiments on six real-world datasets (Bitcoin-alpha, Bitcoin-otc, Epinions, Slashdot, Wiki-elec, and Wiki-RfA) demonstrate that SGA significantly improves performance across multiple benchmarks. Our method outperforms baselines by up to 22.2% in AUC for SGCN on Wiki-RfA, 33.3% in F1-binary, 48.8% in F1-micro, and 36.3% in F1-macro for GAT on Bitcoin-alpha in link sign prediction.

Submitted 14 October, 2023; originally announced October 2023.

arXiv:2309.12025 [pdf, other]

Robust Approximation Algorithms for Non-monotone $k$-Submodular Maximization under a Knapsack Constraint

Authors: Dung T. K. Ha, Canh V. Pham, Tan D. Tran, Huan X. Hoang

Abstract: The problem of non-monotone $k$-submodular maximization under a knapsack constraint ($\kSMK$) over a ground set of size $n$ arises in many machine learning applications, such as data summarization and information propagation. However, existing algorithms for the problem face the questions of how to handle the non-monotone case and how to quickly return a good solution when the data is large. This paper introduces two deterministic approximation algorithms for the problem that competitively improve the query complexity of existing algorithms. Our first algorithm, $\LAA$, returns an approximation ratio of $1/19$ within $O(nk)$ query complexity. The second one, $\RLA$, improves the approximation ratio to $1/5-ε$ in $O(nk)$ queries, where $ε$ is an input parameter. Our algorithms are the first ones that provide constant approximation ratios within only $O(nk)$ query complexity for the non-monotone objective. They therefore need fewer queries than state-of-the-art ones by a factor of $Ω(\log n)$. Besides the theoretical analysis, we have evaluated our proposed algorithms with several experiments on two instances of the problem: Influence Maximization and Sensor Placement. The results confirm that our algorithms match the theoretical quality of the cutting-edge techniques and significantly reduce the number of queries.

Submitted 21 September, 2023; originally announced September 2023.

Comments: 12 pages

Report number: KSE-ID38

arXiv:2308.08288 [pdf, other]

Improving Audio-Visual Segmentation with Bidirectional Generation

Authors: Dawei Hao, Yuxin Mao, Bowen He, Xiaodong Han, Yuchao Dai, Yiran Zhong

Abstract: The aim of audio-visual segmentation (AVS) is to precisely differentiate audible objects within videos down to the pixel level. Traditional approaches often tackle this challenge by combining information from various modalities, where the contribution of each modality is implicitly or explicitly modeled. Nevertheless, the interconnections between different modalities tend to be overlooked in audio-visual modeling. In this paper, inspired by the human ability to mentally simulate the sound of an object and its visual appearance, we introduce a bidirectional generation framework. This framework establishes robust correlations between an object's visual characteristics and its associated sound, thereby enhancing the performance of AVS. To achieve this, we employ a visual-to-audio projection component that reconstructs audio features from object segmentation masks and minimizes reconstruction errors. Moreover, recognizing that many sounds are linked to object movements, we introduce an implicit volumetric motion estimation module to handle temporal dynamics that may be challenging to capture using conventional optical flow methods. To showcase the effectiveness of our approach, we conduct comprehensive experiments and analyses on the widely recognized AVSBench benchmark. As a result, we establish a new state-of-the-art performance level in the AVS benchmark, particularly excelling in the challenging MS3 subset which involves segmenting multiple sound sources. To facilitate reproducibility, we plan to release both the source code and the pre-trained model.

Submitted 19 December, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: AAAI Camera Ready. Dawei Hao and Yuxin Mao contributed equally to this paper. Yiran Zhong is the corresponding author. The code will be released at https://github.com/OpenNLPLab/AVS-bidirectional

arXiv:2307.06581 [pdf, other]

Deep Neural Networks for Semiparametric Frailty Models via H-likelihood

Authors: Hangbin Lee, Il Do Ha, Youngjo Lee

Abstract: For prediction of clustered time-to-event data, we propose a new deep neural network based gamma frailty model (DNN-FM). An advantage of the proposed model is that the joint maximization of the new h-likelihood provides maximum likelihood estimators for fixed parameters and best unbiased predictors for random frailties. Thus, the proposed DNN-FM is trained by using a negative profiled h-likelihood as a loss function, constructed by profiling out the non-parametric baseline hazard. Experimental studies show that the proposed method enhances the prediction performance of the existing methods. A real data analysis shows that the inclusion of subject-specific frailties helps to improve prediction of the DNN based Cox model (DNN-Cox).

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.04770 [pdf, other]

Predicting Outcomes in Long COVID Patients with Spatiotemporal Attention

Authors: Degan Hao, Mohammadreza Negahdar

Abstract: Long COVID is a general term for post-acute sequelae of COVID-19. Patients with long COVID can endure long-lasting symptoms including fatigue, headache, dyspnea, and anosmia. Identifying the cohorts with severe long-term complications in COVID-19 could benefit treatment planning and resource arrangement. However, due to the heterogeneous phenotype presented in long COVID patients, it is difficult to predict their outcomes from their longitudinal data. In this study, we proposed a spatiotemporal attention mechanism to weigh feature importance jointly from the temporal dimension and feature space. Considering that medical examinations can have interchangeable orders in adjacent time points, we restricted the learning of short-term dependency with a Local-LSTM and the learning of long-term dependency with the joint spatiotemporal attention. We also compared the proposed method with several state-of-the-art methods and a method in clinical practice. The methods are evaluated on a hard-to-acquire clinical dataset of patients with long COVID. Experimental results show the Local-LSTM with joint spatiotemporal attention outperformed related methods in outcome prediction. The proposed method provides a clinical tool for the severity assessment of long COVID.

Submitted 7 July, 2023; originally announced July 2023.

arXiv:2305.10292 [pdf, other]

Linear Query Approximation Algorithms for Non-monotone Submodular Maximization under Knapsack Constraint

Authors: Canh V. Pham, Tan D. Tran, Dung T. K. Ha, My T. Thai

Abstract: This work, for the first time, introduces two constant factor approximation algorithms with linear query complexity for non-monotone submodular maximization over a ground set of size $n$ subject to a knapsack constraint, $\mathsf{DLA}$ and $\mathsf{RLA}$. $\mathsf{DLA}$ is a deterministic algorithm that provides an approximation factor of $6+ε$ while $\mathsf{RLA}$ is a randomized algorithm with an approximation factor of $4+ε$. Both run in $O(n \log(1/ε)/ε)$ query complexity. The key idea to obtain a constant approximation ratio with linear queries lies in: (1) dividing the ground set into two appropriate subsets to find the near-optimal solution over these subsets with linear queries, and (2) combining a threshold greedy with properties of two disjoint sets or a random selection process to improve solution quality. In addition to the theoretical analysis, we have evaluated our proposed solutions with three applications: Revenue Maximization, Image Summarization, and Maximum Weighted Cut, showing that our algorithms not only return results comparable to state-of-the-art algorithms but also require significantly fewer queries.

Submitted 10 July, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

arXiv:2304.14908 [pdf, other]

Compiler Auto-tuning through Multiple Phase Learning

Authors: Mingxuan Zhu, Dan Hao, Junjie Chen

Abstract: Widely used compilers like GCC and LLVM usually have hundreds of optimizations controlled by optimization flags, which are enabled or disabled during compilation to improve runtime performance (e.g., small execution time) of the compiled program. Due to the large number of optimization flags and their combinations, it is difficult for compiler users to manually tune compiler optimization flags. In the literature, a number of auto-tuning techniques have been proposed, which tune optimization flags for a compiled program by comparing its actual runtime performance with different optimization flag combinations. Due to the huge search space and heavy actual runtime cost, these techniques suffer from the widely-recognized efficiency problem. To reduce the heavy runtime cost, in this paper we propose a lightweight learning approach which uses a small number of actual runtime performance data to predict the runtime performance of a compiled program with various optimization flag combinations. Furthermore, to reduce the search space, we design a novel particle swarm algorithm which tunes compiler optimization flags with the prediction model. To evaluate the performance of the proposed approach CompTuner, we conduct an extensive experimental study on two popular C compilers GCC and LLVM with two widely used benchmarks cBench and PolyBench. The experimental results show that CompTuner significantly outperforms the five compared techniques, including the state-of-the-art technique BOCA.

Submitted 27 April, 2023; originally announced April 2023.

arXiv:2304.10406 [pdf, other]

LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields

Authors: Tang Tao, Longfei Gao, Guangrun Wang, Yixing Lao, Peng Chen, Hengshuang Zhao, Dayang Hao, Xiaodan Liang, Mathieu Salzmann, Kaicheng Yu

Abstract: We introduce a new task, novel view synthesis for LiDAR sensors. While traditional model-based LiDAR simulators with style-transfer neural networks can be applied to render novel views, they fall short of producing accurate and realistic LiDAR patterns because the renderers rely on explicit 3D reconstruction and exploit game engines, that ignore important attributes of LiDAR points. We address this challenge by formulating, to the best of our knowledge, the first differentiable end-to-end LiDAR rendering framework, LiDAR-NeRF, leveraging a neural radiance field (NeRF) to facilitate the joint learning of geometry and the attributes of 3D points. However, simply employing NeRF cannot achieve satisfactory results, as it only focuses on learning individual pixels while ignoring local information, especially at low texture areas, resulting in poor geometry. To this end, we have taken steps to address this issue by introducing a structural regularization method to preserve local structural details. To evaluate the effectiveness of our approach, we establish an object-centric multi-view LiDAR dataset, dubbed NeRF-MVL. It contains observations of objects from 9 categories seen from 360-degree viewpoints captured with multiple LiDAR sensors. Our extensive experiments on the scene-level KITTI-360 dataset, and on our object-level NeRF-MVL show that our LiDAR-NeRF surpasses the model-based algorithms significantly.

Submitted 14 July, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: This paper introduces a new task of novel LiDAR view synthesis, and proposes a differentiable framework called LiDAR-NeRF with a structural regularization, as well as an object-centric multi-view LiDAR dataset called NeRF-MVL

arXiv:2302.06185 [pdf, other]

PUPS: Point Cloud Unified Panoptic Segmentation

Authors: Shihao Su, Jianyun Xu, Huanyu Wang, Zhenwei Miao, Xin Zhan, Dayang Hao, Xi Li

Abstract: Point cloud panoptic segmentation is a challenging task that seeks a holistic solution for both semantic and instance segmentation to predict groupings of coherent points. Previous approaches treat semantic and instance segmentation as surrogate tasks, and they either use clustering methods or bounding boxes to gather instance groupings with costly computation and hand-crafted designs in the instance segmentation task. In this paper, we propose a simple but effective point cloud unified panoptic segmentation (PUPS) framework, which uses a set of point-level classifiers to directly predict semantic and instance groupings in an end-to-end manner. To realize PUPS, we introduce bipartite matching to our training pipeline so that our classifiers are able to exclusively predict groupings of instances, getting rid of hand-crafted designs, e.g. anchors and Non-Maximum Suppression (NMS). In order to achieve better grouping results, we utilize a transformer decoder to iteratively refine the point classifiers and develop a context-aware CutMix augmentation to overcome the class imbalance problem. As a result, PUPS achieves 1st place on the leader board of SemanticKITTI panoptic segmentation task and state-of-the-art results on nuScenes.

Submitted 27 February, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: accepted by AAAI2023

arXiv:2209.03758 [pdf, other]

Improved Sensor-Based Animal Behavior Classification Performance through Conditional Generative Adversarial Network

Authors: Zhuqing Zhao, Dong Ha, Abhishek Damle, Barbara Roqueto Dos, Robin White, Sook Ha

Abstract: Many activity classification methods segment data into windows of a fixed size for feature extraction and classification. However, animal behaviors have various durations that do not match the predetermined window size. The dense labeling and dense prediction methods address this limitation by predicting labels for every point. Thus, by tracing the starting and ending points, we could know the time location and duration of all occurring activities. Still, the dense prediction could be noisy, with misalignment problems. We modified the U-Net and Conditional Generative Adversarial Network (cGAN) with customized loss functions as a training strategy to reduce fragmentation and other misalignments. In cGAN, the discriminator and generator are trained against each other in an adversarial competition. The generator produces dense predictions. The discriminator works as a high-level consistency check, in our case, pushing the generator to predict activities with reasonable duration. The model trained with cGAN shows better or comparable performance on the cow, pig, and UCI HAPT datasets. The cGAN-trained modified U-Net improved from 92.17% to 94.66% for the UCI HAPT dataset and from 90.85% to 93.18% for pig data compared to previous dense prediction work.

Submitted 6 September, 2022; originally announced September 2022.

arXiv:2208.14591 [pdf, other]

Combinatorial Procurement Auction in Social Networks

Authors: Yuhang Guo, Dong Hao, Bin Li

Abstract: This paper studies one emerging procurement auction scenario where the market is constructed over social networks. In a social network composed of many agents, smartphones or computers, one requester releases her requirement for goods or tasks to suppliers, then suppliers who have entered the market are also encouraged to invite some other suppliers to join, and all the suppliers in the network could compete for the business. The key problem for this networked auction is how to incentivize each node who has entered the market not only to truthfully use her full ability, but also to forward the task to her neighbours. Auctions conducted over social networks have attracted considerable interest in recent years. However, most of the existing works focus on classic forward auctions. Moreover, there is no existing valid networked auction considering multiple goods/tasks. This work is the first to explore procurement auctions for both homogeneous and heterogeneous goods or tasks in social networks. Through both theoretical proof and experimental simulation, we show that the proposed mechanisms are individual-rational and incentive-compatible, and that the costs of both the system and the requester decrease.

Submitted 30 August, 2022; originally announced August 2022.

arXiv:2208.03374 [pdf, other]

Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter

Authors: Aleksandar Stanić, Yujin Tang, David Ha, Jürgen Schmidhuber

Abstract: Reinforcement learning agents must generalize beyond their training experience. Prior work has focused mostly on identical training and evaluation environments. Starting from the recently introduced Crafter benchmark, a 2D open world survival game, we introduce a new set of environments suitable for evaluating some agent's ability to generalize on previously unseen (numbers of) objects and to adapt quickly (meta-learning). In Crafter, the agents are evaluated by the number of unlocked achievements (such as collecting resources) when trained for 1M steps. We show that current agents struggle to generalize, and introduce novel object-centric agents that improve over strong baselines. We also provide critical insights of general interest for future work on Crafter through several experiments. We show that careful hyper-parameter tuning improves the PPO baseline agent by a large margin and that even feedforward agents can unlock almost all achievements by relying on the inventory display. We achieve new state-of-the-art performance on the original Crafter environment. Additionally, when trained beyond 1M steps, our tuned agents can unlock almost all achievements. We show that the recurrent PPO agents improve over feedforward ones, even with the inventory information removed. We introduce CrafterOOD, a set of 15 new environments that evaluate OOD generalization. On CrafterOOD, we show that the current agents fail to generalize, whereas our novel object-centric agents achieve state-of-the-art OOD generalization while also being interpretable. Our code is public.

Submitted 5 August, 2022; originally announced August 2022.

ACM Class: I.2.6

arXiv:2205.14951 [pdf, other]

Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection

Authors: Kaicheng Yu, Tang Tao, Hongwei Xie, Zhiwei Lin, Zhongwei Wu, Zhongyu Xia, Tingting Liang, Haiyang Sun, Jiong Deng, Dayang Hao, Yongtao Wang, Xiaodan Liang, Bing Wang

Abstract: There are two critical sensors for 3D perception in autonomous driving, the camera and the LiDAR. The camera provides rich semantic information such as color, texture, and the LiDAR reflects the 3D shape and locations of surrounding objects. People discover that fusing these two modalities can significantly boost the performance of 3D perception models as each modality has complementary information to the other. However, we observe that current datasets are captured from expensive vehicles that are explicitly designed for data collection purposes, and cannot truly reflect the realistic data distribution due to various reasons. To this end, we collect a series of real-world cases with noisy data distribution, and systematically formulate a robustness benchmark toolkit, that simulates these cases on any clean autonomous driving datasets. We showcase the effectiveness of our toolkit by establishing the robustness benchmark on two widely-adopted autonomous driving datasets, nuScenes and Waymo, then, to the best of our knowledge, holistically benchmark the state-of-the-art fusion methods for the first time. We observe that: i) most fusion methods, when solely developed on these data, tend to fail inevitably when there is a disruption to the LiDAR input; ii) the improvement of the camera input is significantly inferior to the LiDAR one. We further propose an efficient robust training strategy to improve the robustness of the current fusion method. The benchmark and code are available at https://github.com/kcyu2014/lidar-camera-robust-benchmark

Submitted 30 May, 2022; originally announced May 2022.

Comments: Technical report. The first three authors contribute equally

arXiv:2205.10239 [pdf, other]

doi 10.1109/TSE.2021.3137929

AGA: An Accelerated Greedy Additional Algorithm for Test Case Prioritization

Authors: Feng Li, Jianyi Zhou, Yinzhu Li, Dan Hao, Lu Zhang

Abstract: In recent years, many test case prioritization (TCP) techniques have been proposed to speed up the process of fault detection. However, little work has taken the efficiency problem of these techniques into account. In this paper, we target the Greedy Additional (GA) algorithm, which has been widely recognized to be effective but less efficient, and try to improve its efficiency while preserving effectiveness. In our Accelerated GA (AGA) algorithm, we use some extra data structures to reduce redundant data accesses in the GA algorithm and thus the time complexity is reduced from $\mathcal{O}(m^2n)$ to $\mathcal{O}(kmn)$ when $n > m$, where $m$ is the number of test cases, $n$ is the number of program elements, and $k$ is the iteration number. Moreover, we observe the impact of iteration numbers on prioritization efficiency on our dataset and propose to use a specific iteration number in the AGA algorithm to further improve the efficiency. We conducted experiments on 55 open-source subjects. In particular, we implemented each TCP algorithm with two kinds of widely-used input formats, adjacency matrix and adjacency list. Since a TCP algorithm with adjacency matrix is less efficient than the algorithm with adjacency list, the result analysis is mainly conducted based on TCP algorithms with adjacency list. The results show that AGA achieves 5.95X speedup ratio over GA on average, while it achieves the same average effectiveness as GA in terms of Average Percentage of Fault Detected (APFD). Moreover, we conducted an industrial case study on 22 subjects, collected from Baidu, and find that the average speedup ratio of AGA over GA is 44.27X, which indicates the practical usage of AGA in real-world scenarios.

Submitted 20 May, 2022; originally announced May 2022.

Comments: IEEE Transactions on Software Engineering, 2021

arXiv:2204.08472 [pdf, other]

Simultaneous Multiple-Prompt Guided Generation Using Differentiable Optimal Transport

Authors: Yingtao Tian, Marco Cuturi, David Ha

Abstract: Recent advances in deep learning, such as powerful generative models and joint text-image embeddings, have provided the computational creativity community with new tools, opening new perspectives for artistic pursuits. Text-to-image synthesis approaches that operate by generating images from text cues provide a case in point. These images are generated with a latent vector that is progressively refined to agree with text cues. To do so, patches are sampled within the generated image, and compared with the text prompts in the common text-image embedding space; The latent vector is then updated, using gradient descent, to reduce the mean (average) distance between these patches and text cues. While this approach provides artists with ample freedom to customize the overall appearance of images, through their choice in generative models, the reliance on a simple criterion (mean of distances) often causes mode collapse: The entire image is drawn to the average of all text cues, thereby losing their diversity. To address this issue, we propose using matching techniques found in the optimal transport (OT) literature, resulting in images that are able to reflect faithfully a wide diversity of prompts. We provide numerous illustrations showing that OT avoids some of the pitfalls arising from estimating vectors with mean distances, and demonstrate the capacity of our proposed method to perform better in experiments, qualitatively and quantitatively.

Submitted 17 April, 2022; originally announced April 2022.

Comments: Accepted at ICCC 2022

arXiv:2204.06481 [pdf, other]

doi 10.1145/3512290.3528762

Evolving Modular Soft Robots without Explicit Inter-Module Communication using Local Self-Attention

Authors: Federico Pigozzi, Yujin Tang, Eric Medvet, David Ha

Abstract: Modularity in robotics holds great potential. In principle, modular robots can be disassembled and reassembled in different robots, and possibly perform new tasks. Nevertheless, actually exploiting modularity is still an unsolved problem: controllers usually rely on inter-module communication, a practical requirement that makes modules not perfectly interchangeable and thus limits their flexibility. Here, we focus on Voxel-based Soft Robots (VSRs), aggregations of mechanically identical elastic blocks. We use the same neural controller inside each voxel, but without any inter-voxel communication, hence enabling ideal conditions for modularity: modules are all equal and interchangeable. We optimize the parameters of the neural controller, shared among the voxels, by evolutionary computation. Crucially, we use a local self-attention mechanism inside the controller to overcome the absence of inter-module communication channels, thus enabling our robots to truly be driven by the collective intelligence of their modules. We show experimentally that the evolved robots are effective in the task of locomotion: thanks to self-attention, instances of the same controller embodied in the same robot can focus on different inputs. We also find that the evolved controllers generalize to unseen morphologies, after a short fine-tuning, suggesting that an inductive bias related to the task arises from true modularity.

Submitted 13 April, 2022; originally announced April 2022.

Comments: Accepted at the Genetic and Evolutionary Computation Conference 2022 (GECCO'22) complex systems track as a full paper

arXiv:2203.07796 [pdf, other]

Multi-Unit Diffusion Auctions with Intermediaries

Authors: Bin Li, Dong Hao, Dengji Zhao

Abstract: This paper studies multi-unit auctions powered by intermediaries, where each intermediary owns a private set of unit-demand buyers and all intermediaries are networked with each other. Our goal is to incentivize the intermediaries to diffuse the auction information to individuals they can reach, including their private buyers and neighboring intermediaries, so that more potential buyers are able to participate in the auction. To this end, we build a diffusion-based auction framework which incorporates the strategic interaction of intermediaries. It is shown that the classic Vickrey-Clarke-Groves (VCG) mechanism within the framework can achieve the maximum social welfare, but it may decrease the seller's revenue or even lead to a deficit. To overcome the revenue issue, we propose a novel auction, called critical neighborhood auction, which not only maximizes the social welfare, but also improves the seller's revenue compared to the VCG mechanism with/without intermediaries.

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2202.05008 [pdf, other]

doi 10.1145/3520304.3528770

EvoJAX: Hardware-Accelerated Neuroevolution

Authors: Yujin Tang, Yingtao Tian, David Ha

Abstract: Evolutionary computation has been shown to be a highly effective method for training neural networks, particularly when employed at scale on CPU clusters. Recent work has also showcased its effectiveness on hardware accelerators, such as GPUs, but so far such demonstrations are tailored for very specific tasks, limiting applicability to other domains. We present EvoJAX, a scalable, general purpose, hardware-accelerated neuroevolution toolkit. Building on top of the JAX library, our toolkit enables neuroevolution algorithms to work with neural networks running in parallel across multiple TPU/GPUs. EvoJAX achieves very high performance by implementing the evolution algorithm, neural network and task all in NumPy, which is compiled just-in-time to run on accelerators. We provide extensible examples of EvoJAX for a wide range of tasks, including supervised learning, reinforcement learning and generative art. Since EvoJAX can find solutions to most of these tasks within minutes on a single accelerator, compared to hours or days when using CPUs, our toolkit can significantly shorten the iteration cycle of evolutionary computation experiments. EvoJAX is available at https://github.com/google/evojax

Submitted 5 April, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

Comments: GECCO 2022. Project website at https://github.com/google/evojax

arXiv:2201.06868 [pdf, other]

A Study on the Ambiguity in Human Annotation of German Oral History Interviews for Perceived Emotion Recognition and Sentiment Analysis

Authors: Michael Gref, Nike Matthiesen, Sreenivasa Hikkal Venugopala, Shalaka Satheesh, Aswinkumar Vijayananth, Duc Bach Ha, Sven Behnke, Joachim Köhler

Abstract: For research in audiovisual interview archives often it is not only of interest what is said but also how. Sentiment analysis and emotion recognition can help capture, categorize and make these different facets searchable. In particular, for oral history archives, such indexing technologies can be of great interest. These technologies can help understand the role of emotions in historical remembering. However, humans often perceive sentiments and emotions ambiguously and subjectively. Moreover, oral history interviews have multi-layered levels of complex, sometimes contradictory, sometimes very subtle facets of emotions. Therefore, the question arises of the chance machines and humans have capturing and assigning these into predefined categories. This paper investigates the ambiguity in human perception of emotions and sentiment in German oral history interviews and the impact on machine learning systems. Our experiments reveal substantial differences in human perception for different emotions. Furthermore, we report from ongoing machine learning experiments with different modalities. We show that the human perceptual ambiguity and other challenges, such as class imbalance and lack of training data, currently limit the opportunities of these technologies for oral history archives. Nonetheless, our work uncovers promising observations and possibilities for further research.

Submitted 18 January, 2022; originally announced January 2022.

Comments: Submitted to LREC 2022

arXiv:2112.02884 [pdf, other]

doi 10.1109/TNET.2022.3223367

Social Sourcing: Incorporating Social Networks Into Crowdsourcing Contest Design

Authors: Qi Shi, Dong Hao

Abstract: In a crowdsourcing contest, a principal holding a task posts it to a crowd. People in the crowd then compete with each other to win the rewards. Although in real life, a crowd is usually networked and people influence each other via social ties, existing crowdsourcing contest theories do not aim to answer how interpersonal relationships influence people's incentives and behaviors and thereby affect the crowdsourcing performance. In this work, we novelly take people's social ties as a key factor in the modeling and designing of agents' incentives in crowdsourcing contests. We establish two contest mechanisms by which the principal can impel the agents to invite their neighbors to contribute to the task. The first mechanism has a symmetric Bayesian Nash equilibrium, and it is very simple for agents to play and easy for the principal to predict the contest performance. The second mechanism has an asymmetric Bayesian Nash equilibrium, and agents' behaviors in equilibrium show a vast diversity which is strongly related to their social relations. The Bayesian Nash equilibrium analysis of these new mechanisms reveals that, besides agents' intrinsic abilities, the social relations among them also play a central role in decision-making. Moreover, we design an effective algorithm to automatically compute the Bayesian Nash equilibrium of the invitation crowdsourcing contest and further adapt it to a large graph dataset.
Both theoretical and empirical results show that the new invitation crowdsourcing contests can substantially enlarge the number of participants, whereby the principal can obtain significantly better solutions without a large advertisement expenditure.

Submitted 22 November, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

Comments: IEEE/ACM Transactions on Networking

arXiv:2112.02271 [pdf, other]

Cooperation, Retaliation and Forgiveness in Revision Games

Authors: Dong Hao, Qi Shi, Jinyan Su, Bo An

Abstract: Revision game is a very new model formulating the real-time situation where players dynamically prepare and revise their actions in advance before a deadline when payoffs are realized. It is at the cutting edge of dynamic game theory and can be applied in many real-world scenarios, such as eBay auction, stock market, election, online games, crowdsourcing, etc. In this work, we novelly identify a class of strategies for revision games which are called Limited Retaliation strategies. A limited retaliation strategy stipulates that (1) players first follow a recommended cooperative plan; (2) if anyone deviates from the plan, the limited retaliation player retaliates by using the defection action for a limited duration; (3) after the retaliation, the limited retaliation player returns to the cooperative plan. A limited retaliation strategy has three key features. It is cooperative, sustaining a high level of social welfare. It is vengeful, deterring the opponent from betrayal by threatening with a future retaliation. It is yet forgiving, since it resumes cooperation after a proper retaliation. The cooperativeness and vengefulness make it constitute cooperative subgame perfect equilibrium, while the forgiveness makes it tolerate occasional mistakes. Limited retaliation strategies show significant advantages over Grim Trigger, which is currently the only known strategy for revision games.
Besides its contribution as a new robust and welfare-optimizing equilibrium strategy, our results about limited retaliation strategy can also be used to explain how easy cooperation can happen, and why forgiveness emerges in real-world multi-agent interactions. In addition, limited retaliation strategies are simple to derive and computationally efficient, making it easy for algorithm design and implementation in many multi-agent systems.

Submitted 12 October, 2022; v1 submitted 4 December, 2021; originally announced December 2021.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2111.14377 [pdf, other]

Collective Intelligence for Deep Learning: A Survey of Recent Developments

Authors: David Ha, Yujin Tang

Abstract: In the past decade, we have witnessed the rise of deep learning to dominate the field of artificial intelligence. Advances in artificial neural networks alongside corresponding advances in hardware accelerators with large memory capacity, together with the availability of large datasets enabled practitioners to train and deploy sophisticated neural network models that achieve state-of-the-art performance on tasks across several fields spanning computer vision, natural language processing, and reinforcement learning. However, as these neural networks become bigger, more complex, and more widely used, fundamental problems with current deep learning models become more apparent. State-of-the-art deep learning models are known to suffer from issues that range from poor robustness, inability to adapt to novel task settings, to requiring rigid and inflexible configuration assumptions. Collective behavior, commonly observed in nature, tends to produce systems that are robust, adaptable, and have less rigid assumptions about the environment configuration. Collective intelligence, as a field, studies the group intelligence that emerges from the interactions of many individuals. Within this field, ideas such as self-organization, emergent behavior, swarm optimization, and cellular automata were developed to model and explain complex systems. It is therefore natural to see these ideas incorporated into newer deep learning methods.
In this review, we will provide a historical context of neural network research's involvement with complex systems, and highlight several active areas in modern deep learning research that incorporate the principles of collective intelligence to advance its current capabilities. We hope this review can serve as a bridge between the complex systems and deep learning communities.

Submitted 10 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

arXiv:2111.09991 [pdf, other]

doi 10.1007/978-3-030-82681-9

Sketch-based Creativity Support Tools using Deep Learning

Authors: Forrest Huang, Eldon Schoop, David Ha, Jeffrey Nichols, John Canny

Abstract: Sketching is a natural and effective visual communication medium commonly used in creative processes. Recent developments in deep-learning models drastically improved machines' ability in understanding and generating visual content. An exciting area of development explores deep-learning approaches used to model human sketches, opening opportunities for creative applications. This chapter describes three fundamental steps in developing deep-learning-driven creativity support tools that consume and generate sketches: 1) a data collection effort that generated a new paired dataset between sketches and mobile user interfaces; 2) a sketch-based user interface retrieval system adapted from state-of-the-art computer vision techniques; and, 3) a conversational sketching system that supports the novel interaction of a natural-language-based sketch/critique authoring process. In this chapter, we survey relevant prior work in both the deep-learning and human-computer-interaction communities, document the data collection process and the systems' architectures in detail, present qualitative and quantitative results, and paint the landscape of several future research directions in this exciting area.

Submitted 18 November, 2021; originally announced November 2021.

Comments: Preprint of chapter published in "Artificial Intelligence for Human Computer Interaction: A Modern Approach". arXiv admin note: substantial text overlap with arXiv:2005.07781

arXiv:2109.08863

Streaming algorithms for Budgeted $k$-Submodular Maximization problem

Authors: Canh V. Pham, Quang C. Vu, Dung K. T. Ha, Tai T. Nguyen

Abstract: Stimulated by practical applications arising from viral marketing, this paper investigates a novel Budgeted $k$-Submodular Maximization problem defined as follows: Given a finite set $V$, a budget $B$ and a $k$-submodular function $f: (k+1)^V \mapsto \mathbb{R}_+$, the problem asks to find a solution $\mathbf{s}=(S_1, S_2, \ldots, S_k)$, where each element $e \in V$ has a cost $c_i(e)$ to be put into the $i$-th set $S_i$, such that the total cost of $\mathbf{s}$ does not exceed $B$ and $f(\mathbf{s})$ is maximized. To address this problem, we propose two streaming algorithms that provide approximation guarantees for the problem. In particular, in the case where each element $e$ has the same cost for all $i$-th sets, we propose a deterministic streaming algorithm which provides an approximation ratio of $\frac{1}{4}-ε$ when $f$ is monotone and $\frac{1}{5}-ε$ when $f$ is non-monotone. For the general case, we propose a random streaming algorithm that provides an approximation ratio of $\min\{\frac{α}{2}, \frac{(1-α)k}{(1+β)k-β} \}-ε$ when $f$ is monotone and $\min\{\frac{α}{2}, \frac{(1-α)k}{(1+2β)k-2β} \}-ε$ when $f$ is non-monotone in expectation, where $β=\max_{e\in V, i , j \in [k], i\neq j} \frac{c_i(e)}{c_j(e)}$ and $ε, α$ are fixed inputs.

Submitted 22 October, 2021; v1 submitted 18 September, 2021; originally announced September 2021.

Comments: There are some results of the article that need to be corrected

arXiv:2109.08857 [pdf, other]

Modern Evolution Strategies for Creativity: Fitting Concrete Images and Abstract Concepts

Authors: Yingtao Tian, David Ha

Abstract: Evolutionary algorithms have been used in the digital art scene since the 1970s. A popular application of genetic algorithms is to optimize the procedural placement of vector graphic primitives to resemble a given painting. In recent years, deep learning-based approaches have also been proposed to generate procedural drawings, which can be optimized using gradient descent. In this work, we revisit the use of evolutionary algorithms for computational creativity. We find that modern evolution strategies (ES) algorithms, when tasked with the placement of shapes, offer large improvements in both quality and efficiency compared to traditional genetic algorithms, and even comparable to gradient-based methods. We demonstrate that ES is also well suited at optimizing the placement of shapes to fit the CLIP model, and can produce diverse, distinct geometric abstractions that are aligned with human interpretation of language. Videos and demo: https://es-clip.github.io/

Submitted 28 January, 2022; v1 submitted 18 September, 2021; originally announced September 2021.

arXiv:2109.02869 [pdf, other]

The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning

Authors: Yujin Tang, David Ha

Abstract: In complex systems, we often observe complex global behavior emerge from a collection of agents interacting with each other in their environment, with each individual agent acting only on locally available information, without knowing the full picture. Such systems have inspired development of artificial intelligence algorithms in areas such as swarm optimization and cellular automata. Motivated by the emergence of collective behavior from complex cellular systems, we build systems that feed each sensory input from the environment into distinct, but identical neural networks, each with no fixed relationship with one another. We show that these sensory networks can be trained to integrate information received locally, and through communication via an attention mechanism, can collectively produce a globally coherent policy. Moreover, the system can still perform its task even if the ordering of its inputs is randomly permuted several times during an episode. These permutation invariant systems also display useful robustness and generalization properties that are broadly applicable. Interactive demo and videos of our results: https://attentionneuron.github.io/

Submitted 28 September, 2021; v1 submitted 7 September, 2021; originally announced September 2021.

Comments: To appear at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021). Selected for a spotlight presentation

arXiv:2108.00381 [pdf, other]

Emerging Methods of Auction Design in Social Networks

Authors: Yuhang Guo, Dong Hao

Abstract: In recent years, a new branch of auction models called diffusion auction has extended the traditional auction into social network scenarios. The diffusion auction models the auction as a networked market whose nodes are potential customers and whose edges are the relations between these customers. The diffusion auction mechanism can incentivize buyers to not only submit a truthful bid, but also further invite their surrounding neighbors to participate in the auction. It can convene more participants than traditional auction mechanisms, which leads to better optimizations of different key aspects, such as social welfare, seller's revenue, amount of redistributed money and so on. The diffusion auctions have recently attracted a discrete interest in the algorithmic game theory and market design communities. This survey summarizes the current progress of diffusion auctions.

Submitted 1 August, 2021; originally announced August 2021.

arXiv:2105.07311 [pdf, ps, other]

How Does Regression Test Selection Affect Program Repair? An Extensive Study on 2 Million Patches

Authors: Yiling Lou, Samuel Benton, Dan Hao, Lu Zhang, Lingming Zhang

Abstract: APR techniques can be extremely time consuming since (1) a large number of patches can be generated for a given bug, and (2) each patch needs to be executed on the original tests to ensure its correctness. In the literature, various techniques (e.g., based on learning, mining, and constraint solving) have been proposed/studied to reduce the number of patches. However, there is limited study on the impact of test selection for each patch (e.g., only the tests affected by the patch need to be executed as the other tests would keep the same outcomes and can be skipped), and few APR systems actually apply test selection. Therefore, this paper conducts the first extensive study to investigate the impact of Regression Test Selection (RTS) on APR. More specifically, we implemented widely-used RTS techniques at different levels for 12 state-of-the-art APR systems with over 2M patches.
Our study reveals various practical guidelines for future APR, including: (1) the number of patches widely used for measuring APR efficiency can incur skewed conclusions, and the use of inconsistent RTS configurations can further skew the conclusion; (2) all studied RTS techniques can substantially improve APR efficiency and should be considered in future APR work; (3) method- and statement-level RTS outperform class-level RTS substantially, and should be preferred; (4) RTS techniques can substantially outperform state-of-the-art test prioritization techniques for APR, and combining them can further improve APR efficiency; and (5) traditional regression test prioritization widely studied in regression testing performs even better than APR-specific test prioritization when combined with most RTS techniques. Furthermore, we also present the detailed impact of different patch categories and patch validation strategies on our findings.

Submitted 15 May, 2021; originally announced May 2021.

arXiv:2102.05229 [pdf, ps, other]

doi 10.1016/j.neunet.2020.05.005

Sequential vessel segmentation via deep channel attention network

Authors: Dongdong Hao, Song Ding, Linwei Qiu, Yisong Lv, Baowei Fei, Yueqi Zhu, Binjie Qin

Abstract: This paper develops a novel encoder-decoder deep network architecture which exploits the several contextual frames of 2D+t sequential images in a sliding window centered at current frame to segment 2D vessel masks from the current frame. The architecture is equipped with temporal-spatial feature extraction in encoder stage, feature fusion in skip connection layers and channel attention mechanism in decoder stage. In the encoder stage, a series of 3D convolutional layers are employed to hierarchically extract temporal-spatial features. Skip connection layers subsequently fuse the temporal-spatial feature maps and deliver them to the corresponding decoder stages. To efficiently discriminate vessel features from the complex and noisy backgrounds in the XCA images, the decoder stage effectively utilizes channel attention blocks to refine the intermediate feature maps from skip connection layers for subsequently decoding the refined features in 2D ways to produce the segmented vessel masks. Furthermore, Dice loss function is implemented to train the proposed deep network in order to tackle the class imbalance problem in the XCA data due to the wide distribution of complex background artifacts. Extensive experiments by comparing our method with other state-of-the-art algorithms demonstrate the proposed method's superior performance over other methods in terms of the quantitative metrics and visual validation. The source codes are at https://github.com/Binjie-Qin/SVS-net

Submitted 9 February, 2021; originally announced February 2021.

Comments: 14

Journal ref: Neural Networks, 2020

arXiv:2101.02089 [pdf, other]

Impact of Inter-Channel Interference on Shallow Underwater Acoustic OFDM Systems

Authors: Do Viet Ha, Tien Hoa Nguyen, Van Duc Nguyen

Abstract: This paper investigates the impacts of Inter-Channel Interference (ICI) effects on a shallow underwater acoustic (UWA) orthogonal frequency-division multiplexing (OFDM) communication system. Considering both the turbulence of the water surface and the roughness of the bottom, a stochastic geometry-based channel model utilized for a wide-band transmission scenario has been exploited to derive a simulation model. Since the system bandwidth and the sub-carrier spacing are very limited, in the range of a few kHz, the channel capacity of a UWA system suffers severely from the ICI effect. For further investigation, we construct the signal-to-noise-plus-interference ratio (SINR) based on the simulation model, then evaluate the channel capacity. Numerical results show that various factors of a UWA-OFDM system, such as subcarriers, bandwidth, and OFDM symbols, affect the channel capacity under different Doppler frequencies. Those observations give hints for selecting good parameters for UWA-OFDM systems.

Submitted 6 January, 2021; originally announced January 2021.

arXiv:2012.02652 [pdf, other]

Incentive Mechanism Design for ROI-constrained Auto-bidding

Authors: Bin Li, Xiao Yang, Daren Sun, Zhi Ji, Zhen Jiang, Cong Han, Dong Hao

Abstract: Auto-bidding plays an important role in online advertising and has become a crucial tool for advertisers and advertising platforms to meet their performance objectives and optimize the efficiency of ad delivery. Advertisers employing auto-bidding only need to express high-level goals and constraints, and leave the bid optimization problem to the advertising platforms. As auto-bidding has obviously changed the bidding language and the way advertisers participate in the ad auction, fundamental investigation into mechanism design for auto-bidding environment should be made to study the interaction of auto-bidding with advertisers. In this paper, we formulate the general problem of incentive mechanism design for ROI-constrained auto-bidding, and carry out analysis of strategy-proof requirements for the revenue-maximizing and profit-maximizing advertisers. In addition, we provide a mechanism framework and a practical solution to guarantee the incentive property for different types of advertisers.

Submitted 4 December, 2020; originally announced December 2020.

arXiv:2005.07781 [pdf, other]

doi 10.1145/3377325.3377485

Scones: Towards Conversational Authoring of Sketches

Authors: Forrest Huang, Eldon Schoop, David Ha, John Canny

Abstract: Iteratively refining and critiquing sketches are crucial steps to developing effective designs. We introduce Scones, a mixed-initiative, machine-learning-driven system that enables users to iteratively author sketches from text instructions. Scones is a novel deep-learning-based system that iteratively generates scenes of sketched objects composed with semantic specifications from natural language. Scones exceeds state-of-the-art performance on a text-based scene modification task, and introduces a mask-conditioned sketching model that can generate sketches with poses specified by high-level scene information. In an exploratory user evaluation of Scones, participants reported enjoying an iterative drawing task with Scones, and suggested additional features for further applications. We believe Scones is an early step towards automated, intelligent systems that support human-in-the-loop applications for communicating ideas through sketching in art and design.

Submitted 11 May, 2020; originally announced May 2020.

Comments: Long Paper, IUI '20: Proceedings of the 25th International Conference on Intelligent User Interfaces

Showing 1–50 of 88 results for author: Hao, D