Search | arXiv e-print repository

TRIP: Trainable Region-of-Interest Prediction for Hardware-Efficient Neuromorphic Processing on Event-based Vision

Authors: Cina Arjmand, Yingfu Xu, Kevin Shidqi, Alexandra F. Dobrita, Kanishkan Vadivel, Paul Detterer, Manolis Sifalakis, Amirreza Yousefzadeh, Guangzhi Tang

Abstract: Neuromorphic processors are well-suited for efficiently handling sparse events from event-based cameras. However, they face significant challenges in the growth of computing demand and hardware costs as the input resolution increases. This paper proposes the Trainable Region-of-Interest Prediction (TRIP), the first hardware-efficient hard attention framework for event-based vision processing on a neuromorphic processor. Our TRIP framework actively produces low-resolution Region-of-Interest (ROIs) for efficient and accurate classification. The framework exploits sparse events' inherent low information density to reduce the overhead of ROI prediction. We introduced extensive hardware-aware optimizations for TRIP and implemented the hardware-optimized algorithm on the SENECA neuromorphic processor. We utilized multiple event-based classification datasets for evaluation. Our approach achieves state-of-the-art accuracies in all datasets and produces reasonable ROIs with varying locations and sizes. On the DvsGesture dataset, our solution requires 46x less computation than the state-of-the-art while achieving higher accuracy. Furthermore, TRIP enables more than 2x latency and energy improvements on the SENECA neuromorphic processor compared to the conventional solution.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Accepted in ICONS 2024

arXiv:2406.17285 [pdf, other]

EON-1: A Brain-Inspired Processor for Near-Sensor Extreme Edge Online Feature Extraction

Authors: Alexandra Dobrita, Amirreza Yousefzadeh, Simon Thorpe, Kanishkan Vadivel, Paul Detterer, Guangzhi Tang, Gert-Jan van Schaik, Mario Konijnenburg, Anteneh Gebregiorgis, Said Hamdioui, Manolis Sifalakis

Abstract: For Edge AI applications, deploying online learning and adaptation on resource-constrained embedded devices can deal with fast sensor-generated streams of data in changing environments. However, since maintaining low-latency and power-efficient inference is paramount at the Edge, online learning and adaptation on the device should impose minimal additional overhead for inference. With this goal in mind, we explore energy-efficient learning and adaptation on-device for streaming-data Edge AI applications using Spiking Neural Networks (SNNs), which follow the principles of brain-inspired computing, such as high-parallelism, neuron co-located memory and compute, and event-driven processing. We propose EON-1, a brain-inspired processor for near-sensor extreme edge online feature extraction, that integrates a fast online learning and adaptation algorithm. We report results of only 1% energy overhead for learning, by far the lowest overhead when compared to other SoTA solutions, while attaining comparable inference accuracy. Furthermore, we demonstrate that EON-1 is up for the challenge of low-latency processing of HD and UHD streaming video in real-time, with learning enabled.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.15192 [pdf, ps, other]

Setting Targets is All You Need: Improved Order Competitive Ratio for Online Selection

Authors: Liyan Chen, Nuozhou Sun, Zhihao Gavin Tang

Abstract: There is a rising interest in studying the online benchmark as an alternative to the classical offline benchmark in online stochastic settings. Ezra, Feldman, Gravin, and Tang (SODA 2023) introduced the notion of order-competitive ratio, defined as the worst-case ratio between the performance of the best order-unaware algorithm and the best order-aware algorithm, to quantify the loss incurred by the lack of knowledge of the arrival order. They showed that in the online single selection setting (a.k.a. the prophet problem), the optimal order-competitive ratio achieved by deterministic algorithms is $1/\varphi \approx 0.618$, and left open the question of whether randomized algorithms can do better. We answer the open question firmly by introducing a novel family of algorithms called targeted value algorithms. We show that the task of online selection is as easy as guessing the optimal online benchmark. Specifically, we provide 1) an alternative optimal $1/\varphi$ order-competitive algorithm by setting the targeted value deterministically, and 2) a $0.732$ order-competitive algorithm by setting the targeted value randomly. We further provide a $0.758$ upper bound on the order-competitive ratio of our algorithm, showing that our analysis is close to the best possible, and establish an upper bound of $0.829$ on the order-competitive ratio for general randomized order-unaware algorithms.

Submitted 21 June, 2024; originally announced June 2024.
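
For readers comparing the constants quoted in the abstract above, the following worked arithmetic may help. It assumes $\varphi$ denotes the golden ratio, which is consistent with the stated value $1/\varphi \approx 0.618$; the symbols $\rho_{\mathrm{alg}}$ and $\rho_{\mathrm{rand}}$ are labels introduced here for illustration only and do not come from the listing.

```latex
% Golden ratio and its reciprocal (the deterministic optimum quoted in the abstract)
\varphi = \frac{1+\sqrt{5}}{2} \approx 1.618,
\qquad
\frac{1}{\varphi} = \frac{\sqrt{5}-1}{2} = \varphi - 1 \approx 0.618 .

% How the quoted bounds line up.
% \rho_{alg}: order-competitive ratio of the randomized targeted value algorithm (hypothetical label)
% \rho_{rand}: best ratio achievable by any randomized order-unaware algorithm (hypothetical label)
0.618 \approx \frac{1}{\varphi} \;<\; 0.732 \;\le\; \rho_{\mathrm{alg}} \;\le\; 0.758,
\qquad
\rho_{\mathrm{alg}} \;\le\; \rho_{\mathrm{rand}} \;\le\; 0.829 .
```

Read this way, the abstract's claims leave a gap of at most about 0.026 in the analysis of the specific algorithm and about 0.097 between the best known randomized algorithm and the general upper bound.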

arXiv:2406.14953 [pdf, other]

Deep Imbalanced Regression to Estimate Vascular Age from PPG Data: a Novel Digital Biomarker for Cardiovascular Health

Authors: Guangkun Nie, Qinghao Zhao, Gongzheng Tang, Jun Li, Shenda Hong

Abstract: Photoplethysmography (PPG) is emerging as a crucial tool for monitoring human hemodynamics, with recent studies highlighting its potential in assessing vascular aging through deep learning. However, real-world age distributions are often imbalanced, posing significant challenges for deep learning models. In this paper, we introduce a novel, simple, and effective loss function named the Dist Loss to address deep imbalanced regression tasks. We trained a one-dimensional convolutional neural network (Net1D) incorporating the Dist Loss on the extensive UK Biobank dataset (n=502,389) to estimate vascular age from PPG signals and validate its efficacy in characterizing cardiovascular health. The model's performance was validated on a 40% held-out test set, achieving state-of-the-art results, especially in regions with small sample sizes. Furthermore, we divided the population into three subgroups based on the difference between predicted vascular age and chronological age: less than -10 years, between -10 and 10 years, and greater than 10 years. We analyzed the relationship between predicted vascular age and several cardiovascular events over a follow-up period of up to 10 years, including death, coronary heart disease, and heart failure. Our results indicate that the predicted vascular age has significant potential to reflect an individual's cardiovascular health status. Our code will be available at https://github.com/Ngk03/AI-vascular-age.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.12799 [pdf, ps, other]

Sample-Based Matroid Prophet Inequalities

Authors: Hu Fu, Pinyan Lu, Zhihao Gavin Tang, Hongxun Wu, Jinzhao Wu, Qianfan Zhang

Abstract: We study matroid prophet inequalities when distributions are unknown and accessible only through samples. While single-sample prophet inequalities for special matroids are known, no constant-factor competitive algorithm with even a sublinear number of samples was known for general matroids. Adding more to the stake, the single-sample version of the question for general matroids has close (two-way) connections with the long-standing matroid secretary conjecture. In this work, we give a $(\frac14 - \varepsilon)$-competitive matroid prophet inequality with only $O_\varepsilon(\mathrm{poly} \log n)$ samples. Our algorithm consists of two parts: (i) a novel quantile-based reduction from matroid prophet inequalities to online contention resolution schemes (OCRSs) with $O_\varepsilon(\log n)$ samples, and (ii) a $(\frac14 - \varepsilon)$-selectable matroid OCRS with $O_\varepsilon(\mathrm{poly} \log n)$ samples which carefully addresses an adaptivity challenge.

Submitted 18 June, 2024; originally announced June 2024.

Comments: To appear at EC'24

arXiv:2406.11528 [pdf, other]

Optimal Robust Contract Design

Authors: Bo Peng, Zhihao Gavin Tang

Abstract: We consider the robust contract design problem when the principal only has limited information about the actions the agent can take. The principal evaluates a contract according to its worst-case performance caused by the uncertain action space. Carroll (AER 2015) showed that a linear contract is optimal among deterministic contracts. Recently, Kambhampati (JET 2023) showed that the principal's payoff can be strictly increased via randomization over linear contracts. In this paper, we characterize the optimal randomized contract, which remains linear and admits a closed form of its cumulative density function. The advantage of randomized contracts over deterministic contracts can be arbitrarily large even when the principal knows only one non-trivial action of the agent. Furthermore, our result generalizes to the model of contracting with teams, by Dai and Toikka (Econometrica 2022).

Submitted 17 June, 2024; originally announced June 2024.

Comments: Full version of EC 2024 paper

arXiv:2405.20215 [pdf, other]

TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models

Authors: Chen Zhang, Chengguang Tang, Dading Chong, Ke Shi, Guohua Tang, Feng Jiang, Haizhou Li

Abstract: Mainstream approaches to aligning large language models (LLMs) heavily rely on human preference data, particularly when models require periodic updates. The standard process for iterative alignment of LLMs involves collecting new human feedback for each update. However, the data collection process is costly and challenging to scale. To address this issue, we introduce the "TS-Align" framework, which fine-tunes a policy model using pairwise feedback data automatically mined from its outputs. This automatic mining process is efficiently accomplished through the collaboration between a large-scale teacher model and a small-scale student model. The policy fine-tuning process can be iteratively repeated using on-policy generations within our proposed teacher-student collaborative framework. Through extensive experiments, we demonstrate that our final aligned policy outperforms the base policy model with an average win rate of 69.7% across seven conversational or instruction-following datasets. Furthermore, we show that the ranking capability of the teacher is effectively distilled into the student through our pipeline, resulting in a small-scale yet effective reward model for policy model alignment.

Submitted 14 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.15521 [pdf, other]

A Preference-oriented Diversity Model Based on Mutual-information in Re-ranking for E-commerce Search

Authors: Huimu Wang, Mingming Li, Dadong Miao, Songlin Wang, Guoyu Tang, Lin Liu, Sulong Xu, Jinghe Hu

Abstract: Re-ranking is a process of rearranging a ranking list to more effectively meet user demands by accounting for the interrelationships between items. Existing methods predominantly enhance the precision of search results, often at the expense of diversity, leading to outcomes that may not fulfill the varied needs of users. Conversely, methods designed to promote diversity might compromise the precision of the results, failing to satisfy the users' requirements for accuracy. To alleviate the above problems, this paper proposes a Preference-oriented Diversity Model Based on Mutual-information (PODM-MI), which considers both accuracy and diversity in the re-ranking process. Specifically, PODM-MI adopts Multidimensional Gaussian distributions based on variational inference to capture users' diversity preferences with uncertainty. Then we maximize the mutual information between the diversity preferences of the users and the candidate items using the maximum variational inference lower bound to enhance their correlations. Subsequently, we derive a utility matrix based on the correlations, enabling the adaptive ranking of items in line with user preferences and establishing a balance between the aforementioned objectives. Experimental results on real-world online e-commerce systems demonstrate the significant improvements of PODM-MI, and we have successfully deployed PODM-MI on an e-commerce search platform.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.05606 [pdf, other]

doi 10.1145/3626772.3661343

Optimizing E-commerce Search: Toward a Generalizable and Rank-Consistent Pre-Ranking Model

Authors: Enqiang Xu, Yiming Qiu, Junyang Bai, ** Zhang, Dadong Miao, Songlin Wang, Guoyu Tang, Lin Liu, Mingming Li

Abstract: In large e-commerce platforms, search systems are typically composed of a series of modules, including recall, pre-ranking, and ranking phases. The pre-ranking phase, serving as a lightweight module, is crucial for filtering out the bulk of products in advance for the downstream ranking module. Industrial efforts on optimizing the pre-ranking model have predominantly focused on enhancing ranking consistency, model structure, and generalization towards long-tail items. Beyond these optimizations, meeting the system performance requirements presents a significant challenge. Contrasting with existing industry works, we propose a novel method: a Generalizable and RAnk-ConsistEnt Pre-Ranking Model (GRACE), which achieves: 1) Ranking consistency by introducing multiple binary classification tasks that predict whether a product is within the top-k results as estimated by the ranking model, which facilitates the addition of learning objectives on common point-wise ranking models; 2) Generalizability through contrastive learning of representation for all products by pre-training on a subset of ranking product embeddings; 3) Ease of implementation in feature construction and online deployment. Our extensive experiments demonstrate significant improvements in both offline metrics and online A/B test: a 0.75% increase in AUC and a 1.28% increase in CVR.

Submitted 13 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

ACM Class: H.3.3
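
The "multiple binary classification tasks" mentioned in the GRACE abstract above can be pictured with a small label-construction sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function name `topk_binary_labels`, the example scores, and the choice of k values are all hypothetical, and the abstract does not say how ties or the set of k values are handled.

```python
from typing import Dict, List, Sequence


def topk_binary_labels(
    ranking_scores: Sequence[float], ks: Sequence[int]
) -> Dict[int, List[int]]:
    """For each k, label a product 1 if the ranking model puts it in the top k, else 0.

    `ranking_scores` are scores from the downstream ranking model for the
    candidate products of one query (hypothetical input format).
    """
    # Product indices sorted by descending ranking-model score.
    # sorted() is stable, so tied products keep their input order (an assumption).
    order = sorted(range(len(ranking_scores)), key=lambda i: -ranking_scores[i])

    labels: Dict[int, List[int]] = {}
    for k in ks:
        top = set(order[:k])
        # One binary target per product for the task "is this product in the top k?"
        labels[k] = [1 if i in top else 0 for i in range(len(ranking_scores))]
    return labels


# Four candidate products scored by the ranking model (made-up numbers).
scores = [0.2, 0.9, 0.5, 0.7]
for k, target in topk_binary_labels(scores, ks=[1, 2]).items():
    print(f"top-{k} labels: {target}")
```

Each list of labels could then serve as the target for one binary head of a point-wise pre-ranking model, which is one plausible reading of how the tasks add learning objectives without changing the model's point-wise form.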

arXiv:2403.19797 [pdf, other]

Efficient 3D Instance Mapping and Localization with Neural Fields

Authors: George Tang, Krishna Murthy Jatavallabhula, Antonio Torralba

Abstract: We tackle the problem of learning an implicit scene representation for 3D instance segmentation from a sequence of posed RGB images. Towards this, we introduce 3DIML, a novel framework that efficiently learns a label field that may be rendered from novel viewpoints to produce view-consistent instance segmentation masks. 3DIML significantly improves upon training and inference runtimes of existing implicit scene representation based methods. Opposed to prior art that optimizes a neural field in a self-supervised manner, requiring complicated training procedures and loss function design, 3DIML leverages a two-phase process. The first phase, InstanceMap, takes as input 2D segmentation masks of the image sequence generated by a frontend instance segmentation model, and associates corresponding masks across images to 3D labels. These almost view-consistent pseudolabel masks are then used in the second phase, InstanceLift, to supervise the training of a neural label field, which interpolates regions missed by InstanceMap and resolves ambiguities. Additionally, we introduce InstanceLoc, which enables near realtime localization of instance masks given a trained label field and an off-the-shelf image segmentation model by fusing outputs from both. We evaluate 3DIML on sequences from the Replica and ScanNet datasets and demonstrate 3DIML's effectiveness under mild assumptions for the image sequences. We achieve a large practical speedup over existing implicit scene representation methods with comparable quality, showcasing its potential to facilitate faster and more effective 3D scene understanding.

Submitted 31 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

arXiv:2402.17160 [pdf, other]

Choosing Behind the Veil: Tight Bounds for Identity-Blind Online Algorithms

Authors: Tomer Ezra, Michal Feldman, Zhihao Gavin Tang

Abstract: In Bayesian online settings, every element has a value that is drawn from a known underlying distribution, which we refer to as the element's identity. The elements arrive sequentially. Upon the arrival of an element, its value is revealed, and the decision maker needs to decide, immediately and irrevocably, whether to accept it or not. While most previous work has assumed that the decision maker, upon observing the element's value, also becomes aware of its identity -- namely, its distribution -- practical scenarios frequently demand that decisions be made based solely on the element's value, without considering its identity. This necessity arises either from the algorithm's ignorance of the element's identity or due to the pursuit of fairness. We call such algorithms identity-blind algorithms, and propose the identity-blindness gap as a metric to evaluate the performance loss caused by identity-blindness. This gap is defined as the maximum ratio between the expected performance of an identity-blind online algorithm and an optimal online algorithm that knows the arrival order, thus also the identities. We study the identity-blindness gap in the paradigmatic prophet inequality problem, under the two objectives of maximizing the expected value, and maximizing the probability to obtain the highest value. For the max-expectation objective, the celebrated prophet inequality establishes a single-threshold algorithm that gives at least 1/2 of the offline optimum, thus also an identity-blindness gap of at least 1/2. We show that this bound is tight. For the max-probability objective, while the competitive ratio is tightly 1/e, we provide a deterministic single-threshold algorithm that gives an identity-blindness gap of $\sim 0.562$ under the assumption that there are no large point masses. Moreover, we show that this bound is tight with respect to deterministic algorithms.

Submitted 26 February, 2024; originally announced February 2024.

Comments: Abstract shortened for arXiv
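
The single-threshold rule referenced in the abstract above can be tried out numerically. The sketch below is a minimal Monte Carlo illustration, not code from the paper: it assumes values drawn from uniform distributions on [0, b_i] (a made-up instance), uses one standard threshold choice (half the expected maximum, estimated here from samples), and the names `accept_first_above` and `simulate` are hypothetical. Note that the rule looks only at each value, never at which distribution produced it, so it is identity-blind in the abstract's sense.

```python
import random


def accept_first_above(values, threshold):
    """Accept the first value that meets the threshold; return 0.0 if none does."""
    for v in values:
        if v >= threshold:
            return v
    return 0.0


def simulate(upper_bounds, trials=20000, seed=0):
    """Empirical ratio of the single-threshold reward to the offline optimum.

    Element i is drawn from Uniform(0, upper_bounds[i]) and elements arrive
    in the listed order (both are assumptions made for this illustration).
    """
    rng = random.Random(seed)

    def draw():
        return [rng.uniform(0.0, b) for b in upper_bounds]

    # Estimate E[max] from an independent batch, then set the threshold to half of it.
    estimated_max = sum(max(draw()) for _ in range(trials)) / trials
    threshold = estimated_max / 2.0

    total_reward = 0.0
    total_max = 0.0
    for _ in range(trials):
        values = draw()
        total_reward += accept_first_above(values, threshold)
        total_max += max(values)  # what a prophet who sees all values would take
    return total_reward / total_max


ratio = simulate([1.0, 2.0, 3.0])
print(f"empirical ratio to the offline optimum: {ratio:.3f}")
```

On benign instances like this one the empirical ratio typically sits well above 1/2; the 1/2 figure in the abstract is a worst-case guarantee over all distributions and arrival orders.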

arXiv:2401.12783 [pdf, other]

A Review of Deep Learning Methods for Photoplethysmography Data

Authors: Guangkun Nie, Jiabao Zhu, Gongzheng Tang, Deyun Zhang, Shijia Geng, Qinghao Zhao, Shenda Hong

Abstract: Photoplethysmography (PPG) is a highly promising device due to its advantages in portability, user-friendly operation, and non-invasive capabilities to measure a wide range of physiological information. Recent advancements in deep learning have demonstrated remarkable outcomes by leveraging PPG signals for tasks related to personal health management and other multifaceted applications. In this review, we systematically reviewed papers that applied deep learning models to process PPG data between January 1st of 2017 and July 31st of 2023 from Google Scholar, PubMed and Dimensions. Each paper is analyzed from three key perspectives: tasks, models, and data. We finally extracted 193 papers where different deep learning frameworks were used to process PPG signals. Based on the tasks addressed in these papers, we categorized them into two major groups: medical-related, and non-medical-related. The medical-related tasks were further divided into seven subgroups, including blood pressure analysis, cardiovascular monitoring and diagnosis, sleep health, mental health, respiratory monitoring and analysis, blood glucose analysis, as well as others. The non-medical-related tasks were divided into four subgroups, which encompass signal processing, biometric identification, electrocardiogram reconstruction, and human activity recognition. In conclusion, significant progress has been made in the field of using deep learning methods to process PPG data recently. This allows for a more thorough exploration and utilization of the information contained in PPG signals. However, challenges remain, such as limited quantity and quality of publicly available databases, a lack of effective validation in real-world scenarios, and concerns about the interpretability, scalability, and complexity of deep learning models. Moreover, there are still emerging research areas that require further investigation.

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2310.08958 [pdf, other]

xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark

Authors: Chen Zhang, Luis Fernando D'Haro, Chengguang Tang, Ke Shi, Guohua Tang, Haizhou Li

Abstract: Recent advancements in reference-free learned metrics for open-domain dialogue evaluation have been driven by the progress in pre-trained language models and the availability of dialogue data with high-quality human annotations. However, current studies predominantly concentrate on English dialogues, and the generalization of these metrics to other languages has not been fully examined. This is largely due to the absence of a multilingual dialogue evaluation benchmark. To address the issue, we introduce xDial-Eval, built on top of open-source English dialogue evaluation datasets. xDial-Eval includes 12 turn-level and 6 dialogue-level English datasets, comprising 14930 annotated turns and 8691 annotated dialogues respectively. The English dialogue data are extended to nine other languages with commercial machine translation systems. On xDial-Eval, we conduct comprehensive analyses of previous BERT-based metrics and the recently-emerged large language models. Lastly, we establish strong self-supervised and multilingual baselines. In terms of average Pearson correlations over all datasets and languages, the best baseline outperforms OpenAI's ChatGPT by absolute improvements of 6.5% and 4.6% at the turn and dialogue levels respectively, albeit with much fewer parameters. The data and code are publicly available at https://github.com/e0397123/xDial-Eval.

Submitted 13 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP-2023 Findings

arXiv:2309.05345 [pdf, other]

doi 10.1109/ISCAS46773.2023.10181778

Empirical study on the efficiency of Spiking Neural Networks with axonal delays, and algorithm-hardware benchmarking

Authors: Alberto Patiño-Saucedo, Amirreza Yousefzadeh, Guangzhi Tang, Federico Corradi, Bernabé Linares-Barranco, Manolis Sifalakis

Abstract: The role of axonal synaptic delays in the efficacy and performance of artificial neural networks has been largely unexplored. In step-based analog-valued neural network models (ANNs), the concept is almost absent. In their spiking neuroscience-inspired counterparts, there is hardly a systematic account of their effects on model performance in terms of accuracy and number of synaptic operations. This paper proposes a methodology for accounting for axonal delays in the training loop of deep Spiking Neural Networks (SNNs), intending to efficiently solve machine learning tasks on data with rich temporal dependencies. We then conduct an empirical study of the effects of axonal delays on model performance during inference for the Adding task, a benchmark for sequential regression, and for the Spiking Heidelberg Digits dataset (SHD), commonly used for evaluating event-driven models. Quantitative results on the SHD show that SNNs incorporating axonal delays instead of explicit recurrent synapses achieve state-of-the-art, over 90% test accuracy while needing less than half the trainable synapses. Additionally, we estimate the required memory in terms of total parameters and energy consumption of accommodating such delay-trained models on a modern neuromorphic accelerator. These estimations are based on the number of synaptic operations and the reference GF-22nm FDX CMOS technology. As a result, we demonstrate that a reduced parameterization, which incorporates axonal delays, leads to approximately 90% energy and memory reduction in digital hardware implementations for a similar performance in the aforementioned task.

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2306.11351 [pdf, other]

A Versatility-Performance Balanced Hardware Architecture for Scene Text Detection

Authors: Yao Xin, Guoming Tang, Donglong Chen, Rumin Zhang, Teng Liang, Ray C. C. Cheung, Cetin Kaya Koc

Abstract: Detecting and extracting textual information from natural scene images needs Scene Text Detection (STD) algorithms. Fully Convolutional Neural Networks (FCNs) are usually utilized as the backbone model to extract features in these instance segmentation based STD algorithms. FCNs naturally come with high computational complexity. Furthermore, to keep up with the growing variety of models, flexible architectures are needed. In order to accelerate various STD algorithms efficiently, a versatility-performance balanced hardware architecture is proposed, together with a simple but efficient way of configuration. This architecture is able to compute different FCN models without hardware redesign. The optimization is focused on hardware with finely designed computing modules, while the versatility of different network reconfigurations is achieved by microcodes instead of a strenuously designed compiler. Multiple parallel techniques at different levels and several complexity-reduction methods are explored to speed up the FCN computation. Results from implementation show that, given the same tasks, the proposed system achieves a better throughput compared with the studied GPU. Particularly, our system reduces the comprehensive Operation Expense (OpEx) at GPU by 46%, while the power efficiency is enhanced by 32%. This work has been deployed in commercial applications and provided stable consumer text detection services.

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2306.05011 [pdf, other]

Attention Weighted Mixture of Experts with Contrastive Learning for Personalized Ranking in E-commerce

Authors: Juan Gong, Zhenlin Chen, Chaoyi Ma, Zhuojian Xiao, Haonan Wang, Guoyu Tang, Lin Liu, Sulong Xu, Bo Long, Yunjiang Jiang

Abstract: Ranking model plays an essential role in e-commerce search and recommendation. An effective ranking model should give a personalized ranking list for each user according to the user preference. Existing algorithms usually extract a user representation vector from the user behavior sequence, then feed the vector into a feed-forward network (FFN) together with other features for feature interactions, and finally produce a personalized ranking score. Despite tremendous progress in the past, there is still room for improvement. Firstly, the personalized patterns of feature interactions for different users are not explicitly modeled. Secondly, most of existing algorithms have poor personalized ranking results for long-tail users with few historical behaviors due to the data sparsity. To overcome the two challenges, we propose Attention Weighted Mixture of Experts (AW-MoE) with contrastive learning for personalized ranking. Firstly, AW-MoE leverages the MoE framework to capture personalized feature interactions for different users. To model the user preference, the user behavior sequence is simultaneously fed into expert networks and the gate network. Within the gate network, one gate unit and one activation unit are designed to adaptively learn the fine-grained activation vector for experts using an attention mechanism. Secondly, a random masking strategy is applied to the user behavior sequence to simulate long-tail users, and an auxiliary contrastive loss is imposed to the output of the gate network to improve the model generalization for these users. This is validated by a higher performance gain on the long-tail user test set. Experiment results on a JD real production dataset and a public dataset demonstrate the effectiveness of AW-MoE, which significantly outperforms state-of-art methods. Notably, AW-MoE has been successfully deployed in the JD e-commerce search engine, ...

Submitted 8 June, 2023; originally announced June 2023.

Comments: Accepted by ICDE2023

arXiv:2306.03135 [pdf, ps, other]

On the complexity of isomorphism problems for tensors, groups, and polynomials III: actions by classical groups

Authors: Zhili Chen, Joshua A. Grochow, Youming Qiao, Gang Tang, Chuanqi Zhang

Abstract: We study the complexity of isomorphism problems for d-way arrays, or tensors, under natural actions by classical groups such as orthogonal, unitary, and symplectic groups. Such problems arise naturally in statistical data analysis and quantum information. We study two types of complexity-theoretic questions. First, for a fixed action type (isomorphism, conjugacy, etc.), we relate the complexity of the isomorphism problem over a classical group to that over the general linear group. Second, for a fixed group type (orthogonal, unitary, or symplectic), we compare the complexity of the decision problems for different actions. Our main results are as follows. First, for orthogonal and symplectic groups acting on 3-way arrays, the isomorphism problems reduce to the corresponding problem over the general linear group. Second, for orthogonal and unitary groups, the isomorphism problems of five natural actions on 3-way arrays are polynomial-time equivalent, and the d-tensor isomorphism problem reduces to the 3-tensor isomorphism problem for any fixed d>3. For unitary groups, the preceding result implies that LOCC classification of tripartite quantum states is at least as difficult as LOCC classification of d-partite quantum states for any d. Lastly, we also show that the graph isomorphism problem reduces to the tensor isomorphism problem over orthogonal and unitary groups.

Submitted 5 June, 2023; originally announced June 2023.

MSC Class: 68Q25; 15A69; 15A72; 15A21; 15A63; 68Q15; 81P48; 16Z05; 16G99; 17B99; 17A40; 17A42 ACM Class: F.1.3; F.2.2

arXiv:2305.17709 [pdf, other]

Parallel Data Helps Neural Entity Coreference Resolution

Authors: Gongbo Tang, Christian Hardmeier

Abstract: Coreference resolution is the task of finding expressions that refer to the same entity in a text. Coreference models are generally trained on monolingual annotated data but annotating coreference is expensive and challenging. Hardmeier et al. (2013) have shown that parallel data contains latent anaphoric knowledge, but it has not been explored in end-to-end neural models yet. In this paper, we propose a simple yet effective model to exploit coreference knowledge from parallel data. In addition to the conventional modules learning coreference from annotations, we introduce an unsupervised module to capture cross-lingual coreference knowledge. Our proposed cross-lingual model achieves consistent improvements, up to 1.74 percentage points, on the OntoNotes 5.0 English dataset using 9 different synthetic parallel datasets. These experimental results confirm that parallel data can provide additional coreference knowledge which is beneficial to coreference resolution tasks.

Submitted 28 May, 2023; originally announced May 2023.

Comments: camera-ready version; to appear in the Findings of ACL 2023

arXiv:2305.14810 [pdf, other]

JDsearch: A Personalized Product Search Dataset with Real Queries and Full Interactions

Authors: Jiongnan Liu, Zhicheng Dou, Guoyu Tang, Sulong Xu

Abstract: Recently, personalized product search attracts great attention and many models have been proposed. To evaluate the effectiveness of these models, previous studies mainly utilize the simulated Amazon recommendation dataset, which contains automatically generated queries and excludes cold users and tail products. We argue that evaluating with such a dataset may yield unreliable results and conclusions, and deviate from real user satisfaction. To overcome these problems, in this paper, we release a personalized product search dataset comprised of real user queries and diverse user-product interaction types (clicking, adding to cart, following, and purchasing) collected from JD.com, a popular Chinese online shopping platform. More specifically, we sample about 170,000 active users on a specific date, then record all their interacted products and issued queries in one year, without removing any tail users and products. This finally results in roughly 12,000,000 products, 9,400,000 real searches, and 26,000,000 user-product interactions. We study the characteristics of this dataset from various perspectives and evaluate representative personalization models to verify its feasibility. The dataset can be publicly accessed at Github: https://github.com/rucliujn/JDsearch.

Submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted to SIGIR 2023

arXiv:2305.14165 [pdf, other]

Impact of Light and Shadow on Robustness of Deep Neural Networks

Authors: Chengyin Hu, Weiwen Shi, Chao Li, Jialiang Sun, Donghua Wang, Junqi Wu, Guijian Tang

Abstract: Deep neural networks (DNNs) have made remarkable strides in various computer vision tasks, including image classification, segmentation, and object detection. However, recent research has revealed a vulnerability in advanced DNNs when faced with deliberate manipulations of input data, known as adversarial attacks. Moreover, the accuracy of DNNs is heavily influenced by the distribution of the training dataset. Distortions or perturbations in the color space of input images can introduce out-of-distribution data, resulting in misclassification. In this work, we propose a brightness-variation dataset, which incorporates 24 distinct brightness levels for each image within a subset of ImageNet. This dataset enables us to simulate the effects of light and shadow on the images, so as to investigate the impact of light and shadow on the performance of DNNs. In our study, we conduct experiments using several state-of-the-art DNN architectures on the aforementioned dataset. Through our analysis, we discover a noteworthy positive correlation between the brightness levels and the loss of accuracy in DNNs. Furthermore, we assess the effectiveness of recently proposed robust training techniques and strategies, including AugMix, Revisit, and Free Normalizer, using the ResNet50 architecture on our brightness-variation dataset. Our experimental results demonstrate that these techniques can enhance the robustness of DNNs against brightness variation, leading to improved performance when dealing with images exhibiting varying brightness levels.

Submitted 23 May, 2023; originally announced May 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2209.02832, arXiv:2209.02132

arXiv:2304.04640 [pdf, other]

NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems

Authors: Jason Yik, Korneel Van den Berghe, Douwe den Blanken, Younes Bouhadjar, Maxime Fabre, Paul Hueber, Denis Kleyko, Noah Pacik-Nelson, Pao-Sheng Vincent Sun, Guangzhi Tang, Shenqi Wang, Biyan Zhou, Soikat Hasan Ahmed, George Vathakkattil Joseph, Benedetto Leto, Aurora Micheli, Anurag Kumar Mishra, Gregor Lenz, Tao Sun, Zergham Ahmed, Mahmoud Akl, Brian Anderson, Andreas G. Andreou, Chiara Bartolozzi, Arindam Basu, et al. (73 additional authors not shown)

Abstract: Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neuromorphic computing benchmark efforts have not seen widespread adoption due to a lack of inclusive, actionable, and iterative benchmark design and guidelines. To address these shortcomings, we present NeuroBench: a benchmark framework for neuromorphic computing algorithms and systems. NeuroBench is a collaboratively-designed effort from an open community of nearly 100 co-authors across over 50 institutions in industry and academia, aiming to provide a representative structure for standardizing the evaluation of neuromorphic approaches. The NeuroBench framework introduces a common set of tools and systematic methodology for inclusive benchmark measurement, delivering an objective reference framework for quantifying neuromorphic approaches in both hardware-independent (algorithm track) and hardware-dependent (system track) settings. In this article, we present initial performance baselines across various model architectures on the algorithm track and outline the system track benchmark tasks and guidelines. NeuroBench is intended to continually expand its benchmarks and features to foster and track the progress made by the research community.

Submitted 17 January, 2024; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: Updated from whitepaper to full perspective article preprint

arXiv:2303.15224 [pdf, other]

Open the box of digital neuromorphic processor: Towards effective algorithm-hardware co-design

Authors: Guangzhi Tang, Ali Safa, Kevin Shidqi, Paul Detterer, Stefano Traferro, Mario Konijnenburg, Manolis Sifalakis, Gert-Jan van Schaik, Amirreza Yousefzadeh

Abstract: Sparse and event-driven spiking neural network (SNN) algorithms are the ideal candidate solution for energy-efficient edge computing. Yet, with the growing complexity of SNN algorithms, it isn't easy to properly benchmark and optimize their computational cost without hardware in the loop. Although digital neuromorphic processors have been widely adopted to benchmark SNN algorithms, their black-box nature is problematic for algorithm-hardware co-optimization. In this work, we open the black box of the digital neuromorphic processor for algorithm designers by presenting the neuron processing instruction set and detailed energy consumption of the SENeCA neuromorphic architecture. For convenient benchmarking and optimization, we provide the energy cost of the essential neuromorphic components in SENeCA, including neuron models and learning rules. Moreover, we exploit the SENeCA's hierarchical memory and exhibit an advantage over existing neuromorphic processors. We show the energy efficiency of SNN algorithms for video processing and online learning, and demonstrate the potential of our work for optimizing algorithm designs. Overall, we present a practical approach to enable algorithm designers to accurately benchmark SNN algorithms and pave the way towards effective algorithm-hardware co-design.

Submitted 27 March, 2023; originally announced March 2023.

arXiv:2211.10969 [pdf, ps, other]

Bidder Subset Selection Problem in Auction Design

Authors: Xiaohui Bei, Nick Gravin, Pinyan Lu, Zhihao Gavin Tang

Abstract: Motivated by practical concerns in the online advertising industry, we study a bidder subset selection problem in single-item auctions. In this problem, a large pool of candidate bidders have independent values sampled from known prior distributions. The seller needs to pick a subset of bidders and run a given auction format on the selected subset to maximize her expected revenue. We propose two frameworks for the subset restrictions: (i) capacity constraint on the set of selected bidders; and (ii) incurred costs for the bidders invited to the auction. For the second-price auction with anonymous reserve (SPA-AR), we give constant approximation polynomial time algorithms in both frameworks (in the latter framework under mild assumptions about the market). Our results are in stark contrast to the previous work of Mehta, Nadav, Psomas, Rubinstein [NeurIPS 2020], who showed hardness of approximation for the SPA without a reserve price. We also give complementary approximation results for other well-studied auction formats such as anonymous posted pricing and sequential posted pricing. On a technical level, we find that the revenue of SPA-AR as a set function $f(S)$ of its bidders $S$ is fractionally-subadditive but not submodular. Our bidder selection problem with invitation costs is a natural question about (approximately) answering a demand oracle for $f(\cdot)$ under a given vector of costs, a common computational assumption in the literature on combinatorial auctions.

Submitted 20 November, 2022; originally announced November 2022.

Comments: 17 pages. To appear at SODA 2023

arXiv:2210.10370 [pdf, other]

On the Perturbation Function of Ranking and Balance for Weighted Online Bipartite Matching

Authors: Jingxun Liang, Zhihao Gavin Tang, Yixuan Even Xu, Yuhao Zhang, Renfei Zhou

Abstract: Ranking and Balance are arguably the two most important algorithms in the online matching literature. They achieve the same optimal competitive ratio of $1-1/e$ for the integral version and fractional version of online bipartite matching by Karp, Vazirani, and Vazirani (STOC 1990) respectively. The two algorithms have been generalized to weighted online bipartite matching problems, including vertex-weighted online bipartite matching and AdWords, by utilizing a perturbation function. The canonical choice of the perturbation function is $f(x)=1-e^{x-1}$ as it leads to the optimal competitive ratio of $1-1/e$ in both settings. We advance the understanding of the weighted generalizations of Ranking and Balance in this paper, with a focus on studying the effect of different perturbation functions. First, we prove that the canonical perturbation function is the \emph{unique} optimal perturbation function for vertex-weighted online bipartite matching. In stark contrast, all perturbation functions achieve the optimal competitive ratio of $1-1/e$ in the unweighted setting. Second, we prove that the generalization of Ranking to AdWords with unknown budgets using the canonical perturbation function is at most $0.624$ competitive, refuting a conjecture of Vazirani (2021). More generally, as an application of the first result, we prove that no perturbation function leads to the prominent competitive ratio of $1-1/e$ by establishing an upper bound of $1-1/e-0.0003$. Finally, we propose the online budget-additive welfare maximization problem that is intermediate between AdWords and AdWords with unknown budgets, and we design an optimal $1-1/e$ competitive algorithm by generalizing Balance.

Submitted 5 July, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: Conference version to appear at the European Symposium on Algorithms (ESA 2023). 16 pages, 2 figures, 8 pages appendix

arXiv:2210.07594 [pdf, other]

See Blue Sky: Deep Image Dehaze Using Paired and Unpaired Training Images

Authors: Xiaoyan Zhang, Gaoyang Tang, Yingying Zhu, Qi Tian

Abstract: The issue of image haze removal has attracted wide attention in recent years. However, most existing haze removal methods cannot restore the scene with clear blue sky, since the color and texture information of the object in the original haze image is insufficient. To remedy this, we propose a cycle generative adversarial network to construct a novel end-to-end image dehaze model. We adopt outdoor image datasets to train our model, which includes a set of real-world unpaired image dataset and a set of paired image dataset to ensure that the generated images are close to the real scene. Based on the cycle structure, our model adds four different kinds of loss function to constrain the effect including adversarial loss, cycle consistency loss, photorealism loss and paired L1 loss. These four constraints can improve the overall quality of such degraded images for better visual appeal and ensure reconstruction of images to keep from distortion. The proposed model could remove the haze of images and also restore the sky of images to be clean and blue (like captured in a sunny weather).

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2209.14262 [pdf, other]

A Survey on Physical Adversarial Attack in Computer Vision

Authors: Donghua Wang, Wen Yao, Tingsong Jiang, Guijian Tang, Xiaoqian Chen

Abstract: Over the past decade, deep learning has revolutionized conventional tasks that rely on hand-craft feature extraction with its strong feature learning capability, leading to substantial enhancements in traditional tasks. However, deep neural networks (DNNs) have been demonstrated to be vulnerable to adversarial examples crafted by malicious tiny noise, which is imperceptible to human observers but can make DNNs output the wrong result. Existing adversarial attacks can be categorized into digital and physical adversarial attacks. The former is designed to pursue strong attack performance in lab environments while hardly remaining effective when applied to the physical world. In contrast, the latter focus on developing physical deployable attacks, thus exhibiting more robustness in complex physical environmental conditions. Recently, with the increasing deployment of the DNN-based system in the real world, strengthening the robustness of these systems is an emergency, while exploring physical adversarial attacks exhaustively is the precondition. To this end, this paper reviews the evolution of physical adversarial attacks against DNN-based computer vision tasks, expecting to provide beneficial information for developing stronger physical adversarial attacks. Specifically, we first proposed a taxonomy to categorize the current physical adversarial attacks and grouped them. Then, we discuss the existing physical attacks and focus on the technique for improving the robustness of physical attacks under complex physical environmental conditions. Finally, we discuss the issues of the current physical adversarial attacks to be solved and give promising directions.

Submitted 18 September, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

arXiv:2209.12934 [pdf, ps, other]

doi 10.1007/978-3-031-15714-1_4

Lookahead Auctions with Pooling

Authors: Almog Wald, Michal Feldman, Nick Gravin, Zhihao Gavin Tang

Abstract: A Lookahead Auction (LA), introduced by Ronen, is an auction format for the sale of a single item among multiple buyers, which is considered simpler and more fair than the optimal auction. Indeed, it anonymously selects a provisional winner by a symmetric ascending-price process, and only then uses a personalized posted price. A LA auction extracts at least 1/2 of the optimal revenue, even under a correlated value distribution. This bound is tight, even for 2 buyers with independent values. We introduce a natural extension of LA, called lookahead with pooling (LAP). A LAP auction proceeds as LA, with one difference: it allows the seller to pool together a range of values during the ascending-price stage, and treat them the same; thus, it preserves the simplicity and fairness of LA. Our main result is that this simple pooling operation improves the revenue guarantees for independent buyers from 1/2 to 4/7 of the optimal revenue. We also give a complementary negative result, showing that for arbitrary correlated priors LAP cannot do better than 1/2 approximation.

Submitted 26 September, 2022; originally announced September 2022.

Journal ref: SAGT (2022) Algorithmic Game Theory 60-77

arXiv:2207.04327 [pdf, other]

Error Analysis of Tensor-Train Cross Approximation

Authors: Zhen Qin, Alexander Lidiak, Zhexuan Gong, Gongguo Tang, Michael B. Wakin, Zhihui Zhu

Abstract: Tensor train decomposition is widely used in machine learning and quantum physics due to its concise representation of high-dimensional tensors, overcoming the curse of dimensionality. Cross approximation-originally developed for representing a matrix from a set of selected rows and columns-is an efficient method for constructing a tensor train decomposition of a tensor from few of its entries. While tensor train cross approximation has achieved remarkable performance in practical applications, its theoretical analysis, in particular regarding the error of the approximation, is so far lacking. To our knowledge, existing results only provide element-wise approximation accuracy guarantees, which lead to a very loose bound when extended to the entire tensor. In this paper, we bridge this gap by providing accuracy guarantees in terms of the entire tensor for both exact and noisy measurements. Our results illustrate how the choice of selected subtensors affects the quality of the cross approximation and that the approximation error caused by model error and/or measurement error may not grow exponentially with the order of the tensor. These results are verified by numerical experiments, and may have important implications for the usefulness of cross approximations for high-order tensors, such as those encountered in the description of quantum many-body states.

Submitted 24 June, 2023; v1 submitted 9 July, 2022; originally announced July 2022.

arXiv:2206.13155 [pdf, other]

Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding

Authors: Chuwei Luo, Guozhi Tang, Qi Zheng, Cong Yao, Lianwen Jin, Chenliang Li, Yang Xue, Luo Si

Abstract: Multi-modal document pre-trained models have proven to be very effective in a variety of visually-rich document understanding (VrDU) tasks. Though existing document pre-trained models have achieved excellent performance on standard benchmarks for VrDU, the way they model and exploit the interactions between vision and language on documents has hindered them from better generalization ability and higher accuracy. In this work, we investigate the problem of vision-language joint representation learning for VrDU mainly from the perspective of supervisory signals. Specifically, a pre-training paradigm called Bi-VLDoc is proposed, in which a bidirectional vision-language supervision strategy and a vision-language hybrid-attention mechanism are devised to fully explore and utilize the interactions between these two modalities, to learn stronger cross-modal document representations with richer semantics. Benefiting from the learned informative cross-modal document representations, Bi-VLDoc significantly advances the state-of-the-art performance on three widely-used document understanding benchmarks, including Form Understanding (from 85.14% to 93.44%), Receipt Information Extraction (from 96.01% to 97.84%), and Document Classification (from 96.08% to 97.12%). On Document Visual QA, Bi-VLDoc achieves the state-of-the-art performance compared to previous single model methods.

Submitted 27 June, 2022; originally announced June 2022.

Comments: Under review

arXiv:2204.06851 [pdf, other]

doi 10.1145/3519935.3519994

(Fractional) Online Stochastic Matching via Fine-Grained Offline Statistics

Authors: Zhihao Gavin Tang, Hongxun Wu, Jinzhao Wu

Abstract: Motivated by display advertising on the internet, the online stochastic matching problem is proposed by Feldman, Mehta, Mirrokni, and Muthukrishnan (FOCS 2009). Consider a stochastic bipartite graph with offline vertices on one side and with i.i.d. online vertices on the other side. The algorithm knows the offline vertices and the distribution of the online vertices in advance. Upon the arrival of each online vertex, its type is realized and the algorithm immediately and irrevocably decides how to match it. In the vertex-weighted version of the problem, each offline vertex is associated with a weight and the goal is to maximize the total weight of the matching. In this paper, we generalize the model to allow non-identical online vertices and focus on the fractional version of the vertex-weighted stochastic matching. We design fractional algorithms that are $0.718$-competitive and $0.731$-competitive for non i.i.d. arrivals and i.i.d. arrivals respectively. We also prove that no fractional algorithm can achieve a competitive ratio better than $0.75$ for non i.i.d. arrivals. Furthermore, we round our fractional algorithms by applying the recently developed multiway online correlated selection by Gao et al. (FOCS 2021) and achieve $0.666$-competitive and $0.704$-competitive integral algorithms for non i.i.d. arrivals and i.i.d. arrivals. Our results for non i.i.d. arrivals are the first algorithms beating the $1-1/e \approx 0.632$ barrier of the classical adversarial setting. Our $0.704$-competitive integral algorithm for i.i.d. arrivals slightly improves the state-of-the-art $0.701$-competitive ratio by Huang and Shu (STOC 2021).

Submitted 14 April, 2022; originally announced April 2022.

Comments: To appear in STOC 2022

arXiv:2204.01425 [pdf, other]

Order Selection Prophet Inequality: From Threshold Optimization to Arrival Time Design

Authors: Bo Peng, Zhihao Gavin Tang

Abstract: In the classical prophet inequality, a gambler faces a sequence of items, whose values are drawn independently from known distributions. Upon the arrival of each item, its value is realized and the gambler either accepts it and the game ends, or irrevocably rejects it and continues to the next item. The goal is to maximize the value of the selected item and compete against the expected maximum value of all items. A tight competitive ratio of $\frac{1}{2}$ is established in the classical setting and various relaxations have been proposed to surpass the barrier, including the i.i.d. model, the order selection model, and the random order model. In this paper, we advance the study of the order selection prophet inequality, in which the gambler is given the extra power of selecting the arrival order of the items. Our main result is a $0.725$-competitive algorithm that substantially improves the state-of-the-art $0.669$ ratio by Correa, Saona and Ziliotto (Math. Program. 2021), achieved in the harder random order model. Recently, Agrawal, Sethuraman and Zhang (EC 2021) proved that the task of selecting the optimal order is NP-hard. Despite this fact, we introduce a novel algorithm design framework that translates the discrete order selection problem into a continuous arrival time design problem. From this perspective, we can focus on the arrival time design without worrying about the threshold optimization afterwards. As a side result, we achieve the optimal $0.745$ competitive ratio by applying our algorithm to the i.i.d. model.

Submitted 4 April, 2022; originally announced April 2022.

arXiv:2204.01418 [pdf, ps, other]

Online Ordinal Problems: Optimality of Comparison-based Algorithms and their Cardinal Complexity

Authors: Nick Gravin, Enze Sun, Zhihao Gavin Tang

Abstract: We consider ordinal online problems, i.e., tasks that only require pairwise comparisons between elements of the input. Classic examples are the secretary problem and the game of googol, as well as their multiple combinatorial extensions such as $(J,K)$-secretary, $2$-sided game of googol, and ordinal-competitive matroid secretary. A natural approach to these tasks is to use ordinal algorithms that at each step only consider relative ranking among the arrived elements, without looking at the numerical values of the input. We formally study the question of how cardinal algorithms can improve upon ordinal algorithms. We first give a universal construction of the input distribution for any ordinal online problem, such that the advantage of any cardinal algorithm over the ordinal algorithms is at most $1+\varepsilon$ for arbitrarily small $\varepsilon > 0$. As an implication, previous lower bounds for the aforementioned variants of secretary problems hold not only against ordinal algorithms, but also against any online algorithm. However, the value range of the input elements in our construction is huge: $N=O\left(\frac{n^3\cdot n!\cdot n!}{\varepsilon}\right)\uparrow\uparrow(n-1)$ (tower of exponents) for an input sequence of length $n$. As a second result, we identify a class of natural ordinal problems and find a cardinal algorithm with a matching advantage of $1+\Omega\left(\frac{1}{\log^{(c)}N}\right)$, where $\log^{(c)}N=\log\ldots\log N$ with $c$ iterative logs and $c$ is an arbitrary constant. Further, we introduce the cardinal complexity for any given ordinal online task: the minimum size $N(\varepsilon)$ of different numerical values in the input such that the advantage of cardinal over ordinal algorithms is at most $1+\varepsilon$. As a third result, we show that the game of googol has much lower cardinal complexity of $N=O\left(\left(\frac{n}{\varepsilon}\right)^n\right)$.

Submitted 11 October, 2023; v1 submitted 4 April, 2022; originally announced April 2022.

Comments: To appear at FOCS 2023. Abstract shortened to meet arXiv requirements

arXiv:2203.03927 [pdf, other]

Quadruped Guidance Robot for the Visually Impaired: A Comfort-Based Approach

Authors: Yanbo Chen, Zhengzhe Xu, Zhuozhu Jian, Gengpan Tang, Yunong Yangli, Anxing Xiao, Xueqian Wang, Bin Liang

Abstract: Guidance robots that can guide people and avoid various obstacles could potentially be owned by more visually impaired people at a fairly low cost. Most of the previous guidance robots for the visually impaired ignored the human response behavior and comfort, treating the human as an appendage dragged by the robot, which can lead to imprecise guidance of the human and sudden changes in the traction force experienced by the human. In this paper, we propose a novel quadruped guidance robot system with a comfort-based concept. We design a controllable traction device that can adjust the length and force between human and robot to ensure comfort. To allow the human to be guided safely and comfortably to the target position in complex environments, our proposed human motion planner can plan the traction force with the force-based human motion model. To track the planned force, we also propose a robot motion planner that can generate the specific robot motion command and design the force control device. Our system has been deployed on the Unitree Laikago quadrupedal platform and validated in real-world scenarios.

Submitted 23 June, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: IEEE International Conference on Robotics and Automation (ICRA) 2023

arXiv:2202.09215 [pdf, other]

"Who Is Next in Line?'' On the Significance of Knowing the Arrival Order in Bayesian Online Settings

Authors: Tomer Ezra, Michal Feldman, Nick Gravin, Zhihao Gavin Tang

Abstract: We introduce a new measure for the performance of online algorithms in Bayesian settings, where the input is drawn from a known prior, but the realizations are revealed one-by-one in an online fashion. Our new measure is called order-competitive ratio. It is defined as the worst case (over all distribution sequences) ratio between the performance of the best order-unaware and order-aware algorithms, and quantifies the loss that is incurred due to lack of knowledge of the arrival order. Despite the growing interest in the role of the arrival order on the performance of online algorithms, this loss has been overlooked thus far. We study the order-competitive ratio in the paradigmatic prophet inequality problem, for the two common objective functions of (i) maximizing the expected value, and (ii) maximizing the probability of obtaining the largest value; and with respect to two families of algorithms, namely (i) adaptive algorithms, and (ii) single-threshold algorithms. We provide tight bounds for all four combinations, with respect to deterministic algorithms. Our analysis requires new ideas and departs from standard techniques. In particular, our adaptive algorithms inevitably go beyond single-threshold algorithms. The results with respect to the order-competitive ratio measure capture the intuition that adaptive algorithms are stronger than single-threshold ones, and may lead to better algorithmic advice than the classical competitive ratio measure.

Submitted 4 November, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

arXiv:2202.02948 [pdf, other]

Improved Bounds for Fractional Online Matching Problems

Authors: Zhihao Gavin Tang, Yuhao Zhang

Abstract: Online bipartite matching with one-sided arrival and its variants have been extensively studied since the seminal work of Karp, Vazirani, and Vazirani (STOC 1990). Motivated by real-life applications with dynamic market structures, e.g. ride-sharing, two generalizations of the classical one-sided arrival model are proposed to allow non-bipartite graphs and to allow all vertices to arrive online. Namely, online matching with general vertex arrival is introduced by Wang and Wong (ICALP 2015), and fully online matching is introduced by Huang et al. (JACM 2020). In this paper, we study the fractional versions of the two models. We improve three out of the four state-of-the-art upper and lower bounds of the two models. For fully online matching, we design a $0.6$-competitive algorithm and prove no algorithm can be $0.613$-competitive. For online matching with general vertex arrival, we prove no algorithm can be $0.584$-competitive. Moreover, we give an arguably more intuitive algorithm for the general vertex arrival model, compared to the algorithm of Wang and Wong, while attaining the same competitive ratio of $0.526$.

Submitted 8 February, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

arXiv:2202.00245 [pdf, other]

doi 10.1145/3459637.3481954

Sequential Search with Off-Policy Reinforcement Learning

Authors: Dadong Miao, Yanan Wang, Guoyu Tang, Lin Liu, Sulong Xu, Bo Long, Yun Xiao, Lingfei Wu, Yunjiang Jiang

Abstract: Recent years have seen a significant amount of interest in Sequential Recommendation (SR), which aims to understand and model the sequential user behaviors and the interactions between users and items over time. Surprisingly, despite the huge success Sequential Recommendation has achieved, there is little study on Sequential Search (SS), a twin learning task that takes into account a user's current and past search queries, in addition to behavior on historical query sessions. The SS learning task is even more important than the counterpart SR task for most e-commerce companies due to its much larger online serving demands as well as traffic volume. To this end, we propose a highly scalable hybrid learning model that consists of an RNN learning framework leveraging all features in short-term user-item interactions, and an attention model utilizing selected item-only features from long-term interactions. As a novel optimization step, we fit multiple short user sequences in a single RNN pass within a training batch, by solving a greedy knapsack problem on the fly. Moreover, we explore the use of off-policy reinforcement learning in multi-session personalized search ranking. Specifically, we design a pairwise Deep Deterministic Policy Gradient model that efficiently captures users' long-term reward in terms of pairwise classification error. Extensive ablation experiments demonstrate the significant improvement each component brings to its state-of-the-art baseline, on a variety of offline and online metrics.

Submitted 1 February, 2022; originally announced February 2022.

Comments: 10 pages, 7 figures, CIKM 2021

arXiv:2201.09186 [pdf, other]

pvCNN: Privacy-Preserving and Verifiable Convolutional Neural Network Testing

Authors: Jiasi Weng, Jian Weng, Gui Tang, Anjia Yang, Ming Li, Jia-Nan Liu

Abstract: This paper proposes a new approach for privacy-preserving and verifiable convolutional neural network (CNN) testing, enabling a CNN model developer to convince a user of the truthful CNN performance over non-public data from multiple testers, while respecting model privacy. To balance the security and efficiency issues, three new efforts are made by appropriately integrating homomorphic encryption (HE) and zero-knowledge succinct non-interactive argument of knowledge (zk-SNARK) primitives with the CNN testing. First, a CNN model to be tested is strategically partitioned into a private part kept locally by the model developer, and a public part outsourced to an outside server. Then, the private part runs over HE-protected test data sent by a tester and transmits its outputs to the public part for accomplishing subsequent computations of the CNN testing. Second, the correctness of the above CNN testing is enforced by generating zk-SNARK based proofs, with an emphasis on optimizing proving overhead for two-dimensional (2-D) convolution operations, since these operations dominate the performance bottleneck during proof generation. We specifically present a new quadratic matrix programs (QMPs)-based arithmetic circuit with a single multiplication gate for expressing 2-D convolution operations between multiple filters and inputs in a batch manner. Third, we aggregate multiple proofs with respect to the same CNN model but different testers' test data (i.e., different statements) into one proof, and ensure that the validity of the aggregated proof implies the validity of the original multiple proofs. Lastly, our experimental results demonstrate that our QMPs-based zk-SNARK performs nearly 13.9$\times$ faster than the existing QAPs-based zk-SNARK in proving time, and 17.6$\times$ faster in Setup time, for high-dimension matrix multiplication.

Submitted 28 May, 2023; v1 submitted 23 January, 2022; originally announced January 2022.

arXiv:2111.10607 [pdf, ps, other]

Oblivious Online Contention Resolution Schemes

Authors: Hu Fu, Pinyan Lu, Zhihao Gavin Tang, Abner Turkieltaub, Hongxun Wu, Jinzhao Wu, Qianfan Zhang

Abstract: Contention resolution schemes (CRSs) are powerful tools for obtaining "ex post feasible" solutions from candidates that are drawn from "ex ante feasible" distributions. Online contention resolution schemes (OCRSs), the online version, have found myriad applications in Bayesian and stochastic problems, such as prophet inequalities and stochastic probing. When the ex ante distribution is unknown, it was unknown whether good CRSs/OCRSs exist with no sample (in which case the scheme is oblivious) or few samples from the distribution. In this work, we give a simple $\frac{1}{e}$-selectable oblivious single item OCRS by mixing two simple schemes evenly, and show, via a Ramsey theory argument, that it is optimal. On the negative side, we show that no CRS or OCRS with $O(1)$ samples can be $\Omega(1)$-balanced/selectable (i.e., preserve every active candidate with a constant probability) for graphic or transversal matroids.

Submitted 20 November, 2021; originally announced November 2021.

arXiv:2110.14092 [pdf, other]

BioGrad: Biologically Plausible Gradient-Based Learning for Spiking Neural Networks

Authors: Guangzhi Tang, Neelesh Kumar, Ioannis Polykretis, Konstantinos P. Michmizos

Abstract: Spiking neural networks (SNN) are delivering energy-efficient, massively parallel, and low-latency solutions to AI problems, facilitated by the emerging neuromorphic chips. To harness these computational benefits, SNN need to be trained by learning algorithms that adhere to brain-inspired neuromorphic principles, namely event-based, local, and online computations. Yet, the state-of-the-art SNN training algorithms are based on backprop that does not follow the above principles. Due to its limited biological plausibility, the application of backprop to SNN requires non-local feedback pathways for transmitting continuous-valued errors, and relies on gradients from future timesteps. The introduction of biologically plausible modifications to backprop has helped overcome several of its limitations, but limits the degree to which backprop is approximated, which hinders its performance. We propose a biologically plausible gradient-based learning algorithm for SNN that is functionally equivalent to backprop, while adhering to all three neuromorphic principles. We introduced multi-compartment spiking neurons with local eligibility traces to compute the gradients required for learning, and a periodic "sleep" phase to further improve the approximation to backprop during which a local Hebbian rule aligns the feedback and feedforward weights. Our method achieved the same level of performance as backprop with multi-layer fully connected SNN on MNIST (98.13%) and the event-based N-MNIST (97.59%) datasets. We deployed our learning algorithm on Intel's Loihi to train a 1-hidden-layer network for MNIST, and obtained 93.32% test accuracy while consuming 400 times less energy per training sample than BioGrad on GPU. Our work shows that optimal learning is feasible in neuromorphic computing, and further pursuing its biological plausibility can better capture the benefits of this emerging computing paradigm.

Submitted 26 October, 2021; originally announced October 2021.

Comments: 14 pages, 6 figures

arXiv:2110.08840 [pdf, other]

Online Facility Location with Predictions

Authors: Shaofeng H.-C. Jiang, Erzhi Liu, You Lyu, Zhihao Gavin Tang, Yubo Zhang

Abstract: We provide nearly optimal algorithms for online facility location (OFL) with predictions. In OFL, $n$ demand points arrive in order and the algorithm must irrevocably assign each demand point to an open facility upon its arrival. The objective is to minimize the total connection costs from demand points to assigned facilities plus the facility opening cost. We further assume the algorithm is additionally given for each demand point $x_i$ a natural prediction $f_{x_i}^{\mathrm{pred}}$ which is supposed to be the facility $f_{x_i}^{\mathrm{opt}}$ that serves $x_i$ in the offline optimal solution. Our main result is an $O(\min\{\log {\frac{n\eta_\infty}{\mathrm{OPT}}}, \log{n} \})$-competitive algorithm where $\eta_\infty$ is the maximum prediction error (i.e., the distance between $f_{x_i}^{\mathrm{pred}}$ and $f_{x_i}^{\mathrm{opt}}$). Our algorithm overcomes the fundamental $\Omega(\frac{\log n}{\log \log n})$ lower bound of OFL (without predictions) when $\eta_\infty$ is small, and it still maintains $O(\log n)$ ratio even when $\eta_\infty$ is unbounded. Furthermore, our theoretical analysis is supported by empirical evaluations for the tradeoffs between $\eta_\infty$ and the competitive ratio on various real datasets of different types.

Submitted 5 August, 2022; v1 submitted 17 October, 2021; originally announced October 2021.

Comments: Updated the comparison to a previous work

arXiv:2108.06091 [pdf, other]

BESS Aided Reconfigurable Energy Supply using Deep Reinforcement Learning for 5G and Beyond

Authors: Hao Yuan, Guoming Tang, Deke Guo, Kui Wu, Xun Shao, Keping Yu, Wei Wei

Abstract: The year 2020 witnessed the unprecedented development of 5G networks, along with the widespread deployment of 5G base stations (BSs). Nevertheless, the enormous energy consumption of BSs and the resulting huge energy cost have become significant concerns for mobile operators. With the continuous decline of renewable energy costs, equipping the power-hungry BSs with renewable energy generators could be a sustainable solution. In this work, we propose an energy storage aided reconfigurable renewable energy supply solution for the BS, which could supply clean energy to the BS and store surplus energy for backup usage. Specifically, to flexibly reconfigure the battery's discharging/charging operations, we propose a deep reinforcement learning based reconfiguring policy, which can adapt to the dynamic renewable energy generation as well as the varying power demands. Our experiments using real-world data on renewable energy generation and power demands demonstrate that our reconfigurable power supply solution can achieve an energy saving ratio of 74.8%, compared to the case with traditional power grid supply.

Submitted 13 August, 2021; originally announced August 2021.

arXiv:2107.12203 [pdf, other]

Revisiting Negation in Neural Machine Translation

Authors: Gongbo Tang, Philipp Rönchen, Rico Sennrich, Joakim Nivre

Abstract: In this paper, we evaluate the translation of negation both automatically and manually, in English-German (EN-DE) and English-Chinese (EN-ZH). We show that the ability of neural machine translation (NMT) models to translate negation has improved with deeper and more advanced networks, although the performance varies between language pairs and translation directions. The accuracy of manual evaluation in EN-DE, DE-EN, EN-ZH, and ZH-EN is 95.7%, 94.8%, 93.4%, and 91.7%, respectively. In addition, we show that under-translation is the most significant error type in NMT, which contrasts with the more diverse error profile previously observed for statistical machine translation. To better understand the root of the under-translation of negation, we study the model's information flow and training data. While our information flow analysis does not reveal any deficiencies that could be used to detect or fix the under-translation of negation, we find that negation is often rephrased during training, which could make it more difficult for the model to learn a reliable link between source and target negation. We finally conduct intrinsic analysis and extrinsic probing tasks on negation, showing that NMT models can distinguish negation and non-negation tokens very well and encode a lot of information about negation in hidden states but nevertheless leave room for improvement.

Submitted 26 July, 2021; originally announced July 2021.

Comments: To appear at TACL and to be presented at ACL 2021. Authors' final version

arXiv:2107.10428 [pdf, other]

Deep Adaptive Arbitrary Polynomial Chaos Expansion: A Mini-data-driven Semi-supervised Method for Uncertainty Quantification

Authors: Wen Yao, Xiaohu Zheng, Jun Zhang, Ning Wang, Guijian Tang

Abstract: The surrogate model-based uncertainty quantification method has drawn much attention in many engineering fields. Polynomial chaos expansion (PCE) and deep learning (DL) are powerful methods for building a surrogate model. However, PCE needs to increase the expansion order to improve the accuracy of the surrogate model, which requires more labeled data to solve for the expansion coefficients, and DL also requires a lot of labeled data to train the deep neural network (DNN). First of all, this paper proposes the adaptive arbitrary polynomial chaos (aPC) and proves two properties about the adaptive expansion coefficients. Based on the adaptive aPC, a semi-supervised deep adaptive arbitrary polynomial chaos expansion (Deep aPCE) method is proposed to reduce the training data cost and improve the surrogate model accuracy. On the one hand, the Deep aPCE method uses two properties of the adaptive aPC to assist in training the DNN based on only a small amount of labeled data and a large amount of unlabeled data, significantly reducing the training data cost. On the other hand, the Deep aPCE method adopts the DNN to fine-tune the adaptive expansion coefficients dynamically, improving the Deep aPCE model accuracy with lower expansion order. Besides, the Deep aPCE method can directly construct accurate surrogate models of high-dimensional stochastic systems without complex dimension-reduction and model decomposition operations. Five numerical examples and an actual engineering problem are used to verify the effectiveness of the Deep aPCE method.

Submitted 1 March, 2022; v1 submitted 21 July, 2021; originally announced July 2021.

arXiv:2106.12940 [pdf, other]

MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction

Authors: Guozhi Tang, Lele Xie, Lianwen Jin, Jiapeng Wang, Jingdong Chen, Zhen Xu, Qianying Wang, Yaqiang Wu, Hui Li

Abstract: The Visual Information Extraction (VIE) task aims to extract key information from multifarious document images (e.g., invoices and purchase receipts). Most previous methods treat the VIE task simply as a sequence labeling problem or classification problem, which requires models to carefully identify each kind of semantics by introducing multimodal features, such as font, color, and layout. But simply introducing multimodal features does not work well when faced with numeric semantic categories or some ambiguous texts. To address this issue, in this paper we propose a novel key-value matching model based on a graph neural network for VIE (MatchVIE). Through key-value matching based on relevancy evaluation, the proposed MatchVIE can bypass the recognition of various semantics and simply focus on the strong relevancy between entities. Besides, we introduce a simple but effective operation, Num2Vec, to tackle the instability of encoded values, which helps the model converge more smoothly. Comprehensive experiments demonstrate that the proposed MatchVIE can significantly outperform previous methods. Notably, to the best of our knowledge, MatchVIE may be the first attempt to tackle the VIE task by modeling the relevancy between keys and values and it is a good complement to the existing methods.

Submitted 24 June, 2021; originally announced June 2021.

Comments: accepted by IJCAI 2021

arXiv:2106.10681 [pdf, other]

Tag, Copy or Predict: A Unified Weakly-Supervised Learning Framework for Visual Information Extraction using Sequences

Authors: Jiapeng Wang, Tianwei Wang, Guozhi Tang, Lianwen Jin, Weihong Ma, Kai Ding, Yichao Huang

Abstract: Visual information extraction (VIE) has attracted increasing attention in recent years. The existing methods usually first organize optical character recognition (OCR) results into plain texts and then utilize token-level entity annotations as supervision to train a sequence tagging model. However, this incurs high annotation costs and may be exposed to label confusion, and the OCR errors will also significantly affect the final performance. In this paper, we propose a unified weakly-supervised learning framework called TCPN (Tag, Copy or Predict Network), which introduces 1) an efficient encoder to simultaneously model the semantic and layout information in 2D OCR results; 2) a weakly-supervised training strategy that utilizes only key information sequences as supervision; and 3) a flexible and switchable decoder which contains two inference modes: one (Copy or Predict Mode) is to output key information sequences of different categories by copying a token from the input or predicting one in each time step, and the other (Tag Mode) is to directly tag the input sequence in a single forward pass. Our method shows new state-of-the-art performance on several public benchmarks, which fully proves its effectiveness.

Submitted 20 June, 2021; originally announced June 2021.

Comments: IJCAI2021

arXiv:2106.07751 [pdf, other]

FedNILM: Applying Federated Learning to NILM Applications at the Edge

Authors: Yu Zhang, Guoming Tang, Qianyi Huang, Yi Wang, Xudong Wang, Jiadong Lou

Abstract: Non-intrusive load monitoring (NILM) helps disaggregate the household's main electricity consumption to energy usages of individual appliances, thus greatly cutting down the cost in fine-grained household load monitoring. To address the arisen privacy concern in NILM applications, federated learning (FL) could be leveraged for NILM model training and sharing. When applying the FL paradigm in real-world NILM applications, however, we are faced with the challenges of edge resource restriction, edge model personalization and edge training data scarcity. In this paper we present FedNILM, a practical FL paradigm for NILM applications at the edge client. Specifically, FedNILM is designed to deliver privacy-preserving and personalized NILM services to large-scale edge clients, by leveraging i) secure data aggregation through federated learning, ii) efficient cloud model compression via filter pruning and multi-task learning, and iii) personalized edge model building with unsupervised transfer learning. Our experiments on real-world energy data show that, FedNILM is able to achieve personalized energy disaggregation with the state-of-the-art accuracy, while ensuring privacy preserving at the edge client.

Submitted 7 June, 2021; originally announced June 2021.

Comments: 9 pages, 5 figures, 3 tables

arXiv:2106.06574 [pdf, other]

doi 10.1109/TSP.2022.3181333

Landscape Correspondence of Empirical and Population Risks in the Eigendecomposition Problem

Authors: Shuang Li, Gongguo Tang, Michael B. Wakin

Abstract: Spectral methods include a family of algorithms related to the eigenvectors of certain data-generated matrices. In this work, we are interested in studying the geometric landscape of the eigendecomposition problem in various spectral methods. In particular, we first extend known results regarding the landscape at critical points to larger regions near the critical points in a special case of finding the leading eigenvector of a symmetric matrix. For a more general eigendecomposition problem, inspired by recent findings on the connection between the landscapes of empirical risk and population risk, we then build a novel connection between the landscape of an eigendecomposition problem that uses random measurements and the one that uses the true data matrix. We also apply our theory to a variety of low-rank matrix optimization problems and conduct a series of simulations to illustrate our theoretical findings.

Submitted 27 June, 2022; v1 submitted 11 June, 2021; originally announced June 2021.

arXiv:2106.00297 [pdf, other]

More Behind Your Electricity Bill: a Dual-DNN Approach to Non-Intrusive Load Monitoring

Authors: Yu Zhang, Guoming Tang, Qianyi Huang, Yi Wang, Hong Xu

Abstract: Non-intrusive load monitoring (NILM) is a well-known single-channel blind source separation problem that aims to decompose the household energy consumption into itemised energy usage of individual appliances. In this way, considerable energy savings could be achieved by enhancing household's awareness of energy usage. Recent investigations have shown that deep neural networks (DNNs) based approaches are promising for the NILM task. Nevertheless, they normally ignore the inherent properties of appliance operations in the network design, potentially leading to implausible results. We are thus motivated to develop the dual Deep Neural Networks (dual-DNN), which aims to i) take advantage of DNNs' learning capability of latent features and ii) empower the DNN architecture with identification ability of universal properties. Specifically in the design of dual-DNN, we adopt one subnetwork to measure power ratings of different appliances' operation states, and the other subnetwork to identify the running states of target appliances. The final result is then obtained by multiplying these two network outputs and meanwhile considering the multi-state property of household appliances. To enforce the sparsity property in appliance's state operating, we employ median filtering and hard gating mechanisms to the subnetwork for state identification. Compared with the state-of-the-art NILM methods, our dual-DNN approach demonstrates a 21.67% performance improvement in average on two public benchmark datasets.

Submitted 1 June, 2021; originally announced June 2021.

Comments: 9 pages, 6 figures, 3 tables

arXiv:2105.00239 [pdf, other]

MRCBert: A Machine Reading Comprehension Approach for Unsupervised Summarization

Authors: Saurabh Jain, Guokai Tang, Lim Sze Chi

Abstract: When making an online purchase, it becomes important for the customer to read the product reviews carefully and make a decision based on that. However, reviews can be lengthy, may contain repeated, or sometimes irrelevant information that does not help in decision making. In this paper, we introduce MRCBert, a novel unsupervised method to generate summaries from product reviews. We leverage Machine Reading Comprehension, i.e. MRC, approach to extract relevant opinions and generate both rating-wise and aspect-wise summaries from reviews. Through MRCBert we show that we can obtain reasonable performance using existing models and transfer learning, which can be useful for learning under limited or low resource scenarios. We demonstrated our results on reviews of a product from the Electronics category in the Amazon Reviews dataset. Our approach is unsupervised as it does not require any domain-specific dataset, such as the product review dataset, for training or fine-tuning. Instead, we have used SQuAD v1.1 dataset only to fine-tune BERT for the MRC task. Since MRCBert does not require a task-specific dataset, it can be easily adapted and used in other domains.

Submitted 1 May, 2021; originally announced May 2021.

arXiv:2104.14978 [pdf, other]

doi 10.1109/TASE49443.2020.00010

A comparative study of neural network techniques for automatic software vulnerability detection

Authors: Gaigai Tang, Lianxiao Meng, Shuangyin Ren, Weipeng Cao, Qiang Wang, Lin Yang

Abstract: Software vulnerabilities are usually caused by design flaws or implementation errors, which could be exploited to cause damage to the security of the system. At present, the most commonly used method for detecting software vulnerabilities is static analysis. Most of the related technologies work based on rules or code similarity (source code level) and rely on manually defined vulnerability features. However, these rules and vulnerability features are difficult to be defined and designed accurately, which makes static analysis face many challenges in practical applications. To alleviate this problem, some researchers have proposed to use neural networks that have the ability of automatic feature extraction to improve the intelligence of detection. However, there are many types of neural networks, and different data preprocessing methods will have a significant impact on model performance. It is a great challenge for engineers and researchers to choose a proper neural network and data preprocessing method for a given problem. To solve this problem, we have conducted extensive experiments to test the performance of the two most typical neural networks (i.e., Bi-LSTM and RVFL) with the two most classical data preprocessing methods (i.e., the vector representation and the program symbolization methods) on software vulnerability detection problems and obtained a series of interesting research conclusions, which can provide valuable guidelines for researchers and engineers. Specifically, we found that 1) the training speed of RVFL is always faster than BiLSTM, but the prediction accuracy of Bi-LSTM model is higher than RVFL; 2) using doc2vec for vector representation can make the model have faster training speed and generalization ability than using word2vec; and 3) multi-level symbolization is helpful to improve the precision of neural network models.

Submitted 28 April, 2021; originally announced April 2021.

Comments: This paper has been published at April 28,2021. However, there are some experimental data issues in the published manuscript, which are caused by the calculation error of indicators. This paper is a revised version

Showing 1–50 of 137 results for author: Tang, G