-
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
Authors:
Sukmin Yun,
Haokun Lin,
Rusiru Thushara,
Mohammad Qazim Bhat,
Yongxin Wang,
Zutao Jiang,
Mingkai Deng,
Jinhong Wang,
Tianhua Tao,
Junbo Li,
Haonan Li,
Preslav Nakov,
Timothy Baldwin,
Zhengzhong Liu,
Eric P. Xing,
Xiaodan Liang,
Zhiqiang Shen
Abstract:
Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scale webpage-to-code dataset for instruction tuning and an evaluation framework for the webpage understanding and HTML code translation abilities of MLLMs. For dataset construction, we leverage pretrained LLMs to enhance existing webpage-to-code datasets as well as generate a diverse pool of new webpages rendered into images. Specifically, the inputs are webpage images and instructions, while the responses are the webpage's HTML code. We further include diverse natural language QA pairs about the webpage content in the responses to enable a more comprehensive understanding of the web content. To evaluate model performance in these tasks, we develop an evaluation framework for testing MLLMs' abilities in webpage understanding and web-to-code generation. Extensive experiments show that our proposed dataset is beneficial not only to our proposed tasks but also in the general visual domain, while previous datasets result in worse performance. We hope our work will contribute to the development of general MLLMs suitable for web-based content generation and task automation. Our data and code will be available at https://github.com/MBZUAI-LLM/web2code.
Submitted 28 June, 2024;
originally announced June 2024.
-
Autoregressive Image Generation without Vector Quantization
Authors:
Tianhong Li,
Yonglong Tian,
He Li,
Mingyang Deng,
Kaiming He
Abstract:
Conventional wisdom holds that autoregressive models for image generation are typically accompanied by vector-quantized tokens. We observe that while a discrete-valued space can facilitate representing a categorical distribution, it is not a necessity for autoregressive modeling. In this work, we propose to model the per-token probability distribution using a diffusion procedure, which allows us to apply autoregressive models in a continuous-valued space. Rather than using categorical cross-entropy loss, we define a Diffusion Loss function to model the per-token probability. This approach eliminates the need for discrete-valued tokenizers. We evaluate its effectiveness across a wide range of cases, including standard autoregressive models and generalized masked autoregressive (MAR) variants. By removing vector quantization, our image generator achieves strong results while enjoying the speed advantage of sequence modeling. We hope this work will motivate the use of autoregressive generation in other continuous-valued domains and applications.
Submitted 17 June, 2024;
originally announced June 2024.
-
Eulerian-Lagrangian Fluid Simulation on Particle Flow Maps
Authors:
Junwei Zhou,
Duowen Chen,
Molin Deng,
Yitong Deng,
Yuchen Sun,
Sinan Wang,
Shiying Xiong,
Bo Zhu
Abstract:
We propose a novel Particle Flow Map (PFM) method to enable accurate long-range advection for incompressible fluid simulation. The foundation of our method is the observation that a particle trajectory generated in a forward simulation naturally embodies a perfect flow map. Centered on this concept, we have developed an Eulerian-Lagrangian framework comprising four essential components: Lagrangian particles for a natural and precise representation of bidirectional flow maps; a dual-scale map representation to accommodate the mapping of various flow quantities; a particle-to-grid interpolation scheme for accurate quantity transfer from particles to grid nodes; and a hybrid impulse-based solver to enforce incompressibility on the grid. The efficacy of PFM has been demonstrated through various simulation scenarios, highlighting the evolution of complex vortical structures and the details of turbulent flows. Notably, compared to NFM, PFM reduces computing time by up to 49 times and memory consumption by up to 41%, while enhancing vorticity preservation as evidenced in various tests like leapfrog, vortex tube, and turbulent flow.
Submitted 15 May, 2024;
originally announced May 2024.
-
Measuring Feature Sparsity in Language Models
Authors:
Mingyang Deng,
Lucas Tao,
Joe Benton
Abstract:
Recent works have proposed that activations in language models can be modelled as sparse linear combinations of vectors corresponding to features of input text. Under this assumption, these works aimed to reconstruct feature directions using sparse coding. We develop metrics to assess the success of these sparse coding techniques and test the validity of the linearity and sparsity assumptions. We show our metrics can predict the level of sparsity on synthetic sparse linear activations, and can distinguish between sparse linear data and several other distributions. We use our metrics to measure levels of sparsity in several language models. We find evidence that language model activations can be accurately modelled by sparse linear combinations of features, significantly more so than control datasets. We also show that model activations appear to be sparsest in the first and final layers.
Submitted 13 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
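The modelling assumption in this abstract, that an activation is a sparse linear combination of feature vectors, can be illustrated with a small synthetic generator. This is a minimal sketch only: the function name `synthetic_activation`, the coefficient range, and the one-hot feature dictionary are assumptions made for illustration and are not taken from the paper, which does not describe its synthetic data in the abstract.

```python
import random

def synthetic_activation(features, k, rng):
    """Build one activation as a sum of k randomly chosen feature vectors.

    features: list of equal-length vectors (hypothetical feature directions)
    k:        number of active features (the sparsity level)
    rng:      a random.Random instance, so results are reproducible
    Returns (activation, active_indices).
    """
    dim = len(features[0])
    active = sorted(rng.sample(range(len(features)), k))
    activation = [0.0] * dim
    for idx in active:
        coeff = rng.uniform(0.5, 1.5)  # assumed positive coefficient range
        for j in range(dim):
            activation[j] += coeff * features[idx][j]
    return activation, active

# Example: six one-hot feature directions, two active per activation.
rng = random.Random(0)
features = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
activation, active = synthetic_activation(features, 2, rng)
```

With one-hot features the non-zero coordinates of `activation` are exactly the indices in `active`, which makes the sparsity level easy to read off; with overlapping feature directions, recovering it would require a sparse coding step.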
-
Experimental quantum natural gradient optimization in photonics
Authors:
Yizhi Wang,
Shichuan Xue,
Yaxuan Wang,
Jiangfang Ding,
Weixu Shi,
Dongyang Wang,
Yong Liu,
Yingwen Liu,
Xiang Fu,
Guangyao Huang,
Anqi Huang,
Mingtang Deng,
Junjie Wu
Abstract:
Variational quantum algorithms (VQAs), combining the advantages of parameterized quantum circuits and classical optimizers, promise practical quantum applications in the Noisy Intermediate-Scale Quantum era. The performance of VQAs heavily depends on the optimization method. Compared with gradient-free and ordinary gradient descent methods, the quantum natural gradient (QNG), which mirrors the geometric structure of the parameter space, can achieve faster convergence and avoid local minima more easily, thereby reducing the cost of circuit executions. We utilized a fully programmable photonic chip to experimentally estimate the QNG in photonics for the first time. We obtained the dissociation curve of the He-H$^+$ cation and achieved chemical accuracy, verifying the superior performance of QNG optimization on a photonic device. Our work opens up a vista of utilizing QNG in photonics to implement practical near-term quantum applications.
Submitted 11 October, 2023;
originally announced October 2023.
-
Quantum generative adversarial learning in photonics
Authors:
Yizhi Wang,
Shichuan Xue,
Yaxuan Wang,
Yong Liu,
Jiangfang Ding,
Weixu Shi,
Dongyang Wang,
Yingwen Liu,
Xiang Fu,
Guangyao Huang,
Anqi Huang,
Mingtang Deng,
Junjie Wu
Abstract:
Quantum Generative Adversarial Networks (QGANs), an intersection of quantum computing and machine learning, have attracted widespread attention due to their potential advantages over classical analogs. However, in the current era of Noisy Intermediate-Scale Quantum (NISQ) computing, it is essential to investigate whether QGANs can perform learning tasks on near-term quantum devices usually affected by noise and even defects. In this Letter, using a programmable silicon quantum photonic chip, we experimentally demonstrate the QGAN model in photonics for the first time, and investigate the effects of noise and defects on its performance. Our results show that QGANs can generate high-quality quantum data with a fidelity higher than 90%, even under conditions where up to half of the generator's phase shifters are damaged, or all of the generator and discriminator's phase shifters are subjected to phase noise up to $0.04\pi$. Our work sheds light on the feasibility of implementing QGANs on NISQ-era quantum hardware.
Submitted 1 October, 2023;
originally announced October 2023.
-
Seeing Is Not Always Believing: Invisible Collision Attack and Defence on Pre-Trained Models
Authors:
Minghang Deng,
Zhong Zhang,
Junming Shao
Abstract:
Large-scale pre-trained models (PTMs) such as BERT and GPT have achieved great success in diverse fields. The typical paradigm is to pre-train a big deep learning model on large-scale data sets, and then fine-tune the model on small task-specific data sets for downstream tasks. Although PTMs have rapidly progressed with wide real-world applications, they also pose significant risks of potential attacks. Existing backdoor attacks or data poisoning methods often rely on the assumption that the attacker invades the computers of victims or accesses the target data, which is challenging in real-world scenarios. In this paper, we propose a novel framework for an invisible attack on PTMs with enhanced MD5 collision. The key idea is to generate two equal-size models with the same MD5 checksum by leveraging the MD5 chosen-prefix collision. Afterwards, the two "same" models will be deployed on public websites to induce victims to download the poisoned model. Unlike conventional attacks on deep learning models, this new attack is flexible, covert, and model-independent. Additionally, we propose a simple defensive strategy for recognizing the MD5 chosen-prefix collision and provide a theoretical justification for its feasibility. We extensively validate the effectiveness and stealthiness of our proposed attack and defensive method on different models and data sets.
Submitted 7 May, 2024; v1 submitted 24 September, 2023;
originally announced September 2023.
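Because the attack in this abstract relies on two different files sharing one MD5 checksum, a downloader who compares only MD5 cannot tell them apart. The sketch below shows a generic precaution, comparing a SHA-256 digest as well, using Python's standard `hashlib`. It is not the defensive strategy proposed in the paper, which the abstract does not detail; the helper names `digests` and `matches_published` are hypothetical.

```python
import hashlib

def digests(data: bytes) -> dict:
    """Return the MD5 and SHA-256 hex digests of a byte string."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

def matches_published(data: bytes, published_sha256: str) -> bool:
    """Check downloaded bytes against a publisher-supplied SHA-256 digest.

    Assumes the publisher distributes a SHA-256 value through a trusted
    channel; an MD5 match alone is not treated as sufficient.
    """
    return hashlib.sha256(data).hexdigest() == published_sha256.strip().lower()

# Usage: for a real model file, read the bytes from disk before hashing.
example = b"model weights placeholder"
print(digests(example))
```

Two files produced by an MD5 chosen-prefix collision would agree on the `md5` entry but, with overwhelming likelihood, differ on the `sha256` entry.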
-
Research on Joint Representation Learning Methods for Entity Neighborhood Information and Description Information
Authors:
Le Xiao,
Xin Shan,
Yuhua Wang,
Miaolei Deng
Abstract:
To address the issue of poor embedding performance in the knowledge graph of a programming design course, a joint representation learning model that combines entity neighborhood information and description information is proposed. Firstly, a graph attention network is employed to obtain the features of entity neighboring nodes, incorporating relationship features to enrich the structural information. Next, the BERT-WWM model is utilized in conjunction with attention mechanisms to obtain the representation of entity description information. Finally, the final entity vector representation is obtained by combining the vector representations of entity neighborhood information and description information. Experimental results demonstrate that the proposed model achieves favorable performance on the knowledge graph dataset of the programming design course, outperforming other baseline models.
Submitted 14 September, 2023;
originally announced September 2023.
-
Restart Sampling for Improving Generative Processes
Authors:
Yilun Xu,
Mingyang Deng,
Xiang Cheng,
Yonglong Tian,
Ziming Liu,
Tommi Jaakkola
Abstract:
Generative processes that involve solving differential equations, such as diffusion models, frequently necessitate balancing speed and quality. ODE-based samplers are fast but plateau in performance while SDE-based samplers deliver higher sample quality at the cost of increased sampling time. We attribute this difference to sampling errors: ODE-samplers involve smaller discretization errors while stochasticity in SDE contracts accumulated errors. Based on these findings, we propose a novel sampling algorithm called Restart in order to better balance discretization errors and contraction. The sampling method alternates between adding substantial noise in additional forward steps and strictly following a backward ODE. Empirically, Restart sampler surpasses previous SDE and ODE samplers in both speed and accuracy. Restart not only outperforms the previous best SDE results, but also accelerates the sampling speed by 10-fold / 2-fold on CIFAR-10 / ImageNet $64 \times 64$. In addition, it attains significantly better sample quality than ODE samplers within comparable sampling times. Moreover, Restart better balances text-image alignment/visual quality versus diversity than previous samplers in the large-scale text-to-image Stable Diffusion model pre-trained on LAION $512 \times 512$. Code is available at https://github.com/Newbeeer/diffusion_restart_sampling
Submitted 1 November, 2023; v1 submitted 26 June, 2023;
originally announced June 2023.
-
TGNN: A Joint Semi-supervised Framework for Graph-level Classification
Authors:
Wei Ju,
Xiao Luo,
Meng Qu,
Yifan Wang,
Chong Chen,
Minghua Deng,
Xian-Sheng Hua,
Ming Zhang
Abstract:
This paper studies semi-supervised graph classification, a crucial task with a wide range of applications in social network analysis and bioinformatics. Recent works typically adopt graph neural networks to learn graph-level representations for classification, failing to explicitly leverage features derived from graph topology (e.g., paths). Moreover, when labeled data is scarce, these methods are far from satisfactory due to their insufficient topology exploration of unlabeled data. We address the challenge by proposing a novel semi-supervised framework called Twin Graph Neural Network (TGNN). To explore graph structural information from complementary views, our TGNN has a message passing module and a graph kernel module. To fully utilize unlabeled data, for each module, we calculate the similarity of each unlabeled graph to other labeled graphs in the memory bank and our consistency loss encourages consistency between two similarity distributions in different embedding spaces. The two twin modules collaborate with each other by exchanging instance similarity knowledge to fully explore the structure information of both labeled and unlabeled data. We evaluate our TGNN on various public datasets and show that it achieves strong performance.
Submitted 23 April, 2023;
originally announced April 2023.
-
Digital Privacy Under Attack: Challenges and Enablers
Authors:
Baobao Song,
Mengyue Deng,
Shiva Raj Pokhrel,
Qiujun Lan,
Robin Doss,
Gang Li
Abstract:
Users have renewed interest in protecting their private data in the digital space. When they don't believe that their privacy is sufficiently covered by one platform, they will readily switch to another. Such an increasing level of privacy awareness has made privacy preservation an essential research topic. Nevertheless, new privacy attacks are emerging day by day. Therefore, a holistic survey comparing the known attacks on privacy preservation and their mitigation schemes is needed in the literature. We develop a study to fill this gap by assessing the resilience of privacy-preserving methods to various attacks and conducting a comprehensive review of countermeasures from a broader perspective. First, we introduce the fundamental concepts and critical components of privacy attacks. Second, we comprehensively cover major privacy attacks targeted at anonymous data, statistical aggregate data, and privacy-preserving models. We also summarize popular countermeasures to mitigate these attacks. Finally, some promising future research directions and related issues in the privacy community are envisaged. We believe this survey will shed some light on privacy research and encourage researchers to fully understand the resilience of different existing privacy-preserving approaches.
Submitted 18 February, 2023;
originally announced February 2023.
-
Approximating Knapsack and Partition via Dense Subset Sums
Authors:
Mingyang Deng,
Ce Jin,
Xiao Mao
Abstract:
Knapsack and Partition are two important additive problems whose fine-grained complexities in the $(1-\varepsilon)$-approximation setting are not yet settled. In this work, we make progress on both problems by giving improved algorithms.
- Knapsack can be $(1 - \varepsilon)$-approximated in $\tilde O(n + (1/\varepsilon) ^ {2.2} )$ time, improving the previous $\tilde O(n + (1/\varepsilon) ^ {2.25} )$ by Jin (ICALP'19). There is a known conditional lower bound of $(n+1/\varepsilon)^{2-o(1)}$ based on the $(\min,+)$-convolution hypothesis.
- Partition can be $(1 - \varepsilon)$-approximated in $\tilde O(n + (1/\varepsilon) ^ {1.25} )$ time, improving the previous $\tilde O(n + (1/\varepsilon) ^ {1.5} )$ by Bringmann and Nakos (SODA'21). There is a known conditional lower bound of $(1/\varepsilon)^{1-o(1)}$ based on Strong Exponential Time Hypothesis.
Both of our new algorithms apply the additive combinatorial results on dense subset sums by Galil and Margalit (SICOMP'91), Bringmann and Wellnitz (SODA'21). Such techniques have not been explored in the context of Knapsack prior to our work. In addition, we design several new methods to speed up the divide-and-conquer steps which naturally arise in solving additive problems.
Submitted 23 January, 2023;
originally announced January 2023.
-
Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning
Authors:
Hua Wei,
Jingxiao Chen,
Xiyang Ji,
Hongyang Qin,
Minwen Deng,
Siqin Li,
Liang Wang,
Weinan Zhang,
Yong Yu,
Lin Liu,
Lanxiao Huang,
Deheng Ye,
Qiang Fu,
Wei Yang
Abstract:
This paper introduces Honor of Kings Arena, a reinforcement learning (RL) environment based on Honor of Kings, one of the world's most popular games at present. Compared to other environments studied in most previous work, ours presents new generalization challenges for competitive reinforcement learning. It is a multi-agent problem with one agent competing against its opponent, and it requires generalization ability because it has diverse targets to control and diverse opponents to compete with. We describe the observation, action, and reward specifications for the Honor of Kings domain and provide an open-source Python-based interface for communicating with the game engine. We provide twenty target heroes with a variety of tasks in Honor of Kings Arena and present initial baseline results for RL-based methods with feasible computing resources. Finally, we showcase the generalization challenges imposed by Honor of Kings Arena and possible remedies to the challenges. All of the software, including the environment class, is publicly available at https://github.com/tencent-ailab/hok_env . The documentation is available at https://aiarena.tencent.com/hok/doc/ .
Submitted 18 October, 2022; v1 submitted 18 September, 2022;
originally announced September 2022.
-
Large-scale full-programmable quantum walk and its applications
Authors:
Yizhi Wang,
Yingwen Liu,
Junwei Zhan,
Shichuan Xue,
Yuzhen Zheng,
Ru Zeng,
Zhihao Wu,
Zihao Wang,
Qilin Zheng,
Dongyang Wang,
Weixu Shi,
Xiang Fu,
Ping Xu,
Yang Wang,
Yong Liu,
Jiangfang Ding,
Guangyao Huang,
Chunlin Yu,
Anqi Huang,
Xiaogang Qiang,
Mingtang Deng,
Weixia Xu,
Kai Lu,
Xuejun Yang,
Junjie Wu
Abstract:
With photonics, the quantum computational advantage has been demonstrated on the task of boson sampling. Next, developing quantum-enhanced approaches for practical problems becomes one of the top priorities for photonic systems. Quantum walks are powerful kernels for developing new and useful quantum algorithms. Here we realize large-scale quantum walks using a fully programmable photonic quantum computing system. The system integrates a silicon quantum photonic chip, enabling the simulation of quantum walk dynamics on graphs with up to 400 vertices and possessing full programmability over quantum walk parameters, including the particle property, initial state, graph structure, and evolution time. In the 400-dimensional Hilbert space, the average fidelity of random entangled quantum states after the whole on-chip circuit evolution reaches as high as $94.29 \pm 1.28\%$. With the system, we demonstrated exponentially faster hitting and quadratically faster mixing performance of quantum walks over classical random walks, achieving more than two orders of magnitude of enhancement in the experimental hitting efficiency and almost half of the reduction in the experimental evolution time for mixing. We utilize the system to implement a series of quantum applications, including measuring the centrality of scale-free networks, searching targets on Erdős-Rényi networks, distinguishing non-isomorphic graph pairs, and simulating the topological phase of higher-order topological insulators. Our work shows one feasible path for quantum photonics to address applications of practical interests in the near future.
Submitted 28 August, 2022;
originally announced August 2022.
-
Unsupervised domain adaptation semantic segmentation of high-resolution remote sensing imagery with invariant domain-level prototype memory
Authors:
Jingru Zhu,
Ya Guo,
Geng Sun,
Libo Yang,
Min Deng,
Jie Chen
Abstract:
Semantic segmentation is a key technique involved in automatic interpretation of high-resolution remote sensing (HRS) imagery and has drawn much attention in the remote sensing community. Deep convolutional neural networks (DCNNs) have been successfully applied to the HRS imagery semantic segmentation task due to their hierarchical representation ability. However, the heavy dependency on a large amount of densely annotated training data and the sensitivity to variation in data distribution severely restrict the potential application of DCNNs for the semantic segmentation of HRS imagery. This study proposes a novel unsupervised domain adaptation semantic segmentation network (MemoryAdaptNet) for the semantic segmentation of HRS imagery. MemoryAdaptNet constructs an output space adversarial learning scheme to bridge the domain distribution discrepancy between the source domain and the target domain and to narrow the influence of domain shift. Specifically, we embed an invariant feature memory module to store invariant domain-level context information, because the features obtained from adversarial learning tend to represent only the variant features of the current limited inputs. A category attention-driven invariant domain-level context aggregation module integrates this memory with the current pseudo-invariant features to further augment the pixel representations. An entropy-based pseudo-label filtering strategy is used to update the memory module with high-confidence pseudo-invariant features of the current target images. Extensive experiments under three cross-domain tasks indicate that our proposed MemoryAdaptNet is remarkably superior to the state-of-the-art methods.
Submitted 14 February, 2023; v1 submitted 16 August, 2022;
originally announced August 2022.
-
RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning
Authors:
Mingkai Deng,
Jianyu Wang,
Cheng-Ping Hsieh,
Yihan Wang,
Han Guo,
Tianmin Shu,
Meng Song,
Eric P. Xing,
Zhiting Hu
Abstract:
Prompting has shown impressive success in enabling large pretrained language models (LMs) to perform diverse NLP tasks, especially when only a small amount of downstream data is available. Automatically finding the optimal prompt for each task, however, is challenging. Most existing work resorts to tuning soft prompts (e.g., embeddings), which fall short of interpretability, reusability across LMs, and applicability when gradients are not accessible. Discrete prompts, on the other hand, are difficult to optimize, and are often created by "enumeration (e.g., paraphrasing)-then-selection" heuristics that do not explore the prompt space systematically. This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL). RLPrompt formulates a parameter-efficient policy network that generates the desired discrete prompt after training with reward. To overcome the complexity and stochasticity of reward signals by the large LM environment, we incorporate effective reward stabilization that substantially enhances the training efficiency. RLPrompt is flexibly applicable to different types of LMs, such as masked (e.g., BERT) and left-to-right models (e.g., GPTs), for both classification and generation tasks. Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods. Interestingly, the resulting optimized prompts are often ungrammatical gibberish text; and surprisingly, those gibberish prompts are transferable between different LMs to retain significant performance, indicating LM prompting may not follow human language patterns.
Submitted 22 October, 2022; v1 submitted 25 May, 2022;
originally announced May 2022.
-
On Problems Related to Unbounded SubsetSum: A Unified Combinatorial Approach
Authors:
Mingyang Deng,
Xiao Mao,
Ziqian Zhong
Abstract:
Unbounded SubsetSum is a classical textbook problem: given integers $w_1,w_2,\cdots,w_n\in [1,u],~c,u$, we need to decide whether there exist $m_1,m_2,\cdots,m_n\in \mathbb{N}$ satisfying $c=\sum_{i=1}^n w_im_i$. In its all-target version, $t\in \mathbb{Z}_+$ is given and an answer for all integers $c\in[0,t]$ is required. In this paper, we study three generalizations of this simple problem: All-Target Unbounded Knapsack, All-Target CoinChange and Residue Table. By new combinatorial insights into the structures of solutions, we present a novel two-phase approach for such problems. As a result, we present the first near-linear algorithms for CoinChange and Residue Table, which run in $\tilde{O}(u+t)$ and $\tilde{O}(u)$ time, respectively, deterministically. We also show that if we can compute $(\min,+)$ convolution for $n$-length arrays in $T(n)$ time, then All-Target Unbounded Knapsack can be solved in $\tilde{O}(T(u)+t)$ time, thus establishing sub-quadratic equivalence between All-Target Unbounded Knapsack and $(\min,+)$ convolution.
Submitted 27 February, 2022;
originally announced February 2022.
-
Learning by Active Forgetting for Neural Networks
Authors:
Jian Peng,
Xian Sun,
Min Deng,
Chao Tao,
Bo Tang,
Wenbo Li,
Guohua Wu,
Qing Zhu,
Yu Liu,
Tao Lin,
Haifeng Li
Abstract:
Remembering and forgetting mechanisms are two sides of the same coin in a human learning-memory system. Inspired by human brain memory mechanisms, modern machine learning systems have been working to endow machines with lifelong learning capability through better remembering, while treating forgetting as the antagonist to overcome. Nevertheless, this idea might only see half of the picture. Recently, a growing number of researchers have argued that a brain is born to forget, i.e., forgetting is a natural and active process for abstract, rich, and flexible representations. This paper presents a learning model based on an active forgetting mechanism with artificial neural networks. The active forgetting mechanism (AFM) is introduced to a neural network via a "plug-and-play" forgetting layer (P&PF), consisting of groups of inhibitory neurons with an Internal Regulation Strategy (IRS) to adjust their own extinction rate via a lateral inhibition mechanism and an External Regulation Strategy (ERS) to adjust the extinction rate of excitatory neurons via an inhibition mechanism. Experimental studies have shown that the P&PF offers surprising benefits: self-adaptive structure, strong generalization, long-term learning and memory, and robustness to data and parameter perturbation. This work sheds light on the importance of forgetting in the learning process and offers new perspectives to understand the underlying mechanisms of neural networks.
Submitted 21 November, 2021;
originally announced November 2021.
-
Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation
Authors:
Mingkai Deng,
Bowen Tan,
Zhengzhong Liu,
Eric P. Xing,
Zhiting Hu
Abstract:
Natural language generation (NLG) spans a broad range of tasks, each of which serves for specific objectives and desires different properties of generated text. The complexity makes automatic evaluation of NLG particularly challenging. Previous work has typically focused on a single task and developed individual evaluation metrics based on specific intuitions. In this paper, we propose a unifying perspective that facilitates the design of metrics for a wide range of language generation tasks and quality aspects. Based on the nature of information change from input to output, we classify NLG tasks into compression (e.g., summarization), transduction (e.g., text rewriting), and creation (e.g., dialog). The information alignment, or overlap, between input, context, and output text plays a common central role in characterizing the generation. Using the uniform concept of information alignment, we develop a family of interpretable metrics for various NLG tasks and aspects, often without need of gold reference data. To operationalize the metrics, we train self-supervised models to approximate information alignment as a prediction task. Experiments show the uniformly designed metrics achieve stronger or comparable correlations with human judgement compared to state-of-the-art metrics in each of diverse tasks, including text summarization, style transfer, and knowledge-grounded dialog. With information alignment as the intermediate representation, we deliver a composable library for easy NLG evaluation and future metric design.
Submitted 21 January, 2022; v1 submitted 13 September, 2021;
originally announced September 2021.
-
ARGO: Modeling Heterogeneity in E-commerce Recommendation
Authors:
Daqing Wu,
Xiao Luo,
Zeyu Ma,
Chong Chen,
Minghua Deng,
**wen Ma
Abstract:
Nowadays, E-commerce is increasingly integrated into our daily lives. Meanwhile, the shopping process has also changed incrementally from one behavior (purchase) to multiple behaviors (such as view, carting and purchase). Therefore, utilizing interaction data from auxiliary behaviors has drawn a lot of attention in E-commerce recommender systems. However, all existing models ignore two kinds of intrinsic heterogeneity which are helpful to capture the difference of user preferences and the difference of item attributes. First (intra-heterogeneity), each user has multiple social identities with otherness, and these different identities can result in quite different interaction preferences. Second (inter-heterogeneity), each item can transfer an item-specific percentage of score from low-level behavior to high-level behavior for the gradual relationship among multiple behaviors. Thus, the lack of consideration of these heterogeneities damages recommendation rank performance. To model the above heterogeneities, we propose a novel method named intra- and inter-heterogeneity recommendation model (ARGO). Specifically, we embed each user into multiple vectors representing the user's identities, and the maximum of identity scores indicates the interaction preference. Besides, we regard the item-specific transition percentage as a trainable transition probability between different behaviors. Extensive experiments on two real-world datasets show that ARGO performs much better than the state-of-the-art in multi-behavior scenarios.
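The "maximum of identity scores" idea can be illustrated with a small sketch. It assumes, purely for illustration, that each identity score is a dot product between one of the user's identity vectors and an item vector; the abstract does not state the actual scoring function, and all names and numbers below are hypothetical.

```python
def interaction_score(identity_vectors, item_vector):
    """Score a user-item pair as the maximum over per-identity scores.

    Assumption for illustration: each identity score is a dot product.
    The abstract only says that the maximum of the identity scores
    indicates the interaction preference.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return max(dot(identity, item_vector) for identity in identity_vectors)


# Hypothetical user with three identity vectors and one item vector
user_identities = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
item = [0.2, 0.9]
score = interaction_score(user_identities, item)
```

Here the second identity matches the item best, so it alone determines the preference score; the other identities do not dilute it as an average would.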
Submitted 14 September, 2021; v1 submitted 13 September, 2021;
originally announced September 2021.
-
Neighborhood Consensus Contrastive Learning for Backward-Compatible Representation
Authors:
Shengsen Wu,
Liang Chen,
Yihang Lou,
Yan Bai,
Tao Bai,
Minghua Deng,
Lingyu Duan
Abstract:
In object re-identification (ReID), the development of deep learning techniques often involves model updates and deployment. It is unacceptable to re-embed and re-index with the system suspended when deploying new models. Therefore, backward-compatible representation is proposed to enable "new" features to be compared with "old" features directly, which means that the database remains active when there are both "new" and "old" features in it. Thus we can scroll-refresh the database or even do nothing on the database to update.
The existing backward-compatible methods either require a strong overlap between old and new training data or simply impose constraints at the instance level. Thus they have difficulty handling complicated cluster structures and are limited in eliminating the impact of outliers in old embeddings, resulting in a risk of damaging the discriminative capability of new features. In this work, we propose a Neighborhood Consensus Contrastive Learning (NCCL) method. With no assumptions about the new training data, we estimate the sub-cluster structures of old embeddings. A new embedding is constrained with multiple old embeddings in both embedding space and discrimination space at the sub-class level. The effect of outliers is diminished, as the multiple samples serve as "mean teachers". Besides, we also propose a scheme to filter the old embeddings with low credibility, further improving the compatibility robustness. Our method ensures backward compatibility without impairing the accuracy of the new model, and it can even improve the new model's accuracy in most scenarios.
Submitted 8 March, 2023; v1 submitted 7 August, 2021;
originally announced August 2021.
-
DNA-GCN: Graph convolutional networks for predicting DNA-protein binding
Authors:
Yuhang Guo,
Xiao Luo,
Liang Chen,
Minghua Deng
Abstract:
Predicting DNA-protein binding is an important and classic problem in bioinformatics. Convolutional neural networks have outperformed conventional methods in modeling the sequence specificity of DNA-protein binding. However, none of the studies has utilized graph convolutional networks for motif inference. In this work, we propose to use graph convolutional networks for motif inference. We build a sequence k-mer graph for the whole dataset based on k-mer co-occurrence and k-mer sequence relationship and then learn DNA Graph Convolutional Network (DNA-GCN) for the whole dataset. Our DNA-GCN is initialized with a one-hot representation for all nodes, and it then jointly learns the embeddings for both k-mers and sequences, as supervised by the known labels of sequences. We evaluate our model on 50 datasets from ENCODE. DNA-GCN shows its competitive performance compared with the baseline model. Besides, we analyze our model and design several different architectures to help fit different datasets.
Submitted 2 June, 2021;
originally announced June 2021.
-
Criterion-based Heterogeneous Collaborative Filtering for Multi-behavior Implicit Recommendation
Authors:
Xiao Luo,
Daqing Wu,
Yiyang Gu,
Chong Chen,
Luchen Liu,
**wen Ma,
Ming Zhang,
Minghua Deng,
Jianqiang Huang,
Xian-Sheng Hua
Abstract:
Recent years have witnessed the explosive growth of interaction behaviors in multimedia information systems, where multi-behavior recommender systems have received increasing attention by leveraging data from various auxiliary behaviors such as tip and collect. Among various multi-behavior recommendation methods, non-sampling methods have shown superiority over negative sampling methods. However, two observations are usually ignored in existing state-of-the-art non-sampling methods based on binary regression: (1) users have different preference strengths for different items, so they cannot be measured simply by binary implicit data; (2) the dependency across multiple behaviors varies for different users and items. To tackle the above issue, we propose a novel non-sampling learning framework named Criterion-guided Heterogeneous Collaborative Filtering (CHCF). CHCF introduces both upper and lower thresholds to indicate selection criteria, which will guide user preference learning. Besides, CHCF integrates criterion learning and user preference learning into a unified framework, which can be trained jointly for the interaction prediction of the target behavior. We further theoretically demonstrate that the optimization of Collaborative Metric Learning can be approximately achieved by the CHCF learning framework in a non-sampling form effectively. Extensive experiments on three real-world datasets show the effectiveness of CHCF in heterogeneous scenarios.
Submitted 25 July, 2023; v1 submitted 25 May, 2021;
originally announced May 2021.
-
Deep Unsupervised Hashing by Distilled Smooth Guidance
Authors:
Xiao Luo,
Zeyu Ma,
Daqing Wu,
Huasong Zhong,
Chong Chen,
**wen Ma,
Minghua Deng
Abstract:
Hashing has been widely used in approximate nearest neighbor search for its storage and computational efficiency. Deep supervised hashing methods are not widely used because of the lack of labeled data, especially when the domain is transferred. Meanwhile, unsupervised deep hashing models can hardly achieve satisfactory performance due to the lack of reliable similarity signals. To tackle this problem, we propose a novel deep unsupervised hashing method, namely Distilled Smooth Guidance (DSG), which can learn a distilled dataset consisting of similarity signals as well as smooth confidence signals. To be specific, we obtain the similarity confidence weights based on the initial noisy similarity signals learned from local structures and construct a priority loss function for smooth similarity-preserving learning. Besides, global information based on clustering is utilized to distill the image pairs by removing contradictory similarity signals. Extensive experiments on three widely used benchmark datasets show that the proposed DSG consistently outperforms the state-of-the-art search methods.
Submitted 13 May, 2021;
originally announced May 2021.
-
Graph Contrastive Clustering
Authors:
Huasong Zhong,
Jianlong Wu,
Chong Chen,
Jianqiang Huang,
Minghua Deng,
Liqiang Nie,
Zhouchen Lin,
Xian-Sheng Hua
Abstract:
Recently, some contrastive learning methods have been proposed to simultaneously learn representations and clustering assignments, achieving significant improvements. However, these methods do not take the category information and clustering objective into consideration, thus the learned representations are not optimal for clustering and the performance might be limited. To address this issue, we first propose a novel graph contrastive learning framework, which is then applied to the clustering task, yielding the Graph Contrastive Clustering (GCC) method. Different from basic contrastive clustering that only assumes an image and its augmentation should share similar representation and clustering assignments, we lift the instance-level consistency to the cluster-level consistency with the assumption that samples in one cluster and their augmentations should all be similar. Specifically, on the one hand, the graph Laplacian based contrastive loss is proposed to learn more discriminative and clustering-friendly features. On the other hand, a novel graph-based contrastive learning strategy is proposed to learn more compact clustering assignments. Both of them incorporate the latent category information to reduce the intra-cluster variance while increasing the inter-cluster variance. Experiments on six commonly used datasets demonstrate the superiority of our proposed approach over the state-of-the-art methods.
Submitted 3 April, 2021;
originally announced April 2021.
-
CIMON: Towards High-quality Hash Codes
Authors:
Xiao Luo,
Daqing Wu,
Zeyu Ma,
Chong Chen,
Minghua Deng,
**wen Ma,
Zhongming **,
Jianqiang Huang,
Xian-Sheng Hua
Abstract:
Hashing has recently been widely used in approximate nearest neighbor search for its storage and computational efficiency. Most of the unsupervised hashing methods learn to map images into semantic similarity-preserving hash codes by constructing local semantic similarity structure from the pre-trained model as the guiding information, i.e., treating each point pair as similar if their distance is small in feature space. However, due to the inefficient representation ability of the pre-trained model, many false positives and negatives in local semantic similarity will be introduced and lead to error propagation during the hash code learning. Moreover, few of the methods consider the robustness of models, which will cause instability of hash codes under disturbance. In this paper, we propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON). First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive both disturb-invariant and discriminative hash codes. Extensive experiments on several benchmark datasets show that the proposed method outperforms a wide range of state-of-the-art methods in both retrieval performance and robustness.
Submitted 21 August, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Bottom-up mechanism and improved contract net protocol for the dynamic task planning of heterogeneous Earth observation resources
Authors:
Baoju Liu,
Min Deng,
Guohua Wu,
Xinyu Pei,
Haifeng Li,
Witold Pedrycz
Abstract:
Earth observation resources are becoming increasingly indispensable in disaster relief, damage assessment and related domains. Many unpredicted factors, such as changes in observation task requirements, the occurrence of bad weather, and resource failures, may cause the scheduled observation scheme to become infeasible. Therefore, it is crucial to be able to promptly and perhaps frequently develop high-quality replanned observation schemes that minimize the effects on the scheduled tasks. A bottom-up distributed coordinated framework together with an improved contract net is proposed to facilitate the dynamic task replanning for heterogeneous Earth observation resources. This hierarchical framework consists of three levels, namely, neighboring resource coordination, single planning center coordination, and multiple planning center coordination. Observation tasks affected by unpredicted factors are assigned and treated along a bottom-up route from resources to planning centers. This bottom-up distributed coordinated framework transfers part of the computing load to various nodes of the observation systems to allocate tasks more efficiently and robustly. To support the prompt assignment of large-scale tasks to proper Earth observation resources in dynamic environments, we propose a multiround combinatorial allocation (MCA) method. Moreover, a new float interval-based local search algorithm is proposed to obtain a promising planning scheme more quickly. The experiments demonstrate that the MCA method can achieve a better task completion rate for large-scale tasks with satisfactory time efficiency. They also demonstrate that this method can help to efficiently obtain replanning schemes based on the original scheme in dynamic environments.
Submitted 9 June, 2021; v1 submitted 12 July, 2020;
originally announced July 2020.
-
3D logic cells design and results based on Vertical NWFET technology including tied compact model
Authors:
C. Mukherjee,
M. Deng,
F. Marc,
C. Maneux,
A. Poittevin,
I. OConnor,
S. Le Beux,
A. Kumar,
A. Lecestre,
G. Larrieu
Abstract:
Gate-all-around Vertical Nanowire Field Effect Transistors (VNWFET) are emerging devices, which are well suited to pursue scaling beyond lateral scaling limitations around 7nm. This work explores the relative merits and drawbacks of the technology in the context of logic cell design. We describe a junctionless nanowire technology and associated compact model, which accurately describes fabricated device behavior in all regions of operation for transistors based on between 16 and 625 parallel nanowires of diameters between 22 and 50nm. We used this model to simulate the projected performance of inverter logic gates based on passive load, active load and complementary topologies and carry out a performance exploration for the number of nanowires in transistors. In terms of compactness, through a dedicated full 3D layout design, we also demonstrate a 1.4x reduction in lateral dimensions for the complementary structure with respect to 7nm FinFET-based inverters.
Submitted 28 May, 2020;
originally announced May 2020.
-
Clustering by Constructing Hyper-Planes
Authors:
Luhong Diao,
**ying Gao,
Manman Deng
Abstract:
As a basic kind of machine learning method, clustering algorithms group data points into different categories based on their similarity or distribution. We present a clustering algorithm that finds hyper-planes to distinguish the data points. It relies on the marginal space between the points. Then we combine these hyper-planes to determine the centers and number of clusters. Because the algorithm is based on linear structures, it can approximate the distribution of datasets accurately and flexibly. To evaluate its performance, we compared it with some well-known clustering algorithms by carrying out experiments on different kinds of benchmark datasets. It clearly outperforms the other methods.
Submitted 25 April, 2020;
originally announced April 2020.
-
On the interplay between physical and content priors in deep learning for computational imaging
Authors:
Mo Deng,
Shuai Li,
Iksung Kang,
Nicholas X. Fang,
George Barbastathis
Abstract:
Deep learning (DL) has been applied extensively in many computational imaging problems, often leading to superior performance over traditional iterative approaches. However, two important questions remain largely unanswered: first, how well can the trained neural network generalize to objects very different from the ones in training? This is particularly important in practice, since large-scale annotated examples similar to those of interest are often not available during training. Second, has the trained neural network learnt the underlying (inverse) physics model, or has it merely done something trivial, such as memorizing the examples or point-wise pattern matching? This pertains to the interpretability of machine-learning based algorithms. In this work, we use the Phase Extraction Neural Network (PhENN), a deep neural network (DNN) for quantitative phase retrieval in a lensless phase imaging system, as the standard platform and show that the two questions are related and share a common crux: the choice of the training examples. Moreover, we connect the strength of the regularization effect imposed by a training set on the training process with the Shannon entropy of images in the dataset. That is, the higher the entropy of the training images, the weaker the regularization effect that can be imposed. We also discover that a weaker regularization effect leads to better learning of the underlying propagation model, i.e. the weak object transfer function, applicable for weakly scattering objects under the weak object approximation. Finally, simulation and experimental results show that better cross-domain generalization performance can be achieved if the DNN is trained on a higher-entropy database, e.g. ImageNet, than if the same DNN is trained on a lower-entropy database, e.g. MNIST, as the former allows the underlying physics model to be learned better than the latter.
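As a point of reference, the Shannon entropy of an image is commonly estimated from its histogram of pixel values. The sketch below uses that first-order, histogram-based definition as an assumption; the abstract does not specify how the authors compute entropy, and the toy pixel lists are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """First-order Shannon entropy (in bits) of a flat sequence of pixel values.

    Assumption for illustration: entropy is estimated from the empirical
    histogram of intensities, ignoring spatial correlations.
    """
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


# Hypothetical toy "images" given as flattened pixel lists
flat_image = [0, 0, 0, 0]        # a single gray level
binary_image = [0, 255, 0, 255]  # two equally likely levels
varied_image = [0, 85, 170, 255] # four equally likely levels

entropies = [shannon_entropy(img) for img in (flat_image, binary_image, varied_image)]
```

Under this definition a constant image has zero entropy, and the value grows as intensities spread over more levels, which is the sense in which a dataset of natural images would typically rank above a dataset of near-binary digits.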
Submitted 14 April, 2020;
originally announced April 2020.
-
A Survey on Deep Hashing Methods
Authors:
Xiao Luo,
Haixin Wang,
Daqing Wu,
Chong Chen,
Minghua Deng,
Jianqiang Huang,
Xian-Sheng Hua
Abstract:
Nearest neighbor search aims to obtain the samples in the database with the smallest distances to the queries, which is a basic task in a range of fields, including computer vision and data mining. Hashing is one of the most widely used methods for its computational and storage efficiency. With the development of deep learning, deep hashing methods show more advantages than traditional methods. In this survey, we investigate in detail current deep hashing algorithms, including deep supervised hashing and deep unsupervised hashing. Specifically, we categorize deep supervised hashing methods into pairwise methods, ranking-based methods, pointwise methods as well as quantization according to how they measure the similarities of the learned hash codes. Moreover, deep unsupervised hashing is categorized into similarity reconstruction-based methods, pseudo-label-based methods and prediction-free self-supervised learning-based methods based on their semantic learning manners. We also introduce three related important topics including semi-supervised deep hashing, domain adaptation deep hashing and multi-modal deep hashing. Meanwhile, we present some commonly used public datasets and the scheme to measure the performance of deep hashing algorithms. Finally, we discuss some potential research directions in the conclusion.
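The efficiency argument for hashing rests on comparing short binary codes with the Hamming distance, which reduces to an XOR and a bit count. The sketch below illustrates only that retrieval step; how the codes are learned is the subject of the surveyed methods and is not shown. The 8-bit codes and function names are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as integers."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: list) -> list:
    """Return database indices sorted by Hamming distance to the query.

    sorted() is stable, so ties keep their original database order.
    """
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


# Hypothetical 8-bit hash codes for four database items and one query
database = [0b10110010, 0b10110011, 0b01001101, 0b10100010]
query = 0b10110110
ranking = rank_by_hamming(query, database)
distances = [hamming_distance(query, code) for code in database]
```

Because each comparison touches only a few machine words, a linear scan over millions of codes remains cheap in both memory and time compared with distances between real-valued feature vectors.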
Submitted 23 April, 2022; v1 submitted 4 March, 2020;
originally announced March 2020.
-
Variational Quantum Circuits for Quantum State Tomography
Authors:
Yong Liu,
Dongyang Wang,
Shichuan Xue,
Anqi Huang,
Xiang Fu,
Xiaogang Qiang,
** Xu,
He-Liang Huang,
Mingtang Deng,
Chu Guo,
Xuejun Yang,
Junjie Wu
Abstract:
Quantum state tomography is a key process in most quantum experiments. In this work, we employ quantum machine learning for state tomography. Given an unknown quantum state, it can be learned by maximizing the fidelity between the output of a variational quantum circuit and this state. The number of parameters of the variational quantum circuit grows linearly with the number of qubits and the circuit depth, so that only polynomial measurements are required, even for highly-entangled states. After that, a subsequent classical circuit simulator is used to transform the information of the target quantum state from the variational quantum circuit into a familiar format. We demonstrate our method by performing numerical simulations for the tomography of the ground state of a one-dimensional quantum spin chain, using a variational quantum circuit simulator. Our method is suitable for near-term quantum computing platforms, and could be used for relatively large-scale quantum state tomography for experimentally relevant quantum states.
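The quantity being maximized can be made concrete for pure states, where the fidelity between two normalized state vectors is the squared magnitude of their inner product. The sketch below computes only that quantity, assuming pure, normalized states given as amplitude lists; the variational circuit, its parameterization, and the optimizer are not described in the abstract and are not modeled here.

```python
import math


def fidelity(psi, phi):
    """Fidelity |<psi|phi>|^2 between two pure states.

    Assumption for illustration: both states are normalized and given
    as lists of amplitudes in the same basis.
    """
    overlap = sum(complex(a).conjugate() * complex(b) for a, b in zip(psi, phi))
    return abs(overlap) ** 2


# Single-qubit examples: |0>, |+> and |->
zero = [1.0, 0.0]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]
minus = [1 / math.sqrt(2), -1 / math.sqrt(2)]

same = fidelity(plus, plus)          # identical states
orthogonal = fidelity(plus, minus)   # orthogonal states
partial = fidelity(zero, plus)       # partially overlapping states
```

A tomography procedure of the kind described would adjust circuit parameters until this value, estimated from measurements rather than computed from known amplitudes, approaches 1.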
Submitted 11 April, 2020; v1 submitted 16 December, 2019;
originally announced December 2019.
-
Learning to Synthesize: Robust Phase Retrieval at Low Photon counts
Authors:
Mo Deng,
Shuai Li,
Alexandre Goy,
Iksung Kang,
George Barbastathis
Abstract:
The quality of inverse problem solutions obtained through deep learning [Barbastathis et al, 2019] is limited by the nature of the priors learned from examples presented during the training phase. In the case of quantitative phase retrieval [Sinha et al, 2017, Goy et al, 2019], in particular, spatial frequencies that are underrepresented in the training database, most often at the high band, tend to be suppressed in the reconstruction. Ad hoc solutions have been proposed, such as pre-amplifying the high spatial frequencies in the examples [Li et al, 2018]; however, while that strategy improves resolution, it also leads to high-frequency artifacts as well as low-frequency distortions in the reconstructions. Here, we present a new approach that learns separately how to handle the two frequency bands, low and high; and also learns how to synthesize these two bands into the full-band reconstructions. We show that this "learning to synthesize" (LS) method yields artifact-free phase reconstructions of high spatial resolution; and it is also resilient to high-noise conditions, e.g. in the case of very low photon flux. In addition to the problem of quantitative phase retrieval, the LS method is applicable, in principle, to any inverse problem where the forward operator treats different frequency bands unevenly, i.e. is ill-posed.
Submitted 26 July, 2019;
originally announced July 2019.
-
Learning to synthesize: splitting and recombining low and high spatial frequencies for image recovery
Authors:
Mo Deng,
Shuai Li,
George Barbastathis
Abstract:
Deep Neural Network (DNN)-based image reconstruction, despite many successes, often exhibits uneven fidelity between high and low spatial frequency bands. In this paper we propose the Learning Synthesis by DNN (LS-DNN) approach where two DNNs process the low and high spatial frequencies, respectively, and, improving over [30], the two DNNs are trained separately and a third DNN combines them into an image with high fidelity at all bands. We demonstrate LS-DNN in two canonical inverse problems: super-resolution (SR) in diffraction-limited imaging (DLI), and quantitative phase retrieval (QPR). Our results also show comparable or improved performance over perceptual-loss based SR [21], and can be generalized to a wider range of image recovery problems.
Submitted 19 November, 2018;
originally announced November 2018.
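The band splitting described in the abstract above can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: the moving-average low-pass filter, the function names `low_pass`, `split_bands`, and `recombine`, and the `radius` parameter are all assumptions made for illustration, and the recombination here is plain addition, whereas the abstract describes a third, trained DNN as the synthesizer.

```python
def low_pass(signal, radius=1):
    """Moving-average low-pass filter (hypothetical stand-in for a band filter)."""
    n = len(signal)
    out = []
    for i in range(n):
        # Average over a window clipped at the signal boundaries.
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def split_bands(signal, radius=1):
    """Split a signal into a low band and the residual high band."""
    low = low_pass(signal, radius)
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def recombine(low, high):
    """Naive synthesis by addition; the paper learns this step with a DNN instead."""
    return [l + h for l, h in zip(low, high)]


signal = [0.0, 4.0, 0.0, 4.0]
low, high = split_bands(signal)
restored = recombine(low, high)
```

Because the high band is defined as the residual, adding the two bands back recovers the input up to floating-point rounding. The point of a learned synthesizer is that each band can first be reconstructed by its own network before the two are merged.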
-
Measuring Road Network Topology Vulnerability by Ricci Curvature
Authors:
Lei Gao,
Xingquan Liu,
Yu Liu,
Pu Wang,
Min Deng,
Qing Zhu,
Haifeng Li
Abstract:
Describing the basic properties of road network systems, such as their robustness, vulnerability, and reliability, has been a very important research topic in the field of urban transportation. Current research mainly uses several statistical indicators of complex networks to analyze road network systems. However, these methods are essentially node-based. They are more concerned with the number of connections between nodes and lack consideration of interactions. This leads to the well-known node paradox problem, and their ability to characterize the local and intrinsic properties of a network is weak. From the perspective of network intrinsic geometry, this paper proposes a method for measuring road network vulnerability using a discrete Ricci curvature, which can identify the key sections of a road network and indicate its fragile elements. The results show that our method performs better than complex network statistics in measuring the vulnerability of a road network. Additionally, our method can characterize the evolution of road network vulnerability across different periods of time in the same city. Finally, we compare our method with the previous method of centrality and show the differences between them. This article provides a new geometric perspective for analyzing the vulnerability of a road network and describes the inherent nature of the vulnerability of a road system. It also contributes to enriching the analytical methods of complex road networks.
Submitted 10 April, 2019; v1 submitted 14 November, 2018;
originally announced November 2018.
-
T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction
Authors:
Ling Zhao,
Yujiao Song,
Chao Zhang,
Yu Liu,
Pu Wang,
Tao Lin,
Min Deng,
Haifeng Li
Abstract:
Accurate and real-time traffic forecasting plays an important role in the Intelligent Traffic System and is of great significance for urban traffic planning, traffic management, and traffic control. However, traffic forecasting has always been considered an open scientific issue, owing to the constraints of urban road network topological structure and the law of dynamic change with time, namely, spatial dependence and temporal dependence. To capture the spatial and temporal dependence simultaneously, we propose a novel neural network-based traffic forecasting method, the temporal graph convolutional network (T-GCN) model, which combines the graph convolutional network (GCN) and the gated recurrent unit (GRU). Specifically, the GCN is used to learn complex topological structures to capture spatial dependence and the gated recurrent unit is used to learn dynamic changes of traffic data to capture temporal dependence. Then, the T-GCN model is applied to traffic forecasting based on the urban road network. Experiments demonstrate that our T-GCN model can obtain the spatio-temporal correlation from traffic data and that its predictions outperform state-of-the-art baselines on real-world traffic datasets. Our TensorFlow implementation of the T-GCN is available at https://github.com/lehaifeng/T-GCN.
Submitted 31 December, 2018; v1 submitted 11 November, 2018;
originally announced November 2018.
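The spatial half of the model described above, a graph convolution over the road network, can be sketched as a single propagation step. This is a minimal sketch under stated assumptions rather than the released T-GCN code: it assumes the commonly used symmetric normalization of the adjacency matrix with self-loops, and it omits the learned weights, the nonlinearity, and the GRU that would process the resulting features over time. The function names `normalized_adjacency` and `propagate` are hypothetical.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix (assumed normalization)."""
    n = len(adj)
    # Add self-loops so each road segment keeps its own signal.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def propagate(adj, features):
    """One graph-convolution propagation step without weights or activation."""
    a_hat = normalized_adjacency(adj)
    n, width = len(features), len(features[0])
    return [[sum(a_hat[i][k] * features[k][c] for k in range(n))
             for c in range(width)]
            for i in range(n)]


# Two connected road segments with speeds 1.0 and 3.0 (made-up values).
adj = [[0.0, 1.0], [1.0, 0.0]]
speeds = [[1.0], [3.0]]
smoothed = propagate(adj, speeds)
```

Each node's output is a degree-weighted average over itself and its neighbours, which is how the spatial dependence enters. In the architecture the abstract describes, a sequence of such outputs would then be passed to a GRU to capture the temporal dependence.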
-
Learning to Measure Change: Fully Convolutional Siamese Metric Networks for Scene Change Detection
Authors:
Enqiang Guo,
Xinsha Fu,
Jiawei Zhu,
Min Deng,
Yu Liu,
Qing Zhu,
Haifeng Li
Abstract:
A critical challenge in scene change detection is that noisy changes generated by varying illumination, shadows and camera viewpoint make variances of a scene difficult to define and measure, since the noisy changes and semantic ones are entangled. Following the intuitive idea of detecting changes by directly comparing dissimilarities between a pair of features, we propose a novel fully Convolutional siamese metric Network (CosimNet) to measure changes by customizing implicit metrics. To learn more discriminative metrics, we utilize contrastive loss to reduce the distance between the unchanged feature pairs and to enlarge the distance between the changed feature pairs. Specifically, to address the issue of large viewpoint differences, we propose Thresholded Contrastive Loss (TCL) with a more tolerant strategy to punish noisy changes. We demonstrate the effectiveness of the proposed approach with experiments on three challenging datasets: CDnet, PCD2015, and VL-CMU-CD. Our approach is robust to many challenging conditions, such as illumination changes and large viewpoint differences caused by camera motion and zooming. In addition, we incorporate the distance metric into the segmentation framework and validate its effectiveness through visualization of change maps and feature distributions. The source code is available at https://github.com/gmayday1997/ChangeDet.
Submitted 11 November, 2018; v1 submitted 22 October, 2018;
originally announced October 2018.
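The contrastive objective described above, pulling unchanged feature pairs together and pushing changed pairs apart, can be sketched for a single pair as follows. This is an illustrative sketch, not the CosimNet code: the Euclidean distance, the `margin` and `tau` parameters, and the exact form of the thresholded term (unchanged pairs are penalized only once their distance exceeds `tau`) are assumptions about how a "more tolerant" loss might look.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(f1, f2, changed, margin=2.0, tau=0.0):
    """Contrastive loss for one feature pair.

    changed=True : penalize pairs closer than `margin` (push apart).
    changed=False: penalize distance beyond the tolerance `tau` (pull together).
    With tau=0 this reduces to the standard contrastive loss; tau > 0 is the
    assumed thresholded variant that tolerates small, noisy differences.
    """
    d = euclidean(f1, f2)
    if changed:
        return max(0.0, margin - d) ** 2
    return max(0.0, d - tau) ** 2


# Unchanged pair at distance 5: fully penalized without a threshold,
# less so once a tolerance of 1.0 is allowed.
plain = contrastive_loss([0.0, 0.0], [3.0, 4.0], changed=False)
tolerant = contrastive_loss([0.0, 0.0], [3.0, 4.0], changed=False, tau=1.0)
```

The threshold leaves a dead zone around zero distance, so small feature differences caused by viewpoint or illumination shifts contribute no gradient for unchanged pairs.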
-
Predicting protein inter-residue contacts using composite likelihood maximization and deep learning
Authors:
Haicang Zhang,
Qi Zhang,
Fusong Ju,
Jianwei Zhu,
Shiwei Sun,
Yujuan Gao,
Ziwei Xie,
Minghua Deng,
Shiwei Sun,
Wei-Mou Zheng,
Dongbo Bu
Abstract:
Accurate prediction of inter-residue contacts of a protein is important for calculating its tertiary structure. Analysis of co-evolutionary events among residues has proved effective for inferring inter-residue contacts. The Markov random field (MRF) technique, although widely used for contact prediction, suffers from the following dilemma: the actual likelihood function of MRF is accurate but time-consuming to calculate; in contrast, approximations to the actual likelihood, say pseudo-likelihood, are efficient to calculate but inaccurate. Thus, how to achieve both accuracy and efficiency simultaneously remains a challenge. In this study, we present such an approach (called clmDCA) for contact prediction. Unlike plmDCA using pseudo-likelihood, i.e., the product of conditional probability of individual residues, our approach uses composite likelihood, i.e., the product of conditional probability of all residue pairs. Composite likelihood has been theoretically proved to be a better approximation to the actual likelihood function than pseudo-likelihood. Meanwhile, composite likelihood is still efficient to maximize, thus ensuring the efficiency of clmDCA. We present comprehensive experiments on popular benchmark datasets, including the PSICOV dataset and the CASP-11 dataset, to show that: i) clmDCA alone outperforms the existing MRF-based approaches in prediction accuracy. ii) When equipped with a deep learning technique for refinement, the prediction accuracy of clmDCA was further significantly improved, suggesting the suitability of clmDCA for subsequent refinement procedures. We further present a successful application of the predicted contacts to accurately build tertiary structures for proteins in the PSICOV dataset.
Accessibility: The software clmDCA and a server are publicly accessible through http://protein.ict.ac.cn/clmDCA/.
Submitted 31 August, 2018;
originally announced September 2018.
-
On the Selective and Invariant Representation of DCNN for High-Resolution Remote Sensing Image Recognition
Authors:
Jie Chen,
Chao Yuan,
Min Deng,
Chao Tao,
Jian Peng,
Haifeng Li
Abstract:
Human vision possesses strong invariance in image recognition. The cognitive capability of the deep convolutional neural network (DCNN) is close to the human visual level because of hierarchical coding directly from the raw image. Owing to its superiority in feature representation, DCNN has exhibited remarkable performance in scene recognition of high-resolution remote sensing (HRRS) images and classification of hyper-spectral remote sensing images. In-depth investigation is still essential for understanding why DCNN can accurately identify diverse ground objects via its effective feature representation. Thus, we train the deep neural network called AlexNet on our large-scale remote sensing image recognition benchmark. At the neuron level in each convolution layer, we analyze the general properties of DCNN in HRRS image recognition by use of a framework of visual stimulation-characteristic response combined with feature coding-classification decoding. Specifically, we use histogram statistics, representational dissimilarity matrix, and class activation mapping to observe the selective and invariance representations of DCNN in HRRS image recognition. We argue that selective and invariance representations play important roles in remote sensing image tasks, such as classification, detection, and segmentation. Selective and invariance representations are also significant for designing new DCNN-like models for analyzing and understanding remote sensing images.
Submitted 4 August, 2017;
originally announced August 2017.
-
A Probabilistic Embedding Clustering Method for Urban Structure Detection
Authors:
Xin Lin,
Haifeng Li,
Yan Zhang,
Lei Gao,
Ling Zhao,
Min Deng
Abstract:
Urban structure detection is a basic task in urban geography. Clustering is a core technology for detecting the patterns of urban spatial structure, urban functional regions, and so on. In the big data era, diverse urban sensing datasets recording information such as human behaviour and human social activity suffer from the complexity of high dimension and high noise. Unfortunately, the state-of-the-art clustering methods do not handle high dimension and high noise concurrently. In this paper, a probabilistic embedding clustering method is proposed. Firstly, we come up with a Probabilistic Embedding Model (PEM) to find latent features from high-dimensional urban sensing data by learning via a probabilistic model. With latent features, we can capture essential features hidden in high-dimensional data, known as patterns; with the probabilistic model, we can also reduce the uncertainty caused by high noise. Secondly, through tuning the parameters, our model can discover two kinds of urban structure, homophily and structural equivalence, which correspond to communities with intensive interaction or with the same roles in the urban structure. We evaluated the performance of our model on real-world data; experiments with real data from Shanghai (China) showed that our method could discover both kinds of urban structure.
Submitted 12 July, 2017;
originally announced July 2017.
-
RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data
Authors:
Haifeng Li,
Xin Dou,
Chao Tao,
Zhixiang Hou,
Jie Chen,
Jian Peng,
Min Deng,
Ling Zhao
Abstract:
In recent years, the deep convolutional neural network (DCNN) has seen breakthrough progress in natural image recognition because of three factors: the universal approximation ability of DCNN, large-scale databases (such as ImageNet), and supercomputing ability powered by GPU. The remote sensing field still lacks a large-scale benchmark comparable to ImageNet and Place2. In this paper, we propose a remote sensing image classification benchmark (RSI-CB) based on massive, scalable, and diverse crowdsource data. Using crowdsource data, such as Open Street Map (OSM) data, ground objects in remote sensing images can be annotated effectively by points of interest, vector data from OSM, or other crowdsource data. The annotated images can be used in remote sensing image classification tasks. Based on this method, we construct a worldwide large-scale benchmark for remote sensing image classification. This benchmark has two sub-datasets with 256 by 256 and 128 by 128 sizes because different DCNNs require different image sizes. The former contains 6 categories with 35 subclasses of more than 24,000 images. The latter contains 6 categories with 45 subclasses of more than 36,000 images. This classification system of ground objects is defined according to the national standard of land-use classification in China and is inspired by the hierarchy mechanism of ImageNet. Finally, we conduct many experiments to compare RSI-CB with the SAT-4, SAT-6, and UC-Merced datasets on handcrafted features, such as scale-invariant feature transform, color histogram, local binary patterns, and GIST, and classical DCNN models, such as AlexNet, VGGNet, GoogLeNet, and ResNet.
Submitted 10 January, 2020; v1 submitted 29 May, 2017;
originally announced May 2017.
-
What do We Learn by Semantic Scene Understanding for Remote Sensing imagery in CNN framework?
Authors:
Haifeng Li,
Jian Peng,
Chao Tao,
Jie Chen,
Min Deng
Abstract:
Recently, the deep convolutional neural network (DCNN) has achieved increasingly remarkable success and developed rapidly in the field of natural image recognition. Compared with natural images, remote sensing images are larger in scale, and the scenes and objects they represent are more macroscopic. This study inquires whether remote sensing scene and natural scene recognitions differ and raises the following questions: What are the key factors in remote sensing scene recognition? Is the DCNN recognition mechanism centered on object recognition still applicable to the scenarios of remote sensing scene understanding? We performed several experiments to explore the influence of the DCNN structure and the scale of remote sensing scene understanding from the perspective of scene complexity. Our experiments show that understanding a complex scene depends on an in-depth network and multiple-scale perception. Using a visualization method, we qualitatively and quantitatively analyze the recognition mechanism in a complex remote sensing scene and demonstrate the importance of multi-objective joint semantic support.
Submitted 19 May, 2017;
originally announced May 2017.
-
Queue Theory based Response Time Analyses for Geo-Information Processing Chain
Authors:
Jie Chen,
Jian Peng,
Min Deng,
Chao Tao,
Haifeng Li
Abstract:
Concurrent tasks, such as those found in disaster rapid response, are a typical characteristic of remote sensing applications. The existing composition approach to geographical information processing service chains searches for an optimisation solution in what can be deemed a "selfish" way. This leads to conflicts amongst concurrent tasks and decreases the performance of all service chains. In this study, a non-cooperative game-based mathematical model is proposed to analyse the competitive relationships between tasks. A best response function is used to ensure that each task maintains utility optimisation by considering the composition strategies of other tasks and quantifying conflicts between tasks. Based on this, an iterative algorithm that converges to Nash equilibrium is presented, the aim being to provide good convergence and maximise the utilisation of all tasks under concurrent task conditions. Theoretical analyses and experiments showed that the newly proposed method, when compared to existing service composition methods, has better practical utility in all tasks.
Submitted 11 March, 2016;
originally announced March 2016.
-
Urban spatial-temporal activity structures: a New Approach to Inferring the Intra-urban Functional Regions via Social Media Check-In Data
Authors:
Ye Zhi,
Yu Liu,
Shaowen Wang,
Min Deng,
Jing Gao,
Haifeng Li
Abstract:
Most existing literature focuses on the exterior temporal rhythm of human movement to infer the functional regions in a city, but it neglects the underlying interdependence between the functional regions and human activities, which uncovers more detailed characteristics of regions. In this research, we propose a novel model based on the low rank approximation (LRA) to detect the functional regions using data from about 15 million check-in records during a yearlong period in Shanghai, China. We find a series of latent structures, called urban spatial-temporal activity structure (USTAS). While interpreting these structures, a series of outstanding underlying associations between the spatial and temporal activity patterns can be found. Moreover, we can not only reproduce the observed data with a lower-dimensional representation but also simultaneously project both the spatial and temporal activity patterns in the same coordinate system. By utilizing the K-means clustering algorithm, five significant types of clusters, which are directly annotated with a corresponding combination of temporal activities, can be obtained. This provides a clear picture of how the groups of regions are associated with different activities at different times of day. Besides the commercial and transportation dominant areas, we also detect two kinds of residential areas, the developed residential areas and the developing residential areas. We further verify the spatial distribution of these clusters from the viewpoint of urban form analysis. The results show a high consistency with the government planning from the same periods, indicating that our model is applicable for inferring the functional regions via social media check-in data and can benefit a wide range of fields, such as urban planning, public services, and location-based recommender systems.
Submitted 20 January, 2015; v1 submitted 23 December, 2014;
originally announced December 2014.