-
Free Performance Gain from Mixing Multiple Partially Labeled Samples in Multi-label Image Classification
Authors:
Chak Fong Chong,
Jielong Guo,
Xu Yang,
Wei Ke,
Yapeng Wang
Abstract:
Multi-label image classification datasets are often partially labeled where many labels are missing, posing a significant challenge to training accurate deep classifiers. However, the powerful Mixup sample-mixing data augmentation cannot be well utilized to address this challenge, as it cannot perform linear interpolation on the unknown labels to construct augmented samples. In this paper, we propose LogicMix, a Mixup variant designed for such partially labeled datasets. LogicMix mixes the sample labels by logical OR so that the unknown labels can be correctly mixed by utilizing OR's logical equivalences, including the domination and identity laws. Unlike Mixup, which mixes exactly two samples, LogicMix can mix multiple ($\geq2$) partially labeled samples, constructing visually more confused augmented samples to regularize training. LogicMix is more general and effective than other compared Mixup variants in the experiments on various partially labeled dataset scenarios. Moreover, it is plug-and-play and only requires minimal computation, hence it can be easily inserted into existing frameworks to collaborate with other methods to improve model performance with a negligible impact on training time, as demonstrated through extensive experiments. In particular, through the collaboration of LogicMix, RandAugment, Curriculum Labeling, and Category-wise Fine-Tuning, we attain state-of-the-art performance on MS-COCO, VG-200, and Pascal VOC 2007 benchmarking datasets. The remarkable generality, effectiveness, collaboration, and simplicity suggest that LogicMix promises to be a popular and vital data augmentation method.
Submitted 24 May, 2024;
originally announced May 2024.
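The OR-based label mixing described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the encoding (1 = positive, 0 = negative, -1 = unknown), the function names, and the pixel-wise averaging of images are all assumptions, since the abstract does not specify them.

```python
import numpy as np

UNKNOWN = -1  # assumed encoding for a missing label (1 = positive, 0 = negative)


def logic_or_labels(labels):
    """OR-mix k partial label vectors, given as an array of shape (k, num_classes)."""
    labels = np.asarray(labels)
    # Domination law: x OR 1 = 1, so one known positive decides the class.
    any_positive = (labels == 1).any(axis=0)
    # Identity law: x OR 0 = x, so the result is 0 only if every input is 0;
    # an unknown label OR-ed with negatives stays unknown.
    all_negative = (labels == 0).all(axis=0)
    mixed = np.full(labels.shape[1], UNKNOWN)
    mixed[all_negative] = 0
    mixed[any_positive] = 1
    return mixed


def mix_images(images):
    """Assumption: mix images by pixel-wise averaging (not stated in the abstract)."""
    return np.mean(np.asarray(images, dtype=np.float32), axis=0)


# Three hypothetical partially labeled samples over four classes.
mixed = logic_or_labels([
    [1, 0, UNKNOWN, UNKNOWN],
    [UNKNOWN, 0, 0, UNKNOWN],
    [0, 0, UNKNOWN, 1],
])
```

Here the third class remains unknown because no sample marks it positive and not every sample marks it negative, which is the case that linear interpolation cannot handle.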
-
Typing Requirement Model as Coroutines
Authors:
Qiqi Gu,
Wei Ke
Abstract:
Model-Driven Engineering (MDE) is a technique that aims to boost productivity in software development and ensure the safety of critical systems. Central to MDE is the refinement of high-level requirement models into executable code. Given that requirement models form the foundation of the entire development process, ensuring their correctness is crucial. RM2PT is a widely used MDE platform that employs the REModel language for requirement modeling. REModel contains contract sections and other sections including a UML sequence diagram. This paper contributes a coroutine-based type system that represents pre- and post-conditions in the contract sections of a requirement model as the receiving and yielding parts of coroutines, respectively. The type system is capable of composing coroutine types, so that users can view functions as a whole system and check their collective behavior. By doing so, our type system ensures that the contracts defined in it are executed as outlined in the accompanying sequence diagram. We assessed our approach using four case studies provided by RM2PT, validating the accuracy of the models.
Submitted 16 May, 2024;
originally announced May 2024.
-
Towards Continual Knowledge Graph Embedding via Incremental Distillation
Authors:
Jiajun Liu,
Wenjun Ke,
Peng Wang,
Ziyu Shang,
**hua Gao,
Guozheng Li,
Ke Ji,
Yanhe Liu
Abstract:
Traditional knowledge graph embedding (KGE) methods typically require preserving the entire knowledge graph (KG) with significant training costs when new knowledge emerges. To address this issue, the continual knowledge graph embedding (CKGE) task has been proposed to train the KGE model by learning emerging knowledge efficiently while simultaneously preserving old knowledge well. However, the explicit graph structure in KGs, which is critical for the above goal, has been largely ignored by existing CKGE methods. On the one hand, existing methods usually learn new triples in a random order, destroying the inner structure of new KGs. On the other hand, old triples are preserved with equal priority, failing to alleviate catastrophic forgetting effectively. In this paper, we propose a competitive method for CKGE based on incremental distillation (IncDE), which makes full use of the explicit graph structure in KGs. First, to optimize the learning order, we introduce a hierarchical strategy, ranking new triples for layer-by-layer learning. By employing the inter- and intra-hierarchical orders together, new triples are grouped into layers based on the graph structure features. Second, to preserve the old knowledge effectively, we devise a novel incremental distillation mechanism, which facilitates the seamless transfer of entity representations from the previous layer to the next one, promoting old knowledge preservation. Finally, we adopt a two-stage training paradigm to avoid the over-corruption of old knowledge influenced by under-trained new knowledge. Experimental results demonstrate the superiority of IncDE over state-of-the-art baselines. Notably, the incremental distillation mechanism contributes to improvements of 0.2%-6.5% in the mean reciprocal rank (MRR) score.
Submitted 7 May, 2024;
originally announced May 2024.
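For readers unfamiliar with the MRR metric reported in the abstract above, the following minimal sketch shows the standard definition: the mean over queries of 1/rank of the correct answer. The function name and the sample ranks are hypothetical and are not taken from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Return the mean of 1/rank over all queries; ranks are 1-based positions."""
    ranks = list(ranks)
    if not ranks:
        raise ValueError("at least one rank is required")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks must be 1-based positive integers")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the correct entity for three test triples.
score = mean_reciprocal_rank([1, 2, 4])  # (1 + 1/2 + 1/4) / 3
```

MRR lies in (0, 1], and higher is better, so an improvement of a few percentage points means the correct entities move noticeably closer to the top of the ranked candidate lists.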
-
Recall, Retrieve and Reason: Towards Better In-Context Relation Extraction
Authors:
Guozheng Li,
Peng Wang,
Wenjun Ke,
Yikai Guo,
Ke Ji,
Ziyu Shang,
Jiajun Liu,
Zijie Xu
Abstract:
Relation extraction (RE) aims to identify relations between entities mentioned in texts. Although large language models (LLMs) have demonstrated impressive in-context learning (ICL) abilities in various tasks, they still suffer from poor performance compared to most supervised fine-tuned RE methods. Utilizing ICL for RE with LLMs encounters two challenges: (1) retrieving good demonstrations from training examples, and (2) enabling LLMs to exhibit strong ICL abilities in RE. On the one hand, retrieving good demonstrations is a non-trivial process in RE, which easily results in low relevance regarding entities and relations. On the other hand, ICL with an LLM achieves poor performance in RE when RE is different from language modeling in nature or the LLM is not large enough. In this work, we propose a novel recall-retrieve-reason RE framework that synergizes LLMs with retrieval corpora (training examples) to enable relevant retrieving and reliable in-context reasoning. Specifically, we distill consistent ontological knowledge from training datasets to let LLMs generate relevant entity pairs grounded by retrieval corpora as valid queries. These entity pairs are then used to retrieve relevant training examples from the retrieval corpora as demonstrations for LLMs to conduct better ICL via instruction tuning. Extensive experiments on different LLMs and RE datasets demonstrate that our method generates relevant and valid entity pairs and boosts the ICL abilities of LLMs, achieving competitive or new state-of-the-art performance on sentence-level RE compared to previous supervised fine-tuning methods and ICL-based methods.
Submitted 27 April, 2024;
originally announced April 2024.
-
CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News
Authors:
Mengna Zhu,
Zijie Xu,
Kaisheng Zeng,
Kaiming Xiao,
Mao Wang,
Wenjun Ke,
Hongbin Huang
Abstract:
Extracting structured event knowledge, including event triggers and corresponding arguments, from military texts is fundamental to many applications, such as intelligence analysis and decision assistance. However, event extraction in the military field faces the data scarcity problem, which impedes the research of event extraction models in this domain. To alleviate this problem, we propose CMNEE, a large-scale, document-level open-source Chinese Military News Event Extraction dataset. It contains 17,000 documents and 29,223 events, which are all manually annotated based on a pre-defined schema for the military domain including 8 event types and 11 argument role types. We designed a two-stage, multi-turn annotation strategy to ensure the quality of CMNEE and reproduced several state-of-the-art event extraction models with a systematic evaluation. The experimental results on CMNEE fall clearly short of those on datasets from other domains, which demonstrates that event extraction for the military domain poses unique challenges and requires further research efforts. Our code and data can be obtained from https://github.com/Mzzzhu/CMNEE.
Submitted 18 April, 2024;
originally announced April 2024.
-
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange
Authors:
Yanhao Wu,
Tong Zhang,
Wei Ke,
Congpei Qiu,
Sabine Susstrunk,
Mathieu Salzmann
Abstract:
In the realm of point cloud scene understanding, particularly in indoor scenes, objects are arranged following human habits, resulting in objects of certain semantics being closely positioned and displaying notable inter-object correlations. This can create a tendency for neural networks to exploit these strong dependencies, bypassing the individual object patterns. To address this challenge, we introduce a novel self-supervised learning (SSL) strategy. Our approach leverages both object patterns and contextual cues to produce robust features. It begins with the formulation of an object-exchanging strategy, where pairs of objects with comparable sizes are exchanged across different scenes, effectively disentangling the strong contextual dependencies. Subsequently, we introduce a context-aware feature learning strategy, which encodes object patterns without relying on their specific context by aggregating object features across various scenes. Our extensive experiments demonstrate the superiority of our method over existing SSL techniques, further showing its better robustness to environmental changes. Moreover, we showcase the applicability of our approach by transferring pre-trained models to diverse point cloud datasets.
Submitted 11 April, 2024;
originally announced April 2024.
-
Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation
Authors:
Zicheng Zhang,
Tong Zhang,
Yi Zhu,
Jianzhuang Liu,
Xiaodan Liang,
QiXiang Ye,
Wei Ke
Abstract:
The pre-trained vision-language model, exemplified by CLIP, advances zero-shot semantic segmentation by aligning visual features with class embeddings through a transformer decoder to generate semantic masks. Despite its effectiveness, prevailing methods within this paradigm encounter challenges, including overfitting on seen classes and small fragmentation in masks. To mitigate these issues, we propose a Language-Driven Visual Consensus (LDVC) approach, fostering improved alignment of semantic and visual information. Specifically, we leverage class embeddings as anchors due to their discrete and abstract nature, steering vision features toward class embeddings. Moreover, to circumvent noisy alignments from the vision part due to its redundant nature, we introduce route attention into self-attention for finding visual consensus, thereby enhancing semantic consistency within the same object. Equipped with a vision-language prompting strategy, our approach significantly boosts the generalization capacity of segmentation models for unseen classes. Experimental results underscore the effectiveness of our approach, showcasing mIoU gains of 4.5 on PASCAL VOC 2012 and 3.6 on COCO-Stuff 164k for unseen classes compared with the state-of-the-art methods.
Submitted 13 March, 2024;
originally announced March 2024.
-
Incorporating Improved Sinusoidal Threshold-based Semi-supervised Method and Diffusion Models for Osteoporosis Diagnosis
Authors:
Wenchi Ke
Abstract:
Osteoporosis is a common skeletal disease that seriously affects patients' quality of life. Traditional osteoporosis diagnosis methods are expensive and complex. The semi-supervised model based on a diffusion model and class threshold sinusoidal decay proposed in this paper can automatically diagnose osteoporosis based on a patient's imaging data, which has the advantages of convenience, accuracy, and low cost. Unlike previous semi-supervised models, all the unlabeled data used in this paper are generated by the diffusion model. Compared with real unlabeled data, synthetic data generated by the diffusion model show better performance. In addition, this paper proposes a novel pseudo-label threshold adjustment mechanism, Sinusoidal Threshold Decay, which can make the semi-supervised model converge more quickly and improve its performance. Specifically, the method is tested on a dataset of 749 dental panoramic images, where it achieves leading detection performance with an accuracy of 80.10%.
Submitted 11 March, 2024;
originally announced March 2024.
-
Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition
Authors:
Yutian Liu,
Wenjun Ke,
Jianguo Wei
Abstract:
Handwritten mathematical expression recognition (HMER) is challenging in image-to-text tasks due to the complex layouts of mathematical expressions and suffers from problems including over-parsing and under-parsing. To solve these, previous HMER methods improve the attention mechanism by utilizing historical alignment information. However, this approach has limitations in addressing under-parsing since it cannot correct the erroneous attention on image areas that should be parsed at subsequent decoding steps. This faulty attention causes the attention module to incorporate future context into the current decoding step, thereby confusing the alignment process. To address this issue, we propose an attention guidance mechanism to explicitly suppress attention weights in irrelevant areas and enhance the appropriate ones, thereby inhibiting access to information outside the intended context. Depending on the type of attention guidance, we devise two complementary approaches to refine attention weights: self-guidance that coordinates attention of multiple heads and neighbor-guidance that integrates attention from adjacent time steps. Experiments show that our method outperforms existing state-of-the-art methods, achieving expression recognition rates of 60.75% / 61.81% / 63.30% on the CROHME 2014 / 2016 / 2019 datasets.
Submitted 5 March, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Unlocking Instructive In-Context Learning with Tabular Prompting for Relational Triple Extraction
Authors:
Guozheng Li,
Wenjun Ke,
Peng Wang,
Zijie Xu,
Ke Ji,
Jiajun Liu,
Ziyu Shang,
Qiqing Luo
Abstract:
In-context learning (ICL) for relational triple extraction (RTE) has achieved promising performance, but still encounters two key challenges: (1) how to design effective prompts and (2) how to select proper demonstrations. Existing methods, however, fail to address these challenges appropriately. On the one hand, they usually recast the RTE task into text-to-text prompting formats, which is unnatural and results in a mismatch between the output format at pre-training time and at inference time for large language models (LLMs). On the other hand, they only utilize surface natural language features and lack consideration of triple semantics in sample selection. These issues block improved performance in ICL for RTE, so we aim to tackle the prompt design and sample selection challenges simultaneously. To this end, we devise a tabular prompting for RTE (TableIE), which frames the RTE task as a table generation task to incorporate explicit structured information into ICL, facilitating conversion of outputs to RTE structures. Then we propose instructive in-context learning (I$^2$CL), which selects and annotates only a few samples from massive unlabeled samples by considering internal triple semantics.
Submitted 21 February, 2024;
originally announced February 2024.
-
Category-wise Fine-Tuning: Resisting Incorrect Pseudo-Labels in Multi-Label Image Classification with Partial Labels
Authors:
Chak Fong Chong,
Xinyi Fang,
Jielong Guo,
Yapeng Wang,
Wei Ke,
Chan-Tong Lam,
Sio-Kei Im
Abstract:
Large-scale image datasets are often partially labeled, where only a few categories' labels are known for each image. Assigning pseudo-labels to unknown labels to gain additional training signals has become prevalent for training deep classification models. However, some pseudo-labels are inevitably incorrect, leading to a notable decline in the model classification performance. In this paper, we propose a novel method called Category-wise Fine-Tuning (CFT), aiming to reduce model inaccuracies caused by the wrong pseudo-labels. In particular, CFT employs known labels without pseudo-labels to fine-tune the logistic regressions of trained models individually to calibrate each category's model predictions. Genetic Algorithm, seldom used for training deep models, is also utilized in CFT to maximize the classification performance directly. CFT is applied to well-trained models, unlike most existing methods that train models from scratch. Hence, CFT is general and compatible with models trained with different methods and schemes, as demonstrated through extensive experiments. CFT requires only a few seconds for each category for calibration with consumer-grade GPUs. We achieve state-of-the-art results on three benchmarking datasets, including the CheXpert chest X-ray competition dataset (ensemble mAUC 93.33%, single model 91.82%), partially labeled MS-COCO (average mAP 83.69%), and Open Image V3 (mAP 85.31%), outperforming the previous bests by 0.28%, 2.21%, 2.50%, and 0.91%, respectively. The single model on CheXpert has been officially evaluated by the competition server, endorsing the correctness of the result. The outstanding results and generalizability indicate that CFT could be substantial and prevalent for classification model development. Code is available at: https://github.com/maxium0526/category-wise-fine-tuning.
Submitted 30 January, 2024;
originally announced January 2024.
-
Distance Guided Generative Adversarial Network for Explainable Binary Classifications
Authors:
Xiangyu Xiong,
Yue Sun,
Xiaohong Liu,
Wei Ke,
Chan-Tong Lam,
Jiangang Chen,
Mingfeng Jiang,
Mingwei Wang,
Hui Xie,
Tong Tong,
Qinquan Gao,
Hao Chen,
Tao Tan
Abstract:
Despite the potential benefits of data augmentation for mitigating data insufficiency, traditional augmentation methods primarily rely on prior intra-domain knowledge. On the other hand, advanced generative adversarial networks (GANs) generate inter-domain samples with limited variety. These previous methods make limited contributions to describing the decision boundaries for binary classification. In this paper, we propose a distance guided GAN (DisGAN) which controls the variation degrees of generated samples in the hyperplane space. Specifically, we instantiate the idea of DisGAN by combining two variants. The first is vertical distance GAN (VerDisGAN), where the inter-domain generation is conditioned on the vertical distances. The second is horizontal distance GAN (HorDisGAN), where the intra-domain generation is conditioned on the horizontal distances. Furthermore, VerDisGAN can produce the class-specific regions by mapping the source images to the hyperplane. Experimental results show that DisGAN consistently outperforms the GAN-based augmentation methods with explainable binary classification. The proposed method can be applied to different classification architectures and has the potential to extend to multi-class classification.
Submitted 29 December, 2023;
originally announced December 2023.
-
65 GOPS/neuron Photonic Tensor Core with Thin-film Lithium Niobate Photonics
Authors:
Zhong** Lin,
Bhavin J. Shastri,
Shangxuan Yu,
**gxiang Song,
Yuntao Zhu,
Arman Safarnejadian,
Wangning Cai,
Yanmei Lin,
Wei Ke,
Mustafa Hammood,
Tianye Wang,
Mengyue Xu,
Zibo Zheng,
Mohammed Al-Qadasi,
Omid Esmaeeli,
Mohamed Rahim,
Grzegorz Pakulski,
Jens Schmid,
Pedro Barrios,
Weihong Jiang,
Hugh Morison,
Matthew Mitchell,
Xiaogang Qiang,
Xun Guan,
Nicolas A. F. Jaeger
, et al. (6 additional authors not shown)
Abstract:
Photonics offers a transformative approach to artificial intelligence (AI) and neuromorphic computing by providing low latency, high bandwidth, and energy-efficient computations. Here, we introduce a photonic tensor core processor enabled by time-multiplexed inputs and charge-integrated outputs. This fully integrated processor, comprising only two thin-film lithium niobate (TFLN) modulators, a III-V laser, and a charge-integration photoreceiver, can implement an entire layer of a neural network. It can execute 65 billion operations per second (GOPS) per neuron, including simultaneous weight updates, a hitherto unachieved speed. Our processor stands out from conventional photonic processors, which have static weights set during training, as it supports fast "hardware-in-the-loop" training, and can dynamically adjust the inputs (fan-in) and outputs (fan-out) within a layer, thereby enhancing its versatility. Our processor can perform large-scale dot-product operations with vector dimensions up to 131,072. Furthermore, it successfully classifies (supervised learning) and clusters (unsupervised learning) 112×112-pixel images after "hardware-in-the-loop" training. To handle "hardware-in-the-loop" training for clustering AI tasks, we provide a solution for multiplications involving two negative numbers based on our processor.
Submitted 30 November, 2023; v1 submitted 28 November, 2023;
originally announced November 2023.
-
A Parameterized Generative Adversarial Network Using Cyclic Projection for Explainable Medical Image Classification
Authors:
Xiangyu Xiong,
Yue Sun,
Xiaohong Liu,
Chan-Tong Lam,
Tong Tong,
Hao Chen,
Qinquan Gao,
Wei Ke,
Tao Tan
Abstract:
Although current data augmentation methods successfully alleviate data insufficiency, conventional augmentation is primarily intra-domain, while advanced generative adversarial networks (GANs) generate images that remain uncertain, particularly on small-scale datasets. In this paper, we propose a parameterized GAN (ParaGAN) that effectively controls the changes of synthetic samples among domains and highlights the attention regions for downstream classification. Specifically, ParaGAN incorporates projection distance parameters in cyclic projection and projects the source images to the decision boundary to obtain the class-difference maps. Our experiments show that ParaGAN can consistently outperform the existing augmentation methods with explainable classification on two small-scale medical datasets.
Submitted 14 December, 2023; v1 submitted 24 November, 2023;
originally announced November 2023.
-
Revisiting Large Language Models as Zero-shot Relation Extractors
Authors:
Guozheng Li,
Peng Wang,
Wenjun Ke
Abstract:
Relation extraction (RE) consistently involves a certain degree of labeled or unlabeled data, even under the zero-shot setting. Recent studies have shown that large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt, which provides the possibility of extracting relations from text without any data and parameter tuning. This work focuses on exploring LLMs, such as ChatGPT, as zero-shot relation extractors. On the one hand, we analyze the drawbacks of existing RE prompts and attempt to incorporate recent prompt techniques such as chain-of-thought (CoT) to improve zero-shot RE. We propose the summarize-and-ask (SumAsk) prompting, a simple prompt recursively using LLMs to transform RE inputs to the effective question answering (QA) format. On the other hand, we conduct comprehensive experiments on various benchmarks and settings to investigate the capabilities of LLMs on zero-shot RE. Specifically, we have the following findings: (i) SumAsk consistently and significantly improves LLM performance on different model sizes, benchmarks and settings; (ii) zero-shot prompting with ChatGPT achieves competitive or superior results compared with zero-shot and fully supervised methods; (iii) LLMs deliver promising performance in extracting overlapping relations; (iv) the performance varies greatly across different relations. Unlike small language models, LLMs are effective in handling the challenging none-of-the-above (NoTA) relation.
Submitted 24 November, 2023; v1 submitted 8 October, 2023;
originally announced October 2023.
-
Typing Composable Coroutines
Authors:
Qiqi Gu,
Wei Ke
Abstract:
The coroutine, a powerful programming construct, is widely used in asynchronous applications to replace thread-based programming or callback hell. Using coroutines makes code more readable and maintainable, owing to their ability to transfer control while keeping the literal scope. However, reasoning about coroutine behavior can be challenging without proper typing. We propose a type notation and calculus for composing asymmetric, first-class, stackless coroutines. Given the types of a list of coroutines, we can compute a composed type matching the collective behavior of the coroutines, so that the input and output can be type-checked by a type system. Our coroutine types can model the data received by or yielded from a coroutine, which can themselves be of coroutine types. On top of our type calculus, we discuss its soundness and evaluation issues, then provide four application scenarios of our coroutine types. Our types can not only be used in modern programming languages, such as Python, but can also model program behaviors in OCaml and even Prolog.
△ Less
Submitted 21 August, 2023;
originally announced August 2023.
-
Image encryption for Offshore wind power based on 2D-LCLM and Zhou Yi Eight Trigrams
Authors:
Lei Kou,
Jinbo Wu,
Fangfang Zhang,
Peng Ji,
Wende Ke,
Junhe Wan,
Hailin Liu,
Yang Li,
Quande Yuan
Abstract:
Offshore wind power is an important part of the new power system. Due to the complex and changing situation at sea, its normal operation and maintenance cannot be done without information such as images; therefore, it is especially important to transmit the correct image in the process of information transmission. In this paper, we propose a new encryption algorithm for offshore wind power based on two-dimensional lagged complex logistic mapping (2D-LCLM) and Zhou Yi Eight Trigrams. Firstly, the initial value of the 2D-LCLM is constructed by SHA-256 to associate the 2D-LCLM with the plaintext. Secondly, a new encryption rule is proposed from the Zhou Yi Eight Trigrams to obfuscate the pixel values and generate the round key. Then, 2D-LCLM is combined with the Zigzag to form an S-box. Finally, the simulation experiment of the algorithm is accomplished. The experimental results demonstrate that the algorithm can resist common attacks and has perfect encryption performance.
Submitted 27 June, 2023; v1 submitted 2 June, 2023;
originally announced June 2023.
-
Community Detection Using Revised Medoid-Shift Based on KNN
Authors:
Jie Hou,
Jiakang Li,
Xiaokang Peng,
Wei Ke,
Yonggang Lu
Abstract:
Community detection has become an important problem with the booming of social networks. The Medoid-Shift algorithm preserves the benefits of Mean-Shift and can be applied to problems based on distance matrix, such as community detection. One drawback of the Medoid-Shift algorithm is that there may be no data points within the neighborhood region defined by a distance parameter. To deal with the community detection problem better, a new algorithm called Revised Medoid-Shift (RMS) is thus proposed in this work. During the process of finding the next medoid, the RMS algorithm is based on a neighborhood defined by KNN, while the original Medoid-Shift is based on a neighborhood defined by a distance parameter. Since the neighborhood defined by KNN is more stable than the one defined by the distance parameter in terms of the number of data points within the neighborhood, the RMS algorithm may converge more smoothly. In the RMS method, each of the data points is shifted towards a medoid within the neighborhood defined by KNN. After the iterative process of shifting, each data point converges into a cluster center, and the data points converging into the same center are grouped into the same cluster. The RMS algorithm is tested on two kinds of datasets including community datasets with known ground truth partition and community datasets without ground truth partition respectively. The experiment results show that the proposed RMS algorithm generally produces better results than Medoid-Shift and some state-of-the-art together with most classic community detection algorithms on different kinds of community detection datasets.
Submitted 19 September, 2023; v1 submitted 19 April, 2023;
originally announced April 2023.
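The shifting step described in the Revised Medoid-Shift abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `knn_medoid_shift`, the `max_iter` guard, and the choice of the medoid as the neighbor with the smallest total distance to the other neighbors are assumptions made for illustration. The only input is a precomputed distance matrix, as the abstract suggests.

```python
import numpy as np

def knn_medoid_shift(dist, k, max_iter=100):
    """Illustrative KNN-based medoid shift (assumed details, not the paper's code).

    dist: (n, n) symmetric distance matrix with a zero diagonal.
    k:    number of nearest neighbours that define each neighbourhood.
    Returns, for every point, the index of the medoid it ends up at.
    """
    n = dist.shape[0]
    # k nearest neighbours of every point; with a zero diagonal each point
    # is its own nearest neighbour, so no neighbourhood is ever empty.
    knn = np.argsort(dist, axis=1, kind="stable")[:, :k]
    nxt = np.empty(n, dtype=int)
    for i in range(n):
        nbrs = knn[i]
        # Total distance from each neighbour to all neighbours of point i.
        cost = dist[np.ix_(nbrs, nbrs)].sum(axis=1)
        nxt[i] = nbrs[np.argmin(cost)]
    # Follow the medoid pointers until nothing changes (or max_iter is hit).
    labels = nxt.copy()
    for _ in range(max_iter):
        shifted = nxt[labels]
        if np.array_equal(shifted, labels):
            break
        labels = shifted
    return labels

# Toy data: two well-separated groups on a line.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
dist = np.abs(points[:, None] - points[None, :])
labels = knn_medoid_shift(dist, k=3)
```

Points that end at the same medoid index form one cluster. Because a KNN neighborhood always contains exactly `k` points, the empty-neighborhood case mentioned in the abstract cannot occur in this sketch.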
-
Crowd Counting with Sparse Annotation
Authors:
Shiwei Zhang,
Zhengzheng Wang,
Qing Liu,
Fei Wang,
Wei Ke,
Tong Zhang
Abstract:
This paper presents a new annotation method called Sparse Annotation (SA) for crowd counting, which reduces human labeling efforts by sparsely labeling individuals in an image. We argue that sparse labeling can reduce the redundancy of full annotation and capture more diverse information from distant individuals that is not fully captured by Partial Annotation methods. Besides, we propose a point-based Progressive Point Matching network (PPM) to better explore the crowd from the whole image with sparse annotation, which includes a Proposal Matching Network (PMN) and a Performance Restoration Network (PRN). The PMN generates pseudo-point samples using a basic point classifier, while the PRN refines the point classifier with the pseudo points to maximize performance. Our experimental results show that PPM outperforms previous semi-supervised crowd counting methods with the same amount of annotation by a large margin and achieves competitive performance with state-of-the-art fully-supervised methods.
Submitted 12 April, 2023;
originally announced April 2023.
-
De-coupling and De-positioning Dense Self-supervised Learning
Authors:
Congpei Qiu,
Tong Zhang,
Wei Ke,
Mathieu Salzmann,
Sabine Süsstrunk
Abstract:
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects. Although the dense features extracted by employing segmentation maps and bounding boxes allow networks to perform SSL for each object, we show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding. We address this by introducing three data augmentation strategies, and leveraging them in (i) a decoupling module that aims to robustify the network to variations in the object's surroundings, and (ii) a de-positioning module that encourages the network to discard positional object information. We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection. Our extensive experiments evidence the better generalization of our method compared to the SOTA dense SSL methods.
Submitted 29 March, 2023;
originally announced March 2023.
-
Spatiotemporal Self-supervised Learning for Point Clouds in the Wild
Authors:
Yanhao Wu,
Tong Zhang,
Wei Ke,
Sabine Süsstrunk,
Mathieu Salzmann
Abstract:
Self-supervised learning (SSL) has the potential to benefit many applications, particularly those where manually annotating data is cumbersome. One such situation is the semantic segmentation of point clouds. In this context, existing methods employ contrastive learning strategies and define positive pairs by performing various augmentation of point clusters in a single frame. As such, these methods do not exploit the temporal nature of LiDAR data. In this paper, we introduce an SSL strategy that leverages positive pairs in both the spatial and temporal domain. To this end, we design (i) a point-to-cluster learning strategy that aggregates spatial information to distinguish objects; and (ii) a cluster-to-cluster learning strategy based on unsupervised object tracking that exploits temporal correspondences. We demonstrate the benefits of our approach via extensive experiments performed by self-supervised training on two large-scale LiDAR datasets and transferring the resulting models to other point cloud segmentation benchmarks. Our results evidence that our method outperforms the state-of-the-art point cloud SSL methods.
Submitted 28 March, 2023;
originally announced March 2023.
-
CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation
Authors:
Zicheng Zhang,
Yi Zhu,
Jianzhuang Liu,
Xiaodan Liang,
Wei Ke
Abstract:
Referring image segmentation aims at localizing all pixels of the visual objects described by a natural language sentence. Previous works learn to straightforwardly align the sentence embedding and pixel-level embedding for highlighting the referred objects, but ignore the semantic consistency of pixels within the same object, leading to incomplete masks and localization errors in predictions. To tackle this problem, we propose CoupAlign, a simple yet effective multi-level visual-semantic alignment method, to couple sentence-mask alignment with word-pixel alignment to enforce object mask constraint for achieving more accurate localization and segmentation. Specifically, the Word-Pixel Alignment (WPA) module performs early fusion of linguistic and pixel-level features in intermediate layers of the vision and language encoders. Based on the word-pixel aligned embedding, a set of mask proposals are generated to hypothesize possible objects. Then in the Sentence-Mask Alignment (SMA) module, the masks are weighted by the sentence embedding to localize the referred object, and finally projected back to aggregate the pixels for the target. To further enhance the learning of the two alignment modules, an auxiliary loss is designed to contrast the foreground and background pixels. By hierarchically aligning pixels and masks with linguistic features, our CoupAlign captures the pixel coherence at both visual and semantic levels, thus generating more accurate predictions. Extensive experiments on popular datasets (e.g., RefCOCO and G-Ref) show that our method achieves consistent improvements over state-of-the-art methods, e.g., about 2% oIoU increase on the validation and testing set of RefCOCO. Especially, CoupAlign has remarkable ability in distinguishing the target from multiple objects of the same class.
Submitted 4 December, 2022;
originally announced December 2022.
-
Review on Monitoring, Operation and Maintenance of Smart Offshore Wind Farms
Authors:
Lei Kou,
Yang Li,
Fangfang Zhang,
Xiaodong Gong,
Yinghong Hu,
Quande Yuan,
Wende Ke
Abstract:
In recent years, with the development of wind energy, the number and scale of wind farms have been developing rapidly. Since offshore wind farms have the advantages of stable wind speed and of being clean, renewable, non-polluting and not occupying cultivated land, they have gradually become a new trend of the wind power industry all over the world. The operation and maintenance mode of offshore wind power is developing in the direction of digitization and intelligence. It is of great significance to carry out research on the monitoring, operation and maintenance of offshore wind farms, which will help reduce the operation and maintenance cost, improve the power generation efficiency, improve the stability of offshore wind farm systems and build smart offshore wind farms. This paper mainly analyzes and summarizes the monitoring, operation and maintenance of offshore wind farms, especially from the following points: monitoring of "offshore wind power engineering & biological & environment", the monitoring of power equipment and the operation & maintenance of smart offshore wind farms. Finally, the future research challenges about monitoring, operation and maintenance of smart offshore wind farms are proposed, and the future research directions in this field are prospected.
Submitted 31 October, 2022;
originally announced November 2022.
-
On Triangular Inequality of the Discounted Least Information Theory of Entropy (DLITE)
Authors:
Kashti S. Umare,
Weimao Ke
Abstract:
The Discounted Least Information Theory of Entropy (DLITE) is a new information measure that quantifies the amount of entropic difference between two probability distributions. It manifests multiple critical properties both as an information-theoretic quantity and as metric distance. In the report, we provide a proof of the triangular inequality of DLITE's cube root ($\sqrt[3]{DL}$), an important property of a metric, along with alternative proofs for two additional properties.
Submitted 14 October, 2022;
originally announced October 2022.
-
Data Encryption based on 7D Complex Chaotic System with Cubic Memristor for Smart Grid
Authors:
Lei Kou,
Zhe Huang,
Cuimei Jiang,
Fangfang Zhang,
Wende Ke,
Junhe Wan,
Hailin Liu,
Hui Li
Abstract:
Information security has an irreplaceable position in the smart grid (SG). In order to avoid malicious attacks and ensure information security, cryptographic techniques are essential. This paper focuses on the encryption techniques to ensure the information security of SG. Firstly, an unusual 7-dimensional complex chaotic system (7D-CCS) combined with the cubic memristor is introduced. In addition, its phase portraits, Lyapunov exponent, 0-1 test, complexity, and bifurcation diagram are investigated. Then, with the proposed 7D-CCS, we design a data encryption algorithm to ensure the encryption security. Finally, the data and monitoring images in SG are encrypted by the designed encryption scheme. Besides, the encryption performance is given in detail. The experimental results show that the proposed encryption scheme has quite good encryption performance. Therefore, it can ensure the information security of SG.
Submitted 2 September, 2022;
originally announced September 2022.
-
HPS-Det: Dynamic Sample Assignment with Hyper-Parameter Search for Object Detection
Authors:
Ji Liu,
Dong Li,
Zekun Li,
Han Liu,
Wenjing Ke,
Lu Tian,
Yi Shan
Abstract:
Sample assignment plays a prominent part in modern object detection approaches. However, most existing methods rely on manual design to assign positive / negative samples, which do not explicitly establish the relationships between sample assignment and object detection performance. In this work, we propose a novel dynamic sample assignment scheme based on hyper-parameter search. We first define the number of positive samples assigned to each ground truth as the hyper-parameters and employ a surrogate optimization algorithm to derive the optimal choices. Then, we design a dynamic sample assignment procedure to dynamically select the optimal number of positives at each training iteration. Experiments demonstrate that the resulting HPS-Det brings improved performance over different object detection baselines. Moreover, we analyze the hyper-parameter reusability when transferring between different datasets and between different backbones for object detection, which exhibits the superiority and versatility of our method.
Submitted 23 July, 2022;
originally announced July 2022.
-
Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification
Authors:
Haowei Zhu,
Wenjing Ke,
Dong Li,
Ji Liu,
Lu Tian,
Yi Shan
Abstract:
Recently, self-attention mechanisms have shown impressive performance in various NLP and CV tasks, which can help capture sequential characteristics and derive global information. In this work, we explore how to extend self-attention modules to better learn subtle feature embeddings for recognizing fine-grained objects, e.g., different bird species or person identities. To this end, we propose a dual cross-attention learning (DCAL) algorithm to coordinate with self-attention learning. First, we propose global-local cross-attention (GLCA) to enhance the interactions between global images and local high-response regions, which can help reinforce the spatial-wise discriminative clues for recognition. Second, we propose pair-wise cross-attention (PWCA) to establish the interactions between image pairs. PWCA can regularize the attention learning of an image by treating another image as distractor and will be removed during inference. We observe that DCAL can reduce misleading attentions and diffuse the attention response to discover more complementary parts for recognition. We conduct extensive evaluations on fine-grained visual categorization and object re-identification. Experiments demonstrate that DCAL performs on par with state-of-the-art methods and consistently improves multiple self-attention baselines, e.g., surpassing DeiT-Tiny and ViT-Base by 2.8% and 2.4% mAP on MSMT17, respectively.
Submitted 4 May, 2022;
originally announced May 2022.
-
Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy
Authors:
Tong Zhang,
Congpei Qiu,
Wei Ke,
Sabine Süsstrunk,
Mathieu Salzmann
Abstract:
Self-supervised learning (SSL) methods aim to learn view-invariant representations by maximizing the similarity between the features extracted from different crops of the same image regardless of cropping size and content. In essence, this strategy ignores the fact that two crops may truly contain different image information, e.g., background and small objects, and thus tends to restrain the diversity of the learned representations. In this work, we address this issue by introducing a new self-supervised learning strategy, LoGo, that explicitly reasons about Local and Global crops. To achieve view invariance, LoGo encourages similarity between global crops from the same image, as well as between a global and a local crop. However, to correctly encode the fact that the content of smaller crops may differ entirely, LoGo promotes two local crops to have dissimilar representations, while being close to global crops. Our LoGo strategy can easily be applied to existing SSL methods. Our extensive experiments on a variety of datasets and using different self-supervised learning frameworks validate its superiority over existing approaches. Noticeably, we achieve better results than supervised models on transfer learning when using only 1/10 of the data.
Submitted 13 April, 2022; v1 submitted 31 March, 2022;
originally announced March 2022.
-
M2MRF: Many-to-Many Reassembly of Features for Tiny Lesion Segmentation in Fundus Images
Authors:
Qing Liu,
Haotian Liu,
Wei Ke,
Yixiong Liang
Abstract:
Feature reassembly is an essential component in modern CNN-based segmentation approaches, which includes feature downsampling and upsampling operators. Existing operators reassemble multiple features from a small predefined region into one for each target location independently. This may result in loss of spatial information, which could vanish activations caused by tiny lesions, particularly when they cluster together. In this paper, we propose a many-to-many reassembly of features (M2MRF). It reassembles features in a dimension-reduced feature space and simultaneously aggregates multiple features inside a large predefined region into multiple target features. In this way, long range spatial dependencies are captured to maintain activations on tiny lesions. Experimental results on two lesion segmentation benchmarks, i.e. DDR and IDRiD, show that (1) our M2MRF outperforms existing feature reassembly operators; (2) equipped with our M2MRF, the HRNetv2 is able to achieve significantly better performance than CNN-based segmentation methods and competitive or even better performance than two recent transformer-based segmentation methods. Our code is made publicly available at https://github.com/CVIU-CSU/M2MRF-Lesion-Segmentation.
Submitted 2 December, 2021; v1 submitted 30 October, 2021;
originally announced November 2021.
-
Discovery-and-Selection: Towards Optimal Multiple Instance Learning for Weakly Supervised Object Detection
Authors:
Shiwei Zhang,
Wei Ke,
Lin Yang
Abstract:
Weakly supervised object detection (WSOD) is a challenging task that requires simultaneously learning object classifiers and estimating object locations under the supervision of image category labels. A major line of WSOD methods is rooted in multiple instance learning, which regards images as bags of instances and selects positive instances from each bag to learn the detector. However, a grand challenge emerges when the detector inclines to converge to discriminative parts of objects rather than the whole objects. In this paper, under the hypothesis that optimal solutions are included in local minima, we propose a discovery-and-selection approach fused with multiple instance learning (DS-MIL), which finds rich local minima and selects the optimal solution from multiple local minima. To implement DS-MIL, an attention module is proposed so that more context information can be captured by feature maps and more valuable proposals can be collected during training. With proposal candidates, a selection module is proposed to select informative instances for the object detector. Experimental results on commonly used benchmarks show that our proposed DS-MIL approach can consistently improve the baselines, reporting state-of-the-art performance.
Submitted 5 May, 2022; v1 submitted 18 October, 2021;
originally announced October 2021.
-
A Multiple Classifier Approach for Concatenate-Designed Neural Networks
Authors:
Ka-Hou Chan,
Sio-Kei Im,
Wei Ke
Abstract:
This article introduces a multiple classifier method to improve the performance of concatenate-designed neural networks, such as ResNet and DenseNet, with the purpose to alleviate the pressure on the final classifier. We give the design of the classifiers, which collects the features produced between the network sets, and present the constituent layers and the activation function for the classifiers, to calculate the classification score of each classifier. We use the L2 normalization method to obtain the classifier score instead of the Softmax normalization. We also determine the conditions that can enhance convergence. As a result, the proposed classifiers are able to improve the accuracy in the experimental cases significantly, and show that the method not only has better performance than the original models, but also produces faster convergence. Moreover, our classifiers are general and can be applied to all classification related concatenate-designed network models.
Submitted 13 January, 2021;
originally announced January 2021.
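The abstract above states that classifier scores are obtained with L2 normalization instead of Softmax normalization. As a rough illustration of the difference between the two, and not the authors' implementation, the sketch below applies both to the same raw outputs. The function names and the `eps` guard are assumptions introduced here.

```python
import numpy as np

def l2_scores(logits, eps=1e-12):
    """Scale raw outputs to unit Euclidean length (assumed form of L2 scoring)."""
    logits = np.asarray(logits, dtype=float)
    norm = np.linalg.norm(logits, axis=-1, keepdims=True)
    # eps avoids division by zero for an all-zero output vector.
    return logits / np.maximum(norm, eps)

def softmax_scores(logits):
    """Standard, numerically stable softmax, shown for comparison."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

raw = np.array([3.0, 4.0])
l2 = l2_scores(raw)        # unit-length vector, keeps the ratio of the raw outputs
sm = softmax_scores(raw)   # probabilities that sum to 1
```

Both normalizations preserve the ranking of the classes; L2 normalization keeps the scores linear in the raw outputs, whereas softmax exponentiates them.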
-
A General Method for Generating Discrete Orthogonal Matrices
Authors:
Ka-Hou Chan,
Wei Ke,
Sio-Kei Im
Abstract:
Discrete orthogonal matrices have several applications in information technology, such as in coding and cryptography. It is often challenging to generate discrete orthogonal matrices. A common approach widely in use is to discretize continuous orthogonal functions that have been discovered. The need of certain continuous functions is restrictive. To simplify the process while improving the efficiency and flexibility, we present a general method for generating orthogonal matrices directly through the construction of certain even and odd polynomials from a set of distinct positive values, bypassing the need of continuous orthogonal functions. We provide a constructive proof by induction that not only asserts the existence of such polynomials, but also tells how to iteratively construct them. Besides the derivation of the method, which is as simple as a few nested loops, we discuss two well-known discrete transforms, the Discrete Cosine Transform and the Discrete Tchebichef Transform. We show how they can be achieved using our method with specific values, and how to embed them into the transform module of video coding. By the same token, we also show some examples of how to generate new orthogonal matrices from arbitrarily chosen values.
Submitted 28 April, 2021; v1 submitted 13 January, 2021;
originally announced January 2021.
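To make the notion of a discrete orthogonal matrix in the abstract above concrete, the sketch below builds the standard orthonormal DCT-II matrix from its textbook cosine formula and checks that its rows are orthonormal. This is the conventional construction by discretizing a continuous function, not the polynomial-based method the paper proposes; the helper names `dct_matrix` and `is_orthogonal` are illustrative.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (textbook cosine formula)."""
    k = np.arange(n)[:, None]   # row index (frequency)
    m = np.arange(n)[None, :]   # column index (sample position)
    c = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    # Row scaling that makes every row unit length.
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c

def is_orthogonal(mat, tol=1e-10):
    """True if mat @ mat.T equals the identity within tol."""
    return np.allclose(mat @ mat.T, np.eye(mat.shape[0]), atol=tol)

C = dct_matrix(8)
ok = is_orthogonal(C)
```

The same `is_orthogonal` check could be applied to any candidate matrix, including ones generated from arbitrarily chosen values as the abstract describes.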
-
Evolution Features and Behavior Characters of Friendship Networks on Campus Life
Authors:
Zongkai Yang,
Zhu Su,
Sannyuya Liu,
Zhi Liu,
Wenxiang Ke,
Liang Zhao
Abstract:
Analyzing and mining students' behaviors and interactions from big data is an essential part of education data mining. Based on the data of campus smart cards, which include not only static demographic information but also dynamic behavioral data from more than 30000 anonymous students, in this paper, the evolution features of friendship and the relations between behavior characters and student interactions are investigated. On the one hand, four different evolving friendship networks are constructed by means of the friend ties proposed in this paper, which are extracted from monthly consumption records. In addition, the features of the giant connected components (GCCs) of friendship networks are analyzed via social network analysis (SNA) and percolation theory. On the other hand, two high-level behavior characters, orderliness and diligence, are adopted to analyze their associations with student interactions. Our experiment/empirical results indicate that the sizes of friendship networks have declined with time growth and both the small-world effect and power-law degree distribution are found in friendship networks. Second, the results of the assortativity coefficient of both orderliness and diligence verify that there are strong peer effects among students. Finally, the percolation analysis of orderliness on friendship networks shows that a phase transition exists, which is enlightening in that swarm intelligence can be realized by intervening the key students near the transition point.
Submitted 13 April, 2020;
originally announced April 2020.
-
Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning
Authors:
Zhekun Luo,
Devin Guillory,
Baifeng Shi,
Wei Ke,
Fang Wan,
Trevor Darrell,
Huijuan Xu
Abstract:
Weakly-supervised action localization requires training a model to localize the action segments in the video given only video level action label. It can be solved under the Multiple Instance Learning (MIL) framework, where a bag (video) contains multiple instances (action segments). Since only the bag's label is known, the main challenge is assigning which key instances within the bag to trigger the bag's label. Most previous models use attention-based approaches applying attentions to generate the bag's representation from instances, and then train it via the bag's classification. These models, however, implicitly violate the MIL assumption that instances in negative bags should be uniformly negative. In this work, we explicitly model the key instances assignment as a hidden variable and adopt an Expectation-Maximization (EM) framework. We derive two pseudo-label generation schemes to model the E and M process and iteratively optimize the likelihood lower bound. We show that our EM-MIL approach more accurately models both the learning objective and the MIL assumptions. It achieves state-of-the-art performance on two standard benchmarks, THUMOS14 and ActivityNet1.2.
△ Less
Submitted 25 August, 2020; v1 submitted 31 March, 2020;
originally announced April 2020.
-
DLITE: The Discounted Least Information Theory of Entropy
Authors:
Weimao Ke
Abstract:
We propose an entropy-based information measure, namely the Discounted Least Information Theory of Entropy (DLITE), which not only exhibits important characteristics expected of an information measure but also satisfies the conditions of a metric. Classic information measures such as Shannon Entropy, KL Divergence, and Jensen-Shannon Divergence have manifested some of these properties while missing others. This work fills an important gap in the advancement of information theory and its application, where related properties are desirable.
Submitted 18 February, 2020;
originally announced February 2020.
-
Multiple Anchor Learning for Visual Object Detection
Authors:
Wei Ke,
Tianliang Zhang,
Zeyi Huang,
Qixiang Ye,
Jianzhuang Liu,
Dong Huang
Abstract:
Classification and localization are two pillars of visual object detectors. However, in CNN-based detectors, these two modules are usually optimized under a fixed set of candidate (or anchor) bounding boxes. This configuration significantly limits the possibility to jointly optimize classification and localization. In this paper, we propose a Multiple Instance Learning (MIL) approach that selects anchors and jointly optimizes the two modules of a CNN-based object detector. Our approach, referred to as Multiple Anchor Learning (MAL), constructs anchor bags and selects the most representative anchors from each bag. Such an iterative selection process is potentially NP-hard to optimize. To address this issue, we solve MAL by repetitively depressing the confidence of selected anchors by perturbing their corresponding features. In an adversarial selection-depression manner, MAL not only pursues optimal solutions but also fully leverages multiple anchors/features to learn a detection model. Experiments show that MAL improves the baseline RetinaNet with significant margins on the commonly used MS-COCO object detection benchmark and achieves new state-of-the-art detection performance compared with recent methods.
Submitted 4 December, 2019;
originally announced December 2019.
-
DDNet: Cartesian-polar Dual-domain Network for the Joint Optic Disc and Cup Segmentation
Authors:
Qing Liu,
Xiaopeng Hong,
Wei Ke,
Zailiang Chen,
Beiji Zou
Abstract:
Existing joint optic disc and cup segmentation approaches are developed in either the Cartesian or the polar coordinate system. However, because the optic cup is subtle, the contextual information exploited from a single domain, even by the prevailing CNNs, is still insufficient. In this paper, we propose a novel segmentation approach, named Cartesian-polar dual-domain network (DDNet), which for the first time considers the complementarity of the Cartesian domain and the polar domain. We propose a two-branch domain feature encoder that learns, in parallel, translation-equivariant representations on a rectilinear grid from the Cartesian domain and rotation-equivariant representations on a polar grid from the polar domain. To fuse the features on the two different grids, we propose a dual-domain fusion module. This module builds the correspondence between the two grids through a differentiable polar transform layer and learns the element-wise feature importance across the two domains to enhance the expressive capability. Finally, the decoder aggregates the fused features from low level to high level and makes dense predictions. We validate the state-of-the-art segmentation performance of our DDNet on the public dataset ORIGA. From the segmentation masks, we estimate the commonly used clinical measure for glaucoma, i.e., the vertical cup-to-disc ratio. The low cup-to-disc ratio estimation error demonstrates the potential for application in glaucoma screening.
Submitted 18 April, 2019;
originally announced April 2019.
-
C-MIL: Continuation Multiple Instance Learning for Weakly Supervised Object Detection
Authors:
Fang Wan,
Chang Liu,
Wei Ke,
Xiangyang Ji,
Jianbin Jiao,
Qixiang Ye
Abstract:
Weakly supervised object detection (WSOD) is a challenging task in which only image category supervision is provided but object locations and object detectors must be learned simultaneously. Many WSOD approaches adopt multiple instance learning (MIL) and have non-convex loss functions, which are prone to getting stuck in local minima (falsely localizing object parts) while missing the full object extent during training. In this paper, we introduce a continuation optimization method into MIL, thereby creating continuation multiple instance learning (C-MIL), with the intention of alleviating the non-convexity problem in a systematic way. We partition instances into spatially related and class-related subsets, and approximate the original loss function with a series of smoothed loss functions defined within the subsets. Optimizing smoothed loss functions prevents the training procedure from falling prematurely into local minima and facilitates the discovery of Stable Semantic Extremal Regions (SSERs), which indicate the full object extent. On the PASCAL VOC 2007 and 2012 datasets, C-MIL improves the state of the art in weakly supervised object detection and weakly supervised object localization by large margins.
Submitted 11 April, 2019;
originally announced April 2019.
-
Improving Object Detection with Inverted Attention
Authors:
Zeyi Huang,
Wei Ke,
Dong Huang
Abstract:
Improving the robustness of object detectors against occlusion, blur and noise is a critical step in deploying detectors in real applications. Since it is not possible to exhaust all image defects through data collection, many researchers seek to generate hard samples in training. The generated hard samples are either images or feature maps with coarse patches dropped out in the spatial dimensions. Significant overheads are required in training on the extra hard samples and/or estimating drop-out patches using extra network branches. In this paper, we improve object detectors using a highly efficient and fine-grained mechanism called Inverted Attention (IA). Unlike the original detector network, which focuses only on the dominant part of objects, the detector network with IA iteratively inverts attention on feature maps and puts more attention on complementary object parts, feature channels and even context. Our approach (1) operates along both the spatial and channel dimensions of the feature maps; (2) requires no extra training on hard samples, no extra network parameters for attention estimation, and no testing overheads. Experiments show that our approach consistently improves both two-stage and single-stage detectors on benchmark databases.
Submitted 28 March, 2019;
originally announced March 2019.
-
Automated Prototype Generation from Formal Requirements Model
Authors:
Yilong Yang,
Xiaoshan Li,
Zhiming Liu,
Wei Ke,
Quan Zu,
Xiaohong Chen
Abstract:
Prototyping is an effective and efficient way of validating requirements to avoid introducing errors in the early stage of software development. However, manually developing a prototype of a software system requires additional effort, which increases the overall cost of software development. In this paper, we present an approach, with a developed tool, for the automatic generation of prototypes from formal requirements models. A requirements model consists of a use case diagram, a conceptual class diagram, use case definitions specified by system sequence diagrams, and the contracts of their system operations. We propose a method to decompose a contract into executable parts and non-executable parts. A set of transformation rules is given to decompose the executable part into pre-implemented primitive operations. A non-executable part is usually realized by significant algorithms such as sorting a list, finding the shortest path, or domain-specific computation. It can be implemented manually or by using existing code. A CASE tool is developed that provides an interface for developers to write a program for each non-executable part of a contract, and automatically transforms the executable parts into sequences of pre-implemented primitive operations. We have conducted four case studies with over 50 use cases. The experimental results show that 93.65% of requirement specifications are executable, and only 6.35% are non-executable, such as sorting and event-call, which can be implemented by developers manually or by invoking the APIs of advanced algorithms in the Java library. A prototype of a case study that is generated in one second requires approximately nine hours of manual implementation by a skilled programmer. Overall, the result is satisfactory, and the proposed approach with the developed CASE tool can be applied in the software industry for requirements engineering.
Submitted 31 August, 2018;
originally announced August 2018.
-
Linear Span Network for Object Skeleton Detection
Authors:
Chang Liu,
Wei Ke,
Fei Qin,
Qixiang Ye
Abstract:
Robust object skeleton detection requires exploring rich representative visual features and effective feature fusion strategies. In this paper, we first revisit the implementation of HED, the essential principle of which can be ideally described with a linear reconstruction model. Motivated by this, we formalize a Linear Span framework and propose the Linear Span Network (LSN), modified by Linear Span Units (LSUs), which minimize the reconstruction error of the convolutional network. LSN further utilizes subspace linear span besides the feature linear span to increase the independence of convolutional features and the efficiency of feature integration, which enlarges the capability of fitting complex ground truth. As a result, LSN can effectively suppress cluttered backgrounds and reconstruct object skeletons. Experimental results validate the state-of-the-art performance of the proposed LSN.
Submitted 25 July, 2018;
originally announced July 2018.
-
SRN: Side-output Residual Network for Object Reflection Symmetry Detection and Beyond
Authors:
Wei Ke,
Jie Chen,
Jianbin Jiao,
Guoying Zhao,
Qixiang Ye
Abstract:
In this paper, we establish a baseline for object reflection symmetry detection in complex backgrounds by presenting a new benchmark and an end-to-end deep learning approach, opening up a promising direction for symmetry detection in the wild. The new benchmark, Sym-PASCAL, spans challenges including object diversity, multi-objects, part-invisibility, and various complex backgrounds that are far beyond those in existing datasets. The end-to-end deep learning approach, referred to as a side-output residual network (SRN), leverages the output residual units (RUs) to fit the errors between the object ground-truth symmetry and the side-outputs of multiple stages. By cascading RUs in a deep-to-shallow manner, SRN exploits the 'flow' of errors among multiple stages to address the challenges of fitting complex output with limited convolutional layers, suppressing the complex backgrounds, and effectively matching object symmetry at different scales. SRN is further upgraded to a multi-task side-output residual network (MT-SRN) for joint symmetry and edge detection, demonstrating its generality to image-to-mask learning tasks. Experimental results validate both the challenging aspects of Sym-PASCAL benchmark related to real-world images and the state-of-the-art performance of the proposed SRN approach.
Submitted 15 March, 2019; v1 submitted 17 July, 2018;
originally announced July 2018.
-
MedShare: Medical Resource Sharing among Autonomous Healthcare Providers
Authors:
Yilong Yang,
Xiaoshan Li,
Nafees Qamar,
Wei Ke,
Zhiming Liu
Abstract:
Legacy Electronic Health Records (EHRs) systems were not developed with the level of connectivity expected from them nowadays. Therefore, the interoperability weakness inherent in the legacy systems can result in poor patient care and waste of financial resources. Large hospitals are less likely to share their data with external hospitals due to economic and political reasons. Motivated by these facts, we aim to provide a set of software implementation guidelines, i.e., MedShare, to deal with interoperability issues among disconnected healthcare systems. The proposed integrated architecture includes: 1) a data extractor to fetch legacy medical data from a hemodialysis center, 2) conversion of the data to a common data model, 3) indexing of patient information using the HashMap technique, and 4) a set of services and tools that can be installed as a coherent environment on top of stand-alone EHRs systems. Our work enabled three cooperating but autonomous hospitals to mutually exchange medical data and helped them develop a common reference architecture. It lets stakeholders retain control over their patient data, winning the trust and confidence much needed for a successful deployment of MedShare. Security concerns were effectively addressed, including patient consent in the data exchange process. Thereby, the implemented toolset offered a collaborative environment for healthcare providers to share EHRs.
Submitted 14 March, 2018;
originally announced March 2018.
-
Automatic Streaming Segmentation of Stereo Video Using Bilateral Space
Authors:
Wenjing Ke,
Yuanjie Zhu,
Lei Yu
Abstract:
In this paper, we take advantage of a binocular camera and propose an unsupervised algorithm, based on a semi-supervised segmentation algorithm, that extracts the foreground efficiently. We embed depth information into the bilateral grid in the graph cut model and achieve considerable segmentation accuracy with no user input. The experiment demonstrates the high precision and time efficiency of our algorithm and its adaptation to complex natural scenes, which is significant for practical application.
Submitted 27 November, 2017; v1 submitted 10 October, 2017;
originally announced October 2017.
-
SRN: Side-output Residual Network for Object Symmetry Detection in the Wild
Authors:
Wei Ke,
Jie Chen,
Jianbin Jiao,
Guoying Zhao,
Qixiang Ye
Abstract:
In this paper, we establish a baseline for object symmetry detection in complex backgrounds by presenting a new benchmark and an end-to-end deep learning approach, opening up a promising direction for symmetry detection in the wild. The new benchmark, named Sym-PASCAL, spans challenges including object diversity, multi-objects, part-invisibility, and various complex backgrounds that are far beyond those in existing datasets. The proposed symmetry detection approach, named Side-output Residual Network (SRN), leverages output Residual Units (RUs) to fit the errors between the object symmetry ground truth and the outputs of RUs. By stacking RUs in a deep-to-shallow manner, SRN exploits the 'flow' of errors among multiple scales to ease the problems of fitting complex outputs with limited layers, suppressing the complex backgrounds, and effectively matching object symmetry of different scales. Experimental results validate both the benchmark and its challenging aspects related to real-world images, and the state-of-the-art performance of our symmetry detection approach. The benchmark and the code for SRN are publicly available at https://github.com/KevinKecc/SRN.
Submitted 31 March, 2017; v1 submitted 7 March, 2017;
originally announced March 2017.
-
A Fitness Model for Scholarly Impact Analysis
Authors:
Weimao Ke
Abstract:
We propose a model to analyze citation growth and influences of fitness (competitiveness) factors in an evolving citation network. Applying the proposed method to modeling citations to papers and scholars in the InfoVis 2004 data, a benchmark collection about a 31-year history of information visualization, leads to findings consistent with citation distributions in general and observations of the domain in particular. Fitness variables based on prior impacts and the time factor have significant influences on citation outcomes. We find considerably large effect sizes from the fitness modeling, which suggest inevitable bias in citation analysis due to these factors. While raw citation scores offer little insight into the growth of InfoVis, normalization of the scores by influences of time and prior fitness offers a reasonable depiction of the field's development. The analysis demonstrates the proposed model's ability to produce results consistent with observed data and to support meaningful comparison of citation scores over time.
Submitted 2 May, 2012;
originally announced May 2012.
-
Least Information Modeling for Information Retrieval
Authors:
Weimao Ke
Abstract:
We proposed a Least Information theory (LIT) to quantify meaning of information in probability distribution changes, from which a new information retrieval model was developed. We observed several important characteristics of the proposed theory and derived two quantities in the IR context for document representation. Given probability distributions in a collection as prior knowledge, LI Binary (LIB) quantifies least information due to the binary occurrence of a term in a document whereas LI Frequency (LIF) measures least information based on the probability of drawing a term from a bag of words. Three fusion methods were also developed to combine LIB and LIF quantities for term weighting and document ranking. Experiments on four benchmark TREC collections for ad hoc retrieval showed that LIT-based methods demonstrated very strong performances compared to classic TF*IDF and BM25, especially for verbose queries and hard search topics. The least information theory offers a new approach to measuring semantic quantities of information and provides valuable insight into the development of new IR models.
Submitted 1 May, 2012;
originally announced May 2012.