-
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents
Authors:
Tianqi Xu,
Linyao Chen,
Dai-Jie Wu,
Yanjun Chen,
Zecheng Zhang,
Xiang Yao,
Zhiqiang Xie,
Yongchao Chen,
Shilong Liu,
Bochen Qian,
Philip Torr,
Bernard Ghanem,
Guohao Li
Abstract:
The development of autonomous agents increasingly relies on Multimodal Language Models (MLMs) to perform tasks described in natural language with GUI environments, such as websites, desktop computers, or mobile phones. Existing benchmarks for MLM agents in interactive environments are limited by their focus on a single environment, lack of detailed and generalized evaluation methods, and the compl…
▽ More
The development of autonomous agents increasingly relies on Multimodal Language Models (MLMs) to perform tasks described in natural language with GUI environments, such as websites, desktop computers, or mobile phones. Existing benchmarks for MLM agents in interactive environments are limited by their focus on a single environment, lack of detailed and generalized evaluation methods, and the complexities of constructing tasks and evaluators. To overcome these limitations, we introduce Crab, the first agent benchmark framework designed to support cross-environment tasks, incorporating a graph-based fine-grained evaluation method and an efficient mechanism for task and evaluator construction. Our framework supports multiple devices and can be easily extended to any environment with a Python interface. Leveraging Crab, we developed a cross-platform Crab Benchmark-v0 comprising 100 tasks in computer desktop and mobile phone environments. We evaluated four advanced MLMs using different single and multi-agent system configurations on this benchmark. The experimental results demonstrate that the single agent with GPT-4o achieves the best completion ratio of 35.26%. All framework code, agent code, and task datasets are publicly available at https://github.com/camel-ai/crab.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Generalized and high-efficiency arbitrary-positioned buffer for smoothed particle hydrodynamics
Authors:
Shuoguo Zhang,
Yu Fan,
Yaru Ren,
Bin Qian,
Xiangyu Hu
Abstract:
This paper develops an arbitrary-positioned buffer for the smoothed particle hydrodynamics (SPH) method, whose generality and high efficiency are achieved through two techniques. First, with the local coordinate system established at each arbitrary-positioned in-/outlet, particle positions in the global coordinate system are transformed into those in it via coordinate transformation. Since one loc…
▽ More
This paper develops an arbitrary-positioned buffer for the smoothed particle hydrodynamics (SPH) method, whose generality and high efficiency are achieved through two techniques. First, with the local coordinate system established at each arbitrary-positioned in-/outlet, particle positions in the global coordinate system are transformed into those in it via coordinate transformation. Since one local axis is located perpendicular to the in-/outlet boundary, the position comparison between particles and the threshold line or surface can be simplified to just this coordinate dimension. Second, particle candidates subjected to position comparison at one specific in-/outlet are restricted to those within the local cell-linked lists nearby the defined buffer zone, which significantly enhances computational efficiency due to a small portion of particles being checked. With this developed buffer, particle generation and deletion at arbitrary-positioned in- and outlets of complex flows can be handled efficiently and straightforwardly. We validate the effectiveness and versatility of the developed buffer through 2-D and 3-D non-/orthogonal uni-/bidirectional flows with arbitrary-positioned in- and outlets, driven by either pressure or velocity boundary conditions.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis
Authors:
Teng Hu,
Ran Yi,
Baihong Qian,
Jiangning Zhang,
Paul L. Rosin,
Yu-Kun Lai
Abstract:
SVG (Scalable Vector Graphics) is a widely used graphics format that possesses excellent scalability and editability. Image vectorization, which aims to convert raster images to SVGs, is an important yet challenging problem in computer vision and graphics. Existing image vectorization methods either suffer from low reconstruction accuracy for complex images or require long computation time. To add…
▽ More
SVG (Scalable Vector Graphics) is a widely used graphics format that possesses excellent scalability and editability. Image vectorization, which aims to convert raster images to SVGs, is an important yet challenging problem in computer vision and graphics. Existing image vectorization methods either suffer from low reconstruction accuracy for complex images or require long computation time. To address this issue, we propose SuperSVG, a superpixel-based vectorization model that achieves fast and high-precision image vectorization. Specifically, we decompose the input image into superpixels to help the model focus on areas with similar colors and textures. Then, we propose a two-stage self-training framework, where a coarse-stage model is employed to reconstruct the main structure and a refinement-stage model is used for enriching the details. Moreover, we propose a novel dynamic path war** loss to help the refinement-stage model to inherit knowledge from the coarse-stage model. Extensive qualitative and quantitative experiments demonstrate the superior performance of our method in terms of reconstruction accuracy and inference time compared to state-of-the-art approaches. The code is available in \url{https://github.com/sjtuplayer/SuperSVG}.
△ Less
Submitted 14 June, 2024;
originally announced June 2024.
-
Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting
Authors:
Haipeng Liu,
Yang Wang,
Biao Qian,
Meng Wang,
Yong Rui
Abstract:
Denoising diffusion probabilistic models for image inpainting aim to add the noise to the texture of image during the forward process and recover masked regions with unmasked ones of the texture via the reverse denoising process. Despite the meaningful semantics generation, the existing arts suffer from the semantic discrepancy between masked and unmasked regions, since the semantically dense unma…
▽ More
Denoising diffusion probabilistic models for image inpainting aim to add the noise to the texture of image during the forward process and recover masked regions with unmasked ones of the texture via the reverse denoising process. Despite the meaningful semantics generation, the existing arts suffer from the semantic discrepancy between masked and unmasked regions, since the semantically dense unmasked texture fails to be completely degraded while the masked regions turn to the pure noise in diffusion process, leading to the large discrepancy between them. In this paper, we aim to answer how unmasked semantics guide texture denoising process;together with how to tackle the semantic discrepancy, to facilitate the consistent and meaningful semantics generation. To this end, we propose a novel structure-guided diffusion model named StrDiffusion, to reformulate the conventional texture denoising process under structure guidance to derive a simplified denoising objective for image inpainting, while revealing: 1) the semantically sparse structure is beneficial to tackle semantic discrepancy in early stage, while dense texture generates reasonable semantics in late stage; 2) the semantics from unmasked regions essentially offer the time-dependent structure guidance for the texture denoising process, benefiting from the time-dependent sparsity of the structure semantics. For the denoising process, a structure-guided neural network is trained to estimate the simplified denoising objective by exploiting the consistency of the denoised structure between masked and unmasked regions. Besides, we devise an adaptive resampling strategy as a formal criterion as whether structure is competent to guide the texture denoising process, while regulate their semantic correlations. Extensive experiments validate the merits of StrDiffusion over the state-of-the-arts. Our code is available at https://github.com/htyjers/StrDiffusion.
△ Less
Submitted 31 March, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
AgentScope: A Flexible yet Robust Multi-Agent Platform
Authors:
Dawei Gao,
Zitao Li,
Xuchen Pan,
Weirui Kuang,
Zhijian Ma,
Bingchen Qian,
Fei Wei,
Wenhao Zhang,
Yuexiang Xie,
Daoyuan Chen,
Liuyi Yao,
Hongyi Peng,
Zeyu Zhang,
Lin Zhu,
Chen Cheng,
Hongzhu Shi,
Yaliang Li,
Bolin Ding,
**gren Zhou
Abstract:
With the rapid advancement of Large Language Models (LLMs), significant progress has been made in multi-agent applications. However, the complexities in coordinating agents' cooperation and LLMs' erratic performance pose notable challenges in develo** robust and efficient multi-agent applications. To tackle these challenges, we propose AgentScope, a developer-centric multi-agent platform with me…
▽ More
With the rapid advancement of Large Language Models (LLMs), significant progress has been made in multi-agent applications. However, the complexities in coordinating agents' cooperation and LLMs' erratic performance pose notable challenges in develo** robust and efficient multi-agent applications. To tackle these challenges, we propose AgentScope, a developer-centric multi-agent platform with message exchange as its core communication mechanism. The abundant syntactic tools, built-in agents and service functions, user-friendly interfaces for application demonstration and utility monitor, zero-code programming workstation, and automatic prompt tuning mechanism significantly lower the barriers to both development and deployment. Towards robust and flexible multi-agent application, AgentScope provides both built-in and customizable fault tolerance mechanisms. At the same time, it is also armed with system-level support for managing and utilizing multi-modal data, tools, and external knowledge. Additionally, we design an actor-based distribution framework, enabling easy conversion between local and distributed deployments and automatic parallel optimization without extra effort. With these features, AgentScope empowers developers to build applications that fully realize the potential of intelligent agents. We have released AgentScope at https://github.com/modelscope/agentscope, and hope AgentScope invites wider participation and innovation in this fast-moving field.
△ Less
Submitted 20 May, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Federated Fine-tuning of Large Language Models under Heterogeneous Tasks and Client Resources
Authors:
Jiamu Bai,
Daoyuan Chen,
Bingchen Qian,
Liuyi Yao,
Yaliang Li
Abstract:
Federated Learning (FL) has recently been applied to the parameter-efficient fine-tuning of Large Language Models (LLMs). While promising, it raises significant challenges due to the heterogeneous resources and data distributions of clients. This study introduces FlexLoRA, a simple yet effective aggregation scheme for LLM fine-tuning, which mitigates the ``bucket effect'' in traditional FL that re…
▽ More
Federated Learning (FL) has recently been applied to the parameter-efficient fine-tuning of Large Language Models (LLMs). While promising, it raises significant challenges due to the heterogeneous resources and data distributions of clients. This study introduces FlexLoRA, a simple yet effective aggregation scheme for LLM fine-tuning, which mitigates the ``bucket effect'' in traditional FL that restricts the potential of clients with ample resources by tying them to the capabilities of the least-resourced participants. FlexLoRA allows for dynamic adjustment of local LoRA ranks, fostering the development of a global model imbued with broader, less task-specific knowledge. By synthesizing a full-size LoRA weight from individual client contributions and employing Singular Value Decomposition (SVD) for weight redistribution, FlexLoRA fully leverages heterogeneous client resources. Involving thousands of clients performing heterogeneous NLP tasks and client resources, our experiments validate the efficacy of FlexLoRA, with the federated global model achieving consistently better improvement over SOTA FL methods in downstream NLP task performance across various heterogeneous distributions. FlexLoRA's practicality is further underscored by our theoretical analysis and its seamless integration with existing LoRA-based FL methods, offering a path toward cross-device, privacy-preserving federated tuning for LLMs.
△ Less
Submitted 30 May, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
RHOBIN Challenge: Reconstruction of Human Object Interaction
Authors:
Xianghui Xie,
Xi Wang,
Nikos Athanasiou,
Bharat Lal Bhatnagar,
Chun-Hao P. Huang,
Kaichun Mo,
Hao Chen,
Xia Jia,
Zerui Zhang,
Liangxian Cui,
Xiao Lin,
Bingqiao Qian,
Jie Xiao,
Wenfei Yang,
Hyeong** Nam,
Daniel Sungho Jung,
Kihoon Kim,
Kyoung Mu Lee,
Otmar Hilliges,
Gerard Pons-Moll
Abstract:
Modeling the interaction between humans and objects has been an emerging research direction in recent years. Capturing human-object interaction is however a very challenging task due to heavy occlusion and complex dynamics, which requires understanding not only 3D human pose, and object pose but also the interaction between them. Reconstruction of 3D humans and objects has been two separate resear…
▽ More
Modeling the interaction between humans and objects has been an emerging research direction in recent years. Capturing human-object interaction is however a very challenging task due to heavy occlusion and complex dynamics, which requires understanding not only 3D human pose, and object pose but also the interaction between them. Reconstruction of 3D humans and objects has been two separate research fields in computer vision for a long time. We hence proposed the first RHOBIN challenge: reconstruction of human-object interactions in conjunction with the RHOBIN workshop. It was aimed at bringing the research communities of human and object reconstruction as well as interaction modeling together to discuss techniques and exchange ideas. Our challenge consists of three tracks of 3D reconstruction from monocular RGB images with a focus on dealing with challenging interaction scenarios. Our challenge attracted more than 100 participants with more than 300 submissions, indicating the broad interest in the research communities. This paper describes the settings of our challenge and discusses the winning methods of each track in more detail. We observe that the human reconstruction task is becoming mature even under heavy occlusion settings while object pose estimation and joint reconstruction remain challenging tasks. With the growing interest in interaction modeling, we hope this report can provide useful insights and foster future research in this direction. Our workshop website can be found at \href{https://rhobin-challenge.github.io/}{https://rhobin-challenge.github.io/}.
△ Less
Submitted 7 January, 2024;
originally announced January 2024.
-
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
Authors:
Zhen Qin,
Daoyuan Chen,
Bingchen Qian,
Bolin Ding,
Yaliang Li,
Shuiguang Deng
Abstract:
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions. Federated learning offers a way to fine-tune LLMs using the abundant data on end devices without compromising data privacy. Most existing federated fine-tuning methods for LLMs rely on parameter-efficient fine-tuning techniques, which may not reach the performance height poss…
▽ More
Pre-trained large language models (LLMs) need fine-tuning to improve their responsiveness to natural language instructions. Federated learning offers a way to fine-tune LLMs using the abundant data on end devices without compromising data privacy. Most existing federated fine-tuning methods for LLMs rely on parameter-efficient fine-tuning techniques, which may not reach the performance height possible with full-parameter tuning. However, federated full-parameter tuning of LLMs is a non-trivial problem due to the immense communication cost. This work introduces FedKSeed that employs zeroth-order optimization with a finite set of random seeds. It significantly reduces transmission requirements between the server and clients to just a few random seeds and scalar gradients, amounting to only a few thousand bytes, making federated full-parameter tuning of billion-sized LLMs possible on devices. Building on it, we develop a strategy enabling probability-differentiated seed sampling, prioritizing perturbations with greater impact on model accuracy. Experiments across six scenarios with various LLMs, datasets and data partitions demonstrate that our approach outperforms existing federated LLM fine-tuning methods in both communication efficiency and new task generalization.
△ Less
Submitted 27 May, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning
Authors:
Weirui Kuang,
Bingchen Qian,
Zitao Li,
Daoyuan Chen,
Dawei Gao,
Xuchen Pan,
Yuexiang Xie,
Yaliang Li,
Bolin Ding,
**gren Zhou
Abstract:
LLMs have demonstrated great capabilities in various NLP tasks. Different entities can further improve the performance of those LLMs on their specific downstream tasks by fine-tuning LLMs. When several entities have similar interested tasks, but their data cannot be shared because of privacy concerns regulations, federated learning (FL) is a mainstream solution to leverage the data of different en…
▽ More
LLMs have demonstrated great capabilities in various NLP tasks. Different entities can further improve the performance of those LLMs on their specific downstream tasks by fine-tuning LLMs. When several entities have similar interested tasks, but their data cannot be shared because of privacy concerns regulations, federated learning (FL) is a mainstream solution to leverage the data of different entities. However, fine-tuning LLMs in federated learning settings still lacks adequate support from existing FL frameworks because it has to deal with optimizing the consumption of significant communication and computational resources, data preparation for different tasks, and distinct information protection demands. This paper first discusses these challenges of federated fine-tuning LLMs, and introduces our package FS-LLM as a main contribution, which consists of the following components: (1) we build an end-to-end benchmarking pipeline, automizing the processes of dataset preprocessing, federated fine-tuning execution, and performance evaluation on federated LLM fine-tuning; (2) we provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios with low communication and computation costs, even without accessing the full model; (3) we adopt several accelerating and resource-efficient operators for fine-tuning LLMs with limited resources and the flexible pluggable sub-routines for interdisciplinary study. We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings, which also yields valuable insights into federated fine-tuning LLMs for the research community. To facilitate further research and adoption, we release FS-LLM at https://github.com/alibaba/FederatedScope/tree/llm.
△ Less
Submitted 1 September, 2023;
originally announced September 2023.
-
BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models
Authors:
Xiangru Tang,
Bill Qian,
Rick Gao,
Jiakang Chen,
Xinyun Chen,
Mark Gerstein
Abstract:
Pre-trained large language models (LLMs) have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks and to be appropriately specialized to particular domains. Here, we target bioinformatics due to the amount of domain knowledge, algorithms, and data operations this discipline requires. We present BioCoder, a benc…
▽ More
Pre-trained large language models (LLMs) have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks and to be appropriately specialized to particular domains. Here, we target bioinformatics due to the amount of domain knowledge, algorithms, and data operations this discipline requires. We present BioCoder, a benchmark developed to evaluate LLMs in generating bioinformatics-specific code. BioCoder spans much of the field, covering cross-file dependencies, class declarations, and global variables. It incorporates 1,026 Python functions and 1,243 Java methods extracted from GitHub, along with 253 examples from the Rosalind Project, all pertaining to bioinformatics. Using topic modeling, we show that the overall coverage of the included code is representative of the full spectrum of bioinformatics calculations. BioCoder incorporates a fuzz-testing framework for evaluation. We have applied it to evaluate various models including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, GPT-3.5, and GPT- 4. Furthermore, we fine-tuned one model (StarCoder), demonstrating that our training dataset can enhance the performance on our testing benchmark (by >15% in terms of Pass@K under certain prompt configurations and always >3%). The results highlight two key aspects of successful models: (1) Successful models accommodate a long prompt (> 2,600 tokens) with full context, including functional dependencies. (2) They contain domain-specific knowledge of bioinformatics, beyond just general coding capability. This is evident from the performance gain of GPT-3.5/4 compared to the smaller models on our benchmark (50% vs. up to 25%). Availability and implementation: Code is available at: https://github.com/gersteinlab/biocoder and https://biocoder-benchmark. github.io/.
△ Less
Submitted 20 May, 2024; v1 submitted 31 August, 2023;
originally announced August 2023.
-
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Authors:
Yujia Qin,
Shihao Liang,
Yining Ye,
Kunlun Zhu,
Lan Yan,
Yaxi Lu,
Yankai Lin,
Xin Cong,
Xiangru Tang,
Bill Qian,
Sihan Zhao,
Lauren Hong,
Runchu Tian,
Ruobing Xie,
Jie Zhou,
Mark Gerstein,
Dahai Li,
Zhiyuan Liu,
Maosong Sun
Abstract:
Despite the advancements of open-source large language models (LLMs), e.g., LLaMA, they remain significantly limited in tool-use capabilities, i.e., using external tools (APIs) to fulfill human instructions. The reason is that current instruction tuning largely focuses on basic language tasks but ignores the tool-use domain. This is in contrast to the excellent tool-use capabilities of state-of-th…
▽ More
Despite the advancements of open-source large language models (LLMs), e.g., LLaMA, they remain significantly limited in tool-use capabilities, i.e., using external tools (APIs) to fulfill human instructions. The reason is that current instruction tuning largely focuses on basic language tasks but ignores the tool-use domain. This is in contrast to the excellent tool-use capabilities of state-of-the-art (SOTA) closed-source LLMs, e.g., ChatGPT. To bridge this gap, we introduce ToolLLM, a general tool-use framework encompassing data construction, model training, and evaluation. We first present ToolBench, an instruction-tuning dataset for tool use, which is constructed automatically using ChatGPT. Specifically, the construction can be divided into three stages: (i) API collection: we collect 16,464 real-world RESTful APIs spanning 49 categories from RapidAPI Hub; (ii) instruction generation: we prompt ChatGPT to generate diverse instructions involving these APIs, covering both single-tool and multi-tool scenarios; (iii) solution path annotation: we use ChatGPT to search for a valid solution path (chain of API calls) for each instruction. To enhance the reasoning capabilities of LLMs, we develop a novel depth-first search-based decision tree algorithm. It enables LLMs to evaluate multiple reasoning traces and expand the search space. Moreover, to evaluate the tool-use capabilities of LLMs, we develop an automatic evaluator: ToolEval. Based on ToolBench, we fine-tune LLaMA to obtain an LLM ToolLLaMA, and equip it with a neural API retriever to recommend appropriate APIs for each instruction. Experiments show that ToolLLaMA demonstrates a remarkable ability to execute complex instructions and generalize to unseen APIs, and exhibits comparable performance to ChatGPT. Our ToolLLaMA also demonstrates strong zero-shot generalization ability in an out-of-distribution tool-use dataset: APIBench.
△ Less
Submitted 3 October, 2023; v1 submitted 31 July, 2023;
originally announced July 2023.
-
DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images
Authors:
Bo Qian,
Hao Chen,
Xiangning Wang,
Haoxuan Che,
Gitaek Kwon,
Jaeyoung Kim,
Sung** Choi,
Seoyoung Shin,
Felix Krause,
Markus Unterdechler,
Junlin Hou,
Rui Feng,
Yihao Li,
Mostafa El Habib Daho,
Qiang Wu,
** Zhang,
Xiaokang Yang,
Yiyu Cai,
Wei** Jia,
Huating Li,
Bin Sheng
Abstract:
Computer-assisted automatic analysis of diabetic retinopathy (DR) is of great importance in reducing the risks of vision loss and even blindness. Ultra-wide optical coherence tomography angiography (UW-OCTA) is a non-invasive and safe imaging modality in DR diagnosis system, but there is a lack of publicly available benchmarks for model development and evaluation. To promote further research and s…
▽ More
Computer-assisted automatic analysis of diabetic retinopathy (DR) is of great importance in reducing the risks of vision loss and even blindness. Ultra-wide optical coherence tomography angiography (UW-OCTA) is a non-invasive and safe imaging modality in DR diagnosis system, but there is a lack of publicly available benchmarks for model development and evaluation. To promote further research and scientific benchmarking for diabetic retinopathy analysis using UW-OCTA images, we organized a challenge named "DRAC - Diabetic Retinopathy Analysis Challenge" in conjunction with the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). The challenge consists of three tasks: segmentation of DR lesions, image quality assessment and DR grading. The scientific community responded positively to the challenge, with 11, 12, and 13 teams from geographically diverse institutes submitting different solutions in these three tasks, respectively. This paper presents a summary and analysis of the top-performing solutions and results for each task of the challenge. The obtained results from top algorithms indicate the importance of data augmentation, model architecture and ensemble of networks in improving the performance of deep learning models. These findings have the potential to enable new developments in diabetic retinopathy analysis. The challenge remains open for post-challenge registrations and submissions for benchmarking future methodology developments.
△ Less
Submitted 5 April, 2023;
originally announced April 2023.
-
Adaptive Data-Free Quantization
Authors:
Biao Qian,
Yang Wang,
Richang Hong,
Meng Wang
Abstract:
Data-free quantization (DFQ) recovers the performance of quantized network (Q) without the original data, but generates the fake sample via a generator (G) by learning from full-precision network (P), which, however, is totally independent of Q, overlooking the adaptability of the knowledge from generated samples, i.e., informative or not to the learning process of Q, resulting into the overflow o…
▽ More
Data-free quantization (DFQ) recovers the performance of quantized network (Q) without the original data, but generates the fake sample via a generator (G) by learning from full-precision network (P), which, however, is totally independent of Q, overlooking the adaptability of the knowledge from generated samples, i.e., informative or not to the learning process of Q, resulting into the overflow of generalization error. Building on this, several critical questions -- how to measure the sample adaptability to Q under varied bit-width scenarios? whether the largest adaptability is the best? how to generate the samples with adaptive adaptability to improve Q's generalization? To answer the above questions, in this paper, we propose an Adaptive Data-Free Quantization (AdaDFQ) method, which revisits DFQ from a zero-sum game perspective upon the sample adaptability between two players -- a generator and a quantized network. Following this viewpoint, we further define the disagreement and agreement samples to form two boundaries, where the margin is optimized to adaptively regulate the adaptability of generated samples to Q, so as to address the over-and-under fitting issues. Our AdaDFQ reveals: 1) the largest adaptability is NOT the best for sample generation to benefit Q's generalization; 2) the knowledge of the generated sample should not be informative to Q only, but also related to the category and distribution information of the training data for P. The theoretical and empirical analysis validate the advantages of AdaDFQ over the state-of-the-arts. Our code is available at https://github.com/hfutqian/AdaDFQ.
△ Less
Submitted 20 March, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Rethinking Data-Free Quantization as a Zero-Sum Game
Authors:
Biao Qian,
Yang Wang,
Richang Hong,
Meng Wang
Abstract:
Data-free quantization (DFQ) recovers the performance of quantized network (Q) without accessing the real data, but generates the fake sample via a generator (G) by learning from full-precision network (P) instead. However, such sample generation process is totally independent of Q, specialized as failing to consider the adaptability of the generated samples, i.e., beneficial or adversarial, over…
▽ More
Data-free quantization (DFQ) recovers the performance of quantized network (Q) without accessing the real data, but generates the fake sample via a generator (G) by learning from full-precision network (P) instead. However, such sample generation process is totally independent of Q, specialized as failing to consider the adaptability of the generated samples, i.e., beneficial or adversarial, over the learning process of Q, resulting into non-ignorable performance loss. Building on this, several crucial questions -- how to measure and exploit the sample adaptability to Q under varied bit-width scenarios? how to generate the samples with desirable adaptability to benefit the quantized network? -- impel us to revisit DFQ. In this paper, we answer the above questions from a game-theory perspective to specialize DFQ as a zero-sum game between two players -- a generator and a quantized network, and further propose an Adaptability-aware Sample Generation (AdaSG) method. Technically, AdaSG reformulates DFQ as a dynamic maximization-vs-minimization game process anchored on the sample adaptability. The maximization process aims to generate the sample with desirable adaptability, such sample adaptability is further reduced by the minimization process after calibrating Q for performance recovery. The Balance Gap is defined to guide the stationarity of the game process to maximally benefit Q. The theoretical analysis and empirical studies verify the superiority of AdaSG over the state-of-the-arts. Our code is available at https://github.com/hfutqian/AdaSG.
△ Less
Submitted 19 February, 2023;
originally announced February 2023.
-
Fine-grained Cross-modal Fusion based Refinement for Text-to-Image Synthesis
Authors:
Haoran Sun,
Yang Wang,
Haipeng Liu,
Biao Qian
Abstract:
Text-to-image synthesis refers to generating visual-realistic and semantically consistent images from given textual descriptions. Previous approaches generate an initial low-resolution image and then refine it to be high-resolution. Despite the remarkable progress, these methods are limited in fully utilizing the given texts and could generate text-mismatched images, especially when the text descr…
▽ More
Text-to-image synthesis refers to generating visual-realistic and semantically consistent images from given textual descriptions. Previous approaches generate an initial low-resolution image and then refine it to be high-resolution. Despite the remarkable progress, these methods are limited in fully utilizing the given texts and could generate text-mismatched images, especially when the text description is complex. We propose a novel Fine-grained text-image Fusion based Generative Adversarial Networks, dubbed FF-GAN, which consists of two modules: Fine-grained text-image Fusion Block (FF-Block) and Global Semantic Refinement (GSR). The proposed FF-Block integrates an attention block and several convolution layers to effectively fuse the fine-grained word-context features into the corresponding visual features, in which the text information is fully used to refine the initial image with more details. And the GSR is proposed to improve the global semantic consistency between linguistic and visual features during the refinement process. Extensive experiments on CUB-200 and COCO datasets demonstrate the superiority of FF-GAN over other state-of-the-art approaches in generating images with semantic consistency to the given texts.Code is available at https://github.com/haoranhfut/FF-GAN.
△ Less
Submitted 20 February, 2023; v1 submitted 17 February, 2023;
originally announced February 2023.
-
Switchable Online Knowledge Distillation
Authors:
Biao Qian,
Yang Wang,
Hongzhi Yin,
Richang Hong,
Meng Wang
Abstract:
Online Knowledge Distillation (OKD) improves the involved models by reciprocally exploiting the difference between teacher and student. Several crucial bottlenecks over the gap between them -- e.g., Why and when does a large gap harm the performance, especially for student? How to quantify the gap between teacher and student? -- have received limited formal study. In this paper, we propose Switcha…
▽ More
Online Knowledge Distillation (OKD) improves the involved models by reciprocally exploiting the difference between teacher and student. Several crucial bottlenecks over the gap between them -- e.g., Why and when does a large gap harm the performance, especially for student? How to quantify the gap between teacher and student? -- have received limited formal study. In this paper, we propose Switchable Online Knowledge Distillation (SwitOKD), to answer these questions. Instead of focusing on the accuracy gap at test phase by the existing arts, the core idea of SwitOKD is to adaptively calibrate the gap at training phase, namely distillation gap, via a switching strategy between two modes -- expert mode (pause the teacher while keep the student learning) and learning mode (restart the teacher). To possess an appropriate distillation gap, we further devise an adaptive switching threshold, which provides a formal criterion as to when to switch to learning mode or expert mode, and thus improves the student's performance. Meanwhile, the teacher benefits from our adaptive switching threshold and keeps basically on a par with other online arts. We further extend SwitOKD to multiple networks with two basis topologies. Finally, extensive experiments and analysis validate the merits of SwitOKD for classification over the state-of-the-arts. Our code is available at https://github.com/hfutqian/SwitOKD.
△ Less
Submitted 11 September, 2022;
originally announced September 2022.
-
Covering Grassmannian Codes: Bounds and Constructions
Authors:
Bingchen Qian,
Xin Wang,
Chengfei Xie,
Gennian Ge
Abstract:
Grassmannian $\mathcal{G}_q(n,k)$ is the set of all $k$-dimensional subspaces of the vector space $\mathbb{F}_q^n.$ Recently, Etzion and Zhang introduced a new notion called covering Grassmannian code which can be used in network coding solutions for generalized combination networks. An $α$-$(n,k,δ)_q^c$ covering Grassmannian code $\mathcal{C}$ is a subset of $\mathcal{G}_q(n,k)$ such that every s…
▽ More
Grassmannian $\mathcal{G}_q(n,k)$ is the set of all $k$-dimensional subspaces of the vector space $\mathbb{F}_q^n.$ Recently, Etzion and Zhang introduced a new notion called covering Grassmannian code which can be used in network coding solutions for generalized combination networks. An $α$-$(n,k,δ)_q^c$ covering Grassmannian code $\mathcal{C}$ is a subset of $\mathcal{G}_q(n,k)$ such that every set of $α$ codewords of $\mathcal{C}$ spans a subspace of dimension at least $δ+k$ in $\mathbb{F}_q^n.$ In this paper, we derive new upper and lower bounds on the size of covering Grassmannian codes. These bounds improve and extend the parameter range of known bounds.
△ Less
Submitted 19 July, 2022;
originally announced July 2022.
-
Improved Lower Bounds for Secure Codes and Related Structures
Authors:
Bingchen Qian,
Xin Wang,
Gennian Ge
Abstract:
Secure codes are widely-studied combinatorial structures which were introduced for traitor tracing in broadcast encryption. To determine the maximum size of such structures is the main research objective. In this paper, we investigate the lower bounds for secure codes and their related structures. First, we give some improved lower bounds for the rates of $2$-frameproof codes and $\overline{2}$-se…
▽ More
Secure codes are widely-studied combinatorial structures which were introduced for traitor tracing in broadcast encryption. To determine the maximum size of such structures is the main research objective. In this paper, we investigate the lower bounds for secure codes and their related structures. First, we give some improved lower bounds for the rates of $2$-frameproof codes and $\overline{2}$-separable codes for slightly large alphabet size. Then we improve the lower bounds for the rate of some related structures, i.e., strongly $2$-separable matrices and $2$-cancellative set families. Finally, we give a general method to derive new lower bounds for strongly $t$-separable matrices and $t$-cancellative set families for $t\ge 3.$
△ Less
Submitted 21 August, 2021; v1 submitted 18 August, 2021;
originally announced August 2021.
-
RECIST-Net: Lesion detection via grou** keypoints on RECIST-based annotation
Authors:
Cong Xie,
Shilei Cao,
Dong Wei,
Hongyu Zhou,
Kai Ma,
Xianli Zhang,
Buyue Qian,
Liansheng Wang,
Yefeng Zheng
Abstract:
Universal lesion detection in computed tomography (CT) images is an important yet challenging task due to the large variations in lesion type, size, shape, and appearance. Considering that data in clinical routine (such as the DeepLesion dataset) are usually annotated with a long and a short diameter according to the standard of Response Evaluation Criteria in Solid Tumors (RECIST) diameters, we p…
▽ More
Universal lesion detection in computed tomography (CT) images is an important yet challenging task due to the large variations in lesion type, size, shape, and appearance. Considering that data in clinical routine (such as the DeepLesion dataset) are usually annotated with a long and a short diameter according to the standard of Response Evaluation Criteria in Solid Tumors (RECIST) diameters, we propose RECIST-Net, a new approach to lesion detection in which the four extreme points and center point of the RECIST diameters are detected. By detecting a lesion as keypoints, we provide a more conceptually straightforward formulation for detection, and overcome several drawbacks (e.g., requiring extensive effort in designing data-appropriate anchors and losing shape information) of existing bounding-box-based methods while exploring a single-task, one-stage approach compared to other RECIST-based approaches. Experiments show that RECIST-Net achieves a sensitivity of 92.49% at four false positives per image, outperforming other recent methods including those using multi-task learning.
△ Less
Submitted 19 July, 2021;
originally announced July 2021.
-
Online Disease Self-diagnosis with Inductive Heterogeneous Graph Convolutional Networks
Authors:
Zifeng Wang,
Rui Wen,
Xi Chen,
Shilei Cao,
Shao-Lun Huang,
Buyue Qian,
Yefeng Zheng
Abstract:
We propose a Healthcare Graph Convolutional Network (HealGCN) to offer disease self-diagnosis service for online users based on Electronic Healthcare Records (EHRs). Two main challenges are focused in this paper for online disease diagnosis: (1) serving cold-start users via graph convolutional networks and (2) handling scarce clinical description via a symptom retrieval system. To this end, we fir…
▽ More
We propose a Healthcare Graph Convolutional Network (HealGCN) to offer disease self-diagnosis service for online users based on Electronic Healthcare Records (EHRs). Two main challenges are focused in this paper for online disease diagnosis: (1) serving cold-start users via graph convolutional networks and (2) handling scarce clinical description via a symptom retrieval system. To this end, we first organize the EHR data into a heterogeneous graph that is capable of modeling complex interactions among users, symptoms and diseases, and tailor the graph representation learning towards disease diagnosis with an inductive learning paradigm. Then, we build a disease self-diagnosis system with a corresponding EHR Graph-based Symptom Retrieval System (GraphRet) that can search and provide a list of relevant alternative symptoms by tracing the predefined meta-paths. GraphRet helps enrich the seed symptom set through the EHR graph when confronting users with scarce descriptions, hence yield better diagnosis accuracy. At last, we validate the superiority of our model on a large-scale EHR dataset.
△ Less
Submitted 12 February, 2021; v1 submitted 5 September, 2020;
originally announced September 2020.
-
Train Your Data Processor: Distribution-Aware and Error-Compensation Coordinate Decoding for Human Pose Estimation
Authors:
Feiyu Yang,
Zhan Song,
Zhenzhong Xiao,
Yu Chen,
Zhe Pan,
Min Zhang,
Min Xue,
Yaoyang Mo,
Yao Zhang,
Guoxiong Guan,
Beibei Qian
Abstract:
Recently, the leading performance of human pose estimation is dominated by heatmap based methods. While being a fundamental component of heatmap processing, heatmap decoding (i.e. transforming heatmaps to coordinates) receives only limited investigations, to our best knowledge. This work fills the gap by studying the heatmap decoding processing with a particular focus on the errors introduced thro…
▽ More
Recently, the leading performance of human pose estimation is dominated by heatmap based methods. While being a fundamental component of heatmap processing, heatmap decoding (i.e. transforming heatmaps to coordinates) receives only limited investigations, to our best knowledge. This work fills the gap by studying the heatmap decoding processing with a particular focus on the errors introduced throughout the prediction process. We found that the errors of heatmap based methods are surprisingly significant, which nevertheless was universally ignored before. In view of the discovered importance, we further reveal the intrinsic limitations of the previous widely used heatmap decoding methods and thereout propose a Distribution-Aware and Error-Compensation Coordinate Decoding (DAEC). Serving as a model-agnostic plug-in, DAEC learns its decoding strategy from training data and remarkably improves the performance of a variety of state-of-the-art human pose estimation models with negligible extra computation. Specifically, equipped with DAEC, the SimpleBaseline-ResNet152-256x192 and HRNet-W48-256x192 are significantly improved by 2.6 AP and 2.9 AP achieving 72.6 AP and 75.7 AP on COCO, respectively. Moreover, the HRNet-W32-256x256 and ResNet-152-256x256 frameworks enjoy even more dramatic promotions of 8.4% and 7.8% on MPII with PCKh0.1 metric. Extensive experiments performed on these two common benchmarks, demonstrates that DAEC exceeds its competitors by considerable margins, backing up the rationality and generality of our novel heatmap decoding idea. The project is available at https://github.com/fyang235/DAEC.
△ Less
Submitted 17 July, 2020; v1 submitted 11 July, 2020;
originally announced July 2020.
-
Diversifying Inference Path Selection: Moving-Mobile-Network for Landmark Recognition
Authors:
Biao Qian,
Yang Wang,
Zhao Zhang,
Richang Hong,
Meng Wang,
Ling Shao
Abstract:
Deep convolutional neural networks have largely benefited computer vision tasks. However, the high computational complexity limits their real-world applications. To this end, many methods have been proposed for efficient network learning, and applications in portable mobile devices. In this paper, we propose a novel \underline{M}oving-\underline{M}obile-\underline{Net}work, named M$^2$Net, for lan…
▽ More
Deep convolutional neural networks have largely benefited computer vision tasks. However, the high computational complexity limits their real-world applications. To this end, many methods have been proposed for efficient network learning, and applications in portable mobile devices. In this paper, we propose a novel \underline{M}oving-\underline{M}obile-\underline{Net}work, named M$^2$Net, for landmark recognition, equipped each landmark image with located geographic information. We intuitively find that M$^2$Net can essentially promote the diversity of the inference path (selected blocks subset) selection, so as to enhance the recognition accuracy. The above intuition is achieved by our proposed reward function with the input of geo-location and landmarks. We also find that the performance of other portable networks can be improved via our architecture. We construct two landmark image datasets, with each landmark associated with geographic information, over which we conduct extensive experiments to demonstrate that M$^2$Net achieves improved recognition accuracy with comparable complexity.
△ Less
Submitted 1 December, 2019;
originally announced December 2019.
-
Orchestrating the Development Lifecycle of Machine Learning-Based IoT Applications: A Taxonomy and Survey
Authors:
Bin Qian,
Jie Su,
Zhenyu Wen,
Devki Nandan Jha,
Yinhao Li,
Yu Guan,
Deepak Puthal,
Philip James,
Renyu Yang,
Albert Y. Zomaya,
Omer Rana,
Lizhe Wang,
Maciej Koutny,
Rajiv Ranjan
Abstract:
Machine Learning (ML) and Internet of Things (IoT) are complementary advances: ML techniques unlock complete potentials of IoT with intelligence, and IoT applications increasingly feed data collected by sensors into ML models, thereby employing results to improve their business processes and services. Hence, orchestrating ML pipelines that encompasses model training and implication involved in hol…
▽ More
Machine Learning (ML) and Internet of Things (IoT) are complementary advances: ML techniques unlock complete potentials of IoT with intelligence, and IoT applications increasingly feed data collected by sensors into ML models, thereby employing results to improve their business processes and services. Hence, orchestrating ML pipelines that encompasses model training and implication involved in holistic development lifecycle of an IoT application often leads to complex system integration. This paper provides a comprehensive and systematic survey on the development lifecycle of ML-based IoT application. We outline core roadmap and taxonomy, and subsequently assess and compare existing standard techniques used in individual stage.
△ Less
Submitted 29 May, 2020; v1 submitted 11 October, 2019;
originally announced October 2019.
-
A Targeted Acceleration and Compression Framework for Low bit Neural Networks
Authors:
Biao Qian,
Yang Wang
Abstract:
1 bit deep neural networks (DNNs), of which both the activations and weights are binarized , are attracting more and more attention due to their high computational efficiency and low memory requirement . However, the drawback of large accuracy drop** also restrict s its application. In this paper, we propose a novel Targeted Acceleration and Compression (TAC) framework to improve the performance…
▽ More
1 bit deep neural networks (DNNs), of which both the activations and weights are binarized , are attracting more and more attention due to their high computational efficiency and low memory requirement . However, the drawback of large accuracy drop** also restrict s its application. In this paper, we propose a novel Targeted Acceleration and Compression (TAC) framework to improve the performance of 1 bit deep neural networks W e consider that the acceleration and compression effects of binarizing fully connected layer s are not sufficient to compensate for the accuracy loss caused by it In the proposed framework, t he convolutional and fully connected layer are separated and optimized i ndividually . F or the convolutional layer s , both the activations and weights are binarized. For the fully connected layer s, the binarization operation is re placed by network pruning and low bit quantization. The proposed framework is implemented on the CIFAR 10, CIFAR 100 and ImageNet ( ILSVRC 12 ) datasets , and experimental results show that the proposed TAC can significantly improve the accuracy of 1 bit deep neural networks and outperforms the state of the art by more than 6 percentage points .
△ Less
Submitted 9 July, 2019;
originally announced July 2019.
-
Modeling e-Learners' Cognitive and Metacognitive Strategy in Comparative Question Solving
Authors:
Feng Tian,
Jia Yue,
Kuo-ming Chao,
Buyue Qian,
Nazaraf Shah,
Longzhuang Li,
Hai** Zhu,
Yan Chen,
Bin Zeng,
Qinghua Zheng
Abstract:
Cognitive and metacognitive strategy had demonstrated a significant role in self-regulated learning (SRL), and an appropriate use of strategies is beneficial to effective learning or question-solving tasks during a human-computer interaction process. This paper proposes a novel method combining Knowledge Map (KM) based data mining technique with Thinking Map (TM) to detect learner's cognitive and…
▽ More
Cognitive and metacognitive strategy had demonstrated a significant role in self-regulated learning (SRL), and an appropriate use of strategies is beneficial to effective learning or question-solving tasks during a human-computer interaction process. This paper proposes a novel method combining Knowledge Map (KM) based data mining technique with Thinking Map (TM) to detect learner's cognitive and metacognitive strategy in the question-solving scenario. In particular, a graph-based mining algorithm is designed to facilitate our proposed method, which can automatically map cognitive strategy to metacognitive strategy with raising abstraction level, and make the cognitive and metacognitive process viewable, which acts like a reverse engineering engine to explain how a learner thinks when solving a question. Additionally, we develop an online learning environment system for participants to learn and record their behaviors. To corroborate the effectiveness of our approach and algorithm, we conduct experiments recruiting 173 postgraduate and undergraduate students, and they were asked to complete a question-solving task, such as "What are similarities and differences between array and pointer?" from "The C Programming Language" course and "What are similarities and differences between packet switching and circuit switching?" from "Computer Network Principle" course. The mined strategies patterns results are encouraging and supported well our proposed method.
△ Less
Submitted 4 June, 2019;
originally announced June 2019.
-
Measuring Patient Similarities via a Deep Architecture with Medical Concept Embedding
Authors:
Zihao Zhu,
Changchang Yin,
Buyue Qian,
Yu Cheng,
Jishang Wei,
Fei Wang
Abstract:
Evaluating the clinical similarities between pairwise patients is a fundamental problem in healthcare informatics. A proper patient similarity measure enables various downstream applications, such as cohort study and treatment comparative effectiveness research. One major carrier for conducting patient similarity research is Electronic Health Records(EHRs), which are usually heterogeneous, longitu…
▽ More
Evaluating the clinical similarities between pairwise patients is a fundamental problem in healthcare informatics. A proper patient similarity measure enables various downstream applications, such as cohort study and treatment comparative effectiveness research. One major carrier for conducting patient similarity research is Electronic Health Records(EHRs), which are usually heterogeneous, longitudinal, and sparse. Though existing studies on learning patient similarity from EHRs have shown being useful in solving real clinical problems, their applicability is limited due to the lack of medical interpretations. Moreover, most previous methods assume a vector-based representation for patients, which typically requires aggregation of medical events over a certain time period. As a consequence, temporal information will be lost. In this paper, we propose a patient similarity evaluation framework based on the temporal matching of longitudinal patient EHRs. Two efficient methods are presented, unsupervised and supervised, both of which preserve the temporal properties in EHRs. The supervised scheme takes a convolutional neural network architecture and learns an optimal representation of patient clinical records with medical concept embedding. The empirical results on real-world clinical data demonstrate substantial improvement over the baselines. We make our code and sample data available for further study.
△ Less
Submitted 9 February, 2019;
originally announced February 2019.
-
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Authors:
Spyridon Bakas,
Mauricio Reyes,
Andras Jakab,
Stefan Bauer,
Markus Rempfler,
Alessandro Crimi,
Russell Takeshi Shinohara,
Christoph Berger,
Sung Min Ha,
Martin Rozycki,
Marcel Prastawa,
Esther Alberts,
Jana Lipkova,
John Freymann,
Justin Kirby,
Michel Bilello,
Hassan Fathallah-Shaykh,
Roland Wiest,
Jan Kirschke,
Benedikt Wiestler,
Rivka Colen,
Aikaterini Kotrotsou,
Pamela Lamontagne,
Daniel Marcus,
Mikhail Milchenko
, et al. (402 additional authors not shown)
Abstract:
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles dissem…
▽ More
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
△ Less
Submitted 23 April, 2019; v1 submitted 5 November, 2018;
originally announced November 2018.
-
A Reconstruction Error Formulation for Semi-Supervised Multi-task and Multi-view Learning
Authors:
Buyue Qian,
Xiang Wang,
Ian Davidson
Abstract:
A significant challenge to make learning techniques more suitable for general purpose use is to move beyond i) complete supervision, ii) low dimensional data, iii) a single task and single view per instance. Solving these challenges allows working with "Big Data" problems that are typically high dimensional with multiple (but possibly incomplete) labelings and views. While other work has addressed…
▽ More
A significant challenge to make learning techniques more suitable for general purpose use is to move beyond i) complete supervision, ii) low dimensional data, iii) a single task and single view per instance. Solving these challenges allows working with "Big Data" problems that are typically high dimensional with multiple (but possibly incomplete) labelings and views. While other work has addressed each of these problems separately, in this paper we show how to address them together, namely semi-supervised dimension reduction for multi-task and multi-view learning (SSDR-MML), which performs optimization for dimension reduction and label inference in semi-supervised setting. The proposed framework is designed to handle both multi-task and multi-view learning settings, and can be easily adapted to many useful applications. Information obtained from all tasks and views is combined via reconstruction errors in a linear fashion that can be efficiently solved using an alternating optimization scheme. Our formulation has a number of advantages. We explicitly model the information combining mechanism as a data structure (a weight/nearest-neighbor matrix) which allows investigating fundamental questions in multi-task and multi-view learning. We address one such question by presenting a general measure to quantify the success of simultaneous learning of multiple tasks or from multiple views. We show that our SSDR-MML approach can outperform many state-of-the-art baseline methods and demonstrate the effectiveness of connecting dimension reduction and learning.
△ Less
Submitted 3 February, 2012;
originally announced February 2012.
-
On Constrained Spectral Clustering and Its Applications
Authors:
Xiang Wang,
Buyue Qian,
Ian Davidson
Abstract:
Constrained clustering has been well-studied for algorithms such as $K$-means and hierarchical clustering. However, how to satisfy many constraints in these algorithmic settings has been shown to be intractable. One alternative to encode many constraints is to use spectral clustering, which remains a develo** area. In this paper, we propose a flexible framework for constrained spectral clusterin…
▽ More
Constrained clustering has been well-studied for algorithms such as $K$-means and hierarchical clustering. However, how to satisfy many constraints in these algorithmic settings has been shown to be intractable. One alternative to encode many constraints is to use spectral clustering, which remains a develo** area. In this paper, we propose a flexible framework for constrained spectral clustering. In contrast to some previous efforts that implicitly encode Must-Link and Cannot-Link constraints by modifying the graph Laplacian or constraining the underlying eigenspace, we present a more natural and principled formulation, which explicitly encodes the constraints as part of a constrained optimization problem. Our method offers several practical advantages: it can encode the degree of belief in Must-Link and Cannot-Link constraints; it guarantees to lower-bound how well the given constraints are satisfied using a user-specified threshold; it can be solved deterministically in polynomial time through generalized eigendecomposition. Furthermore, by inheriting the objective function from spectral clustering and encoding the constraints explicitly, much of the existing analysis of unconstrained spectral clustering techniques remains valid for our formulation. We validate the effectiveness of our approach by empirical results on both artificial and real datasets. We also demonstrate an innovative use of encoding large number of constraints: transfer learning via constraints.
△ Less
Submitted 21 September, 2012; v1 submitted 25 January, 2012;
originally announced January 2012.